rstudio / pins-r

Pin, discover, and share resources
https://pins.rstudio.com

Make `arrow` dependency optional #537

Closed colearendt closed 3 years ago

colearendt commented 3 years ago

RHEL 7 systems (which our customers often use) do not ship a gcc new enough to compile C++11 code. This makes installing the arrow package a problem, and workarounds often either (1) involve IT or (2) are disallowed by policy.

As a result, making the arrow dependency optional would be beneficial for customers.

jonkeane commented 3 years ago

I would be curious if you have any specific examples of issues you've experienced on RHEL 7 (especially recently). We run CentOS 7 (without any additional devtools installation [1]) as part of our CI process on every commit/PR in Arrow and expect that installing Arrow should just work [2]. If it doesn't, we would very very much love to hear about it and help resolve that (feel free to open a jira, or I'm happy to have logs emailed to me if there's need for privacy).

Here's a recent run of that on our CI (a GitHub Actions job): https://github.com/apache/arrow/runs/3997101335?check_suite_focus=true

[1] We also test with devtools installed to make sure that upgrading gcc also works. Here's a recent build of that nightly CI job: https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=14451&view=results

[2] There are two features which are turned off on these builds: S3 support and the mimalloc memory allocator. Both of those require gcc >= 4.9.

hadley commented 3 years ago

Ok, I'm going to leave this as is then. @colearendt please let me know if you encounter any customers that this affects.

colearendt commented 3 years ago

IIRC this was a customer that generated this issue. I can find out who if that would be helpful. They were having trouble with pins, so we suggested moving forwards to the new version, and then installing new pins failed on the arrow package.

Apologies for missing the message @jonkeane - I'll see if I can dig up any of this old info

colearendt commented 2 years ago

Updated slack thread with info from the customer. They installed devtoolset-10 and then arrow 3.0.0 and that resolved their issue. I wasn't able to install arrow into the default rstudio/r-base:4.0.3-centos7 image, but that may be something else. I'm going to move on to other things, but just wanted to record this info in case I end up back here.

[ 61%] Building CXX object src/arrow/CMakeFiles/arrow_objlib.dir/Unity/unity_21_cxx.cxx.o
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
make[2]: *** [src/arrow/dataset/CMakeFiles/arrow_dataset_objlib.dir/Unity/unity_1_cxx.cxx.o] Error 4
make[1]: *** [src/arrow/dataset/CMakeFiles/arrow_dataset_objlib.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 63%] Building CXX object src/arrow/CMakeFiles/arrow_objlib.dir/Unity/unity_20_cxx.cxx.o
In file included from /tmp/RtmpXxLJZh/file5603479b4169/src/arrow/CMakeFiles/arrow_objlib.dir/Unity/unity_20_cxx.cxx:8:0:
/tmp/RtmpvsMnw8/R.INSTALL55d9b9ee75b/arrow/tools/cpp/src/arrow/filesystem/mockfs.cc:264:23: warning: ‘arrow::fs::internal::MockFileSystem::Impl’ has a field ‘arrow::fs::internal::MockFileSystem::Impl::root’ whose type uses the anonymous namespace [enabled by default]
 class MockFileSystem::Impl {
                       ^
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
make[2]: *** [src/arrow/CMakeFiles/arrow_objlib.dir/Unity/unity_21_cxx.cxx.o] Error 4
make[2]: *** Waiting for unfinished jobs....
cc1plus: warning: unrecognized command line option "-Wno-subobject-linkage" [enabled by default]
make[1]: *** [src/arrow/CMakeFiles/arrow_objlib.dir/all] Error 2
gmake: *** [all] Error 2
**** Error building Arrow C++.
------------------------- NOTE ---------------------------
There was an issue preparing the Arrow C++ libraries.
See https://arrow.apache.org/docs/r/articles/install.html
---------------------------------------------------------
ERROR: configuration failed for package ‘arrow’
* removing ‘/opt/R/4.0.3/lib/R/library/arrow’

jonkeane commented 2 years ago

Aaah, we've seen this before. `g++: internal compiler error: Killed (program cc1plus)` indicates that the compilation process was killed. There are other possibilities, but almost every time I've seen it, the cause was running in a memory-constrained environment and OOMing during the build. How much memory did you have available where this happened?

There are a bunch of factors that contribute to how much RAM is needed during compilation (which features are being compiled, which dependencies need to be compiled, how much parallelism is enabled). However, we've found that the biggest consumer of memory in our build process was building with unity enabled (which is exactly where this failed), so we've disabled that by default (starting with the 7.0.0 release) to reduce the chances of people running into this. Unity is supposed to speed up compilation at the expense of requiring more RAM, but in environments like docker containers with relatively limited memory, that leads to failures like this one. We are also working on adding RAM requirements to our docs: https://github.com/apache/arrow/pull/11205 to make this a bit clearer. Hopefully these two together will resolve most cases of this, and when they don't, there will be clearer guidance on minimums.
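Since parallelism is one of the factors listed above, here is a minimal shell sketch of capping the number of compile jobs to fit the memory that is available before installing. Several things in it are assumptions rather than anything stated in this thread: the helper name `pick_make_jobs` is hypothetical, the budget of roughly 2 GiB per job is an illustrative guess and not an Arrow figure, and it assumes the build honors the standard `MAKEFLAGS` variable. Note also that inside a container `/proc/meminfo` usually reports the host's memory rather than the container's limit, so the result may be too optimistic there.

```shell
#!/bin/sh
# Sketch: choose a make parallelism level that fits the available memory.
# ASSUMPTION: ~2 GiB of RAM per compile job. This is an illustrative budget,
# not a number taken from the Arrow documentation.
KB_PER_JOB=$((2 * 1024 * 1024))

# pick_make_jobs <available_kb> <cpu_count>   (hypothetical helper)
# Prints min(cpu_count, available_kb / KB_PER_JOB), but never less than 1.
pick_make_jobs() {
  mem_kb=$1
  ncpu=$2
  n=$((mem_kb / KB_PER_JOB))
  if [ "$n" -gt "$ncpu" ]; then n=$ncpu; fi
  if [ "$n" -lt 1 ]; then n=1; fi
  echo "$n"
}

# Read MemAvailable (in kB) from /proc/meminfo on Linux; elsewhere fall back
# to 0, which yields a single job. Inside a container this value may reflect
# the host rather than the container's memory limit.
avail_kb=0
if [ -r /proc/meminfo ]; then
  avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
fi
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# ASSUMPTION: the package build respects MAKEFLAGS.
MAKEFLAGS="-j$(pick_make_jobs "${avail_kb:-0}" "$cpus")"
export MAKEFLAGS
echo "Would build with MAKEFLAGS=$MAKEFLAGS"
# Then install as usual, e.g.: Rscript -e 'install.packages("arrow")'
```

Fewer jobs means a slower build, so this trades build time for a lower peak memory footprint; on a machine with plenty of RAM it simply falls back to the CPU count.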

MarkEdmondson1234 commented 2 years ago

I'm seeing very long build times (60+ minutes) when trying to install arrow for the pins library within a Docker container via install.packages() on my CI, Cloud Build. Is there any advice on how to speed it up aside from booting a bigger machine, perhaps by using a different FROM to build within?

I've looked at the apache-dev/arrow images but it seems some work to install R, tidyverse etc on top. I'm looking for an image that has arrow/pins ideally.

jonkeane commented 2 years ago

The quickest and easiest way to install Arrow is to do one of the following:

There's more information in our documentation. I've linked to our nightly docs because we've recently improved them and they are quite a bit clearer about this — we're in the process of releasing right now so those will be on the main page soon enough!

There is a (relatively broad!) set of OSes that RSPM and we support. If you find that the image you're using isn't supported, please let us know and we'll see what we can do.

One final note about the apache-dev/arrow images: yeah, those are mostly for our CI process and aren't really designed to be used downstream. As far as I know, that's not a principled decision, and the Arrow community might be interested in extending those to be (more) usable like this. If that's something you're interested in (especially helping us do that!), I would recommend either opening an issue or sending a message to the dev mailing list to discuss it.

MarkEdmondson1234 commented 2 years ago

Great thanks will take a look. For some reason the RStudio binaries weren't being used even though I remember since R 4.0 rocker images did default to that, but will look at those options. In general having an R arrow Docker available will be nice to have.

github-actions[bot] commented 2 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.