Closed colearendt closed 3 years ago
I would be curious if you have any specific examples of issues you've experienced on RHEL 7 (especially recently). We run CentOS 7 (without any additional devtools installation [1]) as part of our CI process on every commit/PR in Arrow and expect that installing Arrow should just work [2]. If it doesn't, we would very very much love to hear about it and help resolve that (feel free to open a jira, or I'm happy to have logs emailed to me if there's need for privacy).
Here's a recent run of that on our CI (GitHub Actions): https://github.com/apache/arrow/runs/3997101335?check_suite_focus=true
[1] – We also test with devtools installed to make sure that upgrading gcc also works. Here's a recent build of that nightly CI job: https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=14451&view=results
[2] – There are two features which are turned off on these builds: S3 support and the mimalloc memory allocator. Both of those require gcc >= 4.9.
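The gcc >= 4.9 cutoff in footnote [2] can be checked on a machine before attempting a build. A minimal sketch, assuming GNU `sort` with `-V` (version sort) is available; the helper name `version_ge` is hypothetical and not part of Arrow:

```shell
#!/bin/sh
# version_ge A B: succeed when dotted version A is >= version B.
# Relies on GNU sort's -V (version sort); this is an assumption about the host.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the local gcc meets the 4.9 minimum from footnote [2].
if command -v gcc >/dev/null 2>&1; then
  gcc_version="$(gcc -dumpversion)"
  if version_ge "$gcc_version" 4.9; then
    echo "gcc $gcc_version meets the 4.9 minimum for S3 and mimalloc"
  else
    echo "gcc $gcc_version is below the 4.9 that S3 and mimalloc require"
  fi
fi
```

On a stock CentOS 7 image this would be expected to take the second branch, since the system compiler there is older than 4.9.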
Ok, I'm going to leave this as is then. @colearendt please let me know if you encounter any customers that this affects.
IIRC this was a customer that generated this issue. I can find out who if that would be helpful. They were having trouble with `pins`, so we suggested moving forward to the new version, and then installing the new `pins` failed on the `arrow` package.
Apologies for missing the message @jonkeane - I'll see if I can dig up any of this old info
Updated Slack thread with info from the customer. They installed `devtoolset-10` and then `arrow` 3.0.0 and that resolved their issue. I wasn't able to install `arrow` into the default `rstudio/r-base:4.0.3-centos7` image, but that may be something else. I'm going to move on to other things, but just wanted to record this info in case I end up back here.
[ 61%] Building CXX object src/arrow/CMakeFiles/arrow_objlib.dir/Unity/unity_21_cxx.cxx.o
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
make[2]: *** [src/arrow/dataset/CMakeFiles/arrow_dataset_objlib.dir/Unity/unity_1_cxx.cxx.o] Error 4
make[1]: *** [src/arrow/dataset/CMakeFiles/arrow_dataset_objlib.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 63%] Building CXX object src/arrow/CMakeFiles/arrow_objlib.dir/Unity/unity_20_cxx.cxx.o
In file included from /tmp/RtmpXxLJZh/file5603479b4169/src/arrow/CMakeFiles/arrow_objlib.dir/Unity/unity_20_cxx.cxx:8:0:
/tmp/RtmpvsMnw8/R.INSTALL55d9b9ee75b/arrow/tools/cpp/src/arrow/filesystem/mockfs.cc:264:23: warning: ‘arrow::fs::internal::MockFileSystem::Impl’ has a field ‘arrow::fs::internal::MockFileSystem::Impl::root’ whose type uses the anonymous namespace [enabled by default]
class MockFileSystem::Impl {
^
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
make[2]: *** [src/arrow/CMakeFiles/arrow_objlib.dir/Unity/unity_21_cxx.cxx.o] Error 4
make[2]: *** Waiting for unfinished jobs....
cc1plus: warning: unrecognized command line option "-Wno-subobject-linkage" [enabled by default]
make[1]: *** [src/arrow/CMakeFiles/arrow_objlib.dir/all] Error 2
gmake: *** [all] Error 2
**** Error building Arrow C++.
------------------------- NOTE ---------------------------
There was an issue preparing the Arrow C++ libraries.
See https://arrow.apache.org/docs/r/articles/install.html
---------------------------------------------------------
ERROR: configuration failed for package ‘arrow’
* removing ‘/opt/R/4.0.3/lib/R/library/arrow’
Aaah, we've seen this before: `g++: internal compiler error: Killed (program cc1plus)` indicates that the compilation process was killed. There are other possibilities, but almost every time I've seen it, it's because of running in a memory-constrained environment and OOMing during the build. How much memory did you have available where this happened?
There are a bunch of factors that contribute to how much RAM is needed during compilation (which features are being compiled, which dependencies need to be compiled, how much parallelism is enabled). However, we've found the biggest culprit for memory use in our build process was building with unity enabled (which is exactly where this failed), so we've disabled that by default (starting with the 7.0.0 release) to reduce the chances of people running into this. Unity is supposed to speed up compilation at the expense of requiring more RAM, but in cases like docker containers with relatively limited memory, that ends up in situations like this. We are also working on adding RAM requirements to our docs: https://github.com/apache/arrow/pull/11205 to make this a bit clearer. Hopefully these two together will resolve most cases of this, and when they don't, there's clearer guidance on minimums.
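Since parallelism is one of the factors listed above, one way to act on this in a memory-limited environment is to cap the number of parallel compile jobs before installing. A rough sketch, assuming a Linux host with `/proc/meminfo` and a budget of about 2 GB of RAM per compile job (that figure is a guess for illustration, not a number from the Arrow docs). `jobs_for_memory` is a hypothetical helper, and the assumption that the arrow source build honours `MAKEFLAGS` should be checked against the installation docs:

```shell
#!/bin/sh
# jobs_for_memory AVAILABLE_KB PER_JOB_KB: print how many parallel jobs fit (minimum 1).
jobs_for_memory() {
  n=$(( $1 / $2 ))
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}

if [ -r /proc/meminfo ]; then
  # MemAvailable is reported in kB; fall back to 0 if the field is missing.
  avail_kb="$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)"
  # 2097152 kB = 2 GB per job: an assumed budget, tune it for your build.
  export MAKEFLAGS="-j$(jobs_for_memory "${avail_kb:-0}" 2097152)"
  echo "Using MAKEFLAGS=$MAKEFLAGS"
fi
```

Inside a container with a cgroup memory limit, `/proc/meminfo` usually reports the host's memory rather than the container's limit, so in that case it is safer to pass the limit to `jobs_for_memory` explicitly.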
I'm seeing very long build times (60+ minutes) when trying to install arrow for the pins library within a Docker container via install.packages() on my CI, Cloud Build. Is there any advice on how to speed it up aside from booting a bigger machine, perhaps using a different FROM to build within?
I've looked at the apache-dev/arrow images, but it seems like some work to install R, tidyverse etc. on top. I'm looking for an image that has arrow/pins, ideally.
The quickest and easiest way to install Arrow is to do one of the following: set `NOT_CRAN=TRUE` before installing. This sets up the installation process to download a binary of Arrow as part of the install process, which will be much, much quicker. There's more information in our documentation. I've linked to our nightly docs because we've recently improved them and they are quite a bit clearer about this. We're in the process of releasing right now, so those will be on the main page soon enough!
There are a (relatively broad!) set of OSes that RSPM/we support — if you're finding that the image you're using isn't supported, please let us know and we'll see what we can do.
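For the `NOT_CRAN=TRUE` route mentioned above, a minimal sketch of what that could look like from a shell or a Dockerfile `RUN` step. The CRAN mirror URL and the `RUN_INSTALL` guard are assumptions added for illustration; only `NOT_CRAN=TRUE` comes from the reply:

```shell
#!/bin/sh
# Sketch: run the arrow install with NOT_CRAN set only for that one command.
install_arrow() {
  # NOT_CRAN=TRUE is the setting named above; the CRAN mirror URL is an assumption.
  NOT_CRAN=TRUE Rscript -e 'install.packages("arrow", repos = "https://cloud.r-project.org")'
}

# Guarded so that sourcing this file does not trigger an install by accident.
# RUN_INSTALL is a hypothetical switch, not an Arrow setting.
if [ "${RUN_INSTALL:-0}" = "1" ]; then
  install_arrow
fi
```

Setting the variable on the same line as `Rscript` keeps it scoped to the install, which avoids leaking it into later build steps.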
One final note about the `apache-dev/arrow` images: yeah, those are mostly for our CI process and aren't really designed to be used downstream. As far as I know, that's not a principled decision, and the Arrow community might be interested in extending those to be (more) usable like this. If that's something you're interested in (especially helping us do that!), I would recommend either opening an issue or sending a message to the dev mailing list to discuss it.
Great thanks will take a look. For some reason the RStudio binaries weren't being used even though I remember since R 4.0 rocker images did default to that, but will look at those options. In general having an R arrow Docker available will be nice to have.
RHEL 7 systems (which are used often by our customers) do not have a new enough `gcc` to compile `c++11` code. This creates a problem with installing the `arrow` package, and workarounds often (1) involve IT or (2) are disallowed by policy. As a result, making the `arrow` dependency optional would be beneficial for customers.