Open RobertCsordas opened 5 years ago
Hey @xdever,
Going forward, this will need some way to build pyarrow wheels for ppc64. The official way to build pyarrow wheels is through crossbow (see https://github.com/apache/arrow/tree/master/dev/tasks); we reuse most of these scripts to build the pyarrow wheels, see https://github.com/pcmoritz/arrow-build/blob/master/.travis.yml#L54.
This infrastructure uses Travis, so it won't work out of the box, but it is easy to run the scripts on a dedicated machine. If you follow the instructions at https://github.com/apache/arrow/tree/master/python/manylinux1 on your ppc machine, it shouldn't be too hard to build the wheels (everything is dockerized). Once you have the wheels, you can replace the pip install in https://github.com/ray-project/ray/blob/dec7c3f8f5cfdd74b6f824307ec60d9e55232a60/build.sh#L121 with a pip install that installs your PowerPC wheels, and you are good to go.
Let me know if you have questions about this or run into trouble!
Best wishes, Philipp.
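A minimal sketch of the last step described above, assuming the locally built wheels sit in a single directory. The function name `find_local_wheel` and the `$HOME/ray_build` location are illustrative assumptions, not part of Ray's build.sh. The sketch only selects the wheel whose platform tag matches the machine; the pip command that would consume it is left as a comment.

```shell
#!/bin/sh
# find_local_wheel DIR PACKAGE [ARCH]   (hypothetical helper)
# Prints the path of the first wheel in DIR whose file name starts with
# PACKAGE and whose platform tag ends with ARCH (default: `uname -m`).
find_local_wheel() {
    dir="$1"
    pkg="$2"
    arch="${3:-$(uname -m)}"
    for whl in "$dir"/"$pkg"-*"$arch".whl; do
        # An unmatched glob stays literal, so confirm the file exists.
        if [ -f "$whl" ]; then
            printf '%s\n' "$whl"
            return 0
        fi
    done
    echo "no $pkg wheel for $arch in $dir" >&2
    return 1
}

# Possible use in place of the stock pip install (paths are assumptions):
#   wheel="$(find_local_wheel "$HOME/ray_build" pyarrow)" &&
#       python3 -m pip install --target ray/pyarrow_files "$wheel"
```

Matching on the platform tag keeps an x86 wheel that happens to be in the same directory from being installed on the Power machine.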
Hi @pcmoritz,
Wouldn't it make sense for you to provide pyarrow cross-compiled for PPC64? I'm probably not the only one who wants to use Ray on an IBM Minsky, which is a PPC64LE machine. The build process is nontrivial, and I'm afraid it would prevent many people from using it. If you don't want to bother with the PPC binaries and don't want to keep the cmake build script in build.sh, it would be good if it could be moved to a different script or a different repository, so that people can still build it without too much effort.
Thank you, Robert
Hi all,
Any progress on this?
Building PyArrow is the most difficult thing I have ever seen on Linux... Could you at least provide the old script that was used to auto-build it?
Thank you, Robert
+1
I am working with 2x IBM AC922 systems, and cannot build Ray from source on them.
I managed to get Ray 0.7.5 working fine (even with the rest of the cluster which is x86). For a newer version, you should change the versions and commit numbers in the script. It was super difficult and took me a few days to make it work, so I made a script out of it to be able to reproduce it next time.
#!/bin/bash
mkdir ~/ray_build
cd ~/ray_build
mkdir bazel_build
cd bazel_build
wget https://github.com/bazelbuild/bazel/releases/download/0.26.1/bazel-0.26.1-dist.zip
unzip bazel*
env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" bash ./compile.sh
cd output
export PATH=`pwd`:$PATH
cd ../../
git clone --recursive https://github.com/apache/arrow
cd arrow
git checkout 141a213a54f4979ab0b94b94928739359a2ee9ad
#git checkout tags/apache-arrow-0.14.0
git submodule update --recursive
mkdir build
cd build
cmake ../cpp -DCMAKE_BUILD_TYPE:STRING=Release -DCMAKE_INSTALL_PREFIX=~/ray_build/arrow -DCMAKE_C_FLAGS=-O3 -DCMAKE_CXX_FLAGS=-O3 -DARROW_BUILD_TESTS=off -DARROW_HDFS=on -DARROW_BOOST_USE_SHARED=off -DPYTHON_EXECUTABLE:FILEPATH=/usr/bin/python3 -DARROW_PYTHON=on -DARROW_PLASMA=on -DARROW_TENSORFLOW=off -DARROW_JEMALLOC=off -DARROW_WITH_BROTLI=off -DARROW_WITH_LZ4=on -DARROW_WITH_ZSTD=off -DARROW_WITH_THRIFT=ON -DARROW_PARQUET=ON -DARROW_WITH_ZLIB=ON
make -j`nproc`
make install
cd ../python
export PKG_CONFIG_PATH=~/ray_build/arrow/lib/pkgconfig:$PKG_CONFIG_PATH
export PYARROW_BUILD_TYPE='release'
export PYARROW_WITH_ORC=0
export PYARROW_WITH_PARQUET=1
export PYARROW_WITH_PLASMA=1
export PYARROW_BUNDLE_ARROW_CPP=1
#export PYARROW_BUNDLE_BOOST=1
#export PYARROW_BOOST_NAMESPACE=arrow_boost
pip3 install -r requirements-wheel.txt --user
SETUPTOOLS_SCM_PRETEND_VERSION="0.14.0.RAY" python3 setup.py build_ext --inplace
SETUPTOOLS_SCM_PRETEND_VERSION="0.14.0.RAY" python3 setup.py bdist_wheel
cp dist/pyarrow*.whl ~/ray_build
cd ../../
git clone --recursive https://github.com/ray-project/ray
cd ray
git checkout tags/ray-0.7.5
git submodule update --recursive
export SKIP_PYARROW_INSTALL=1
cd python
python3 -m pip install -q --target ray/pyarrow_files ~/ray_build/pyarrow*.whl --system
python3 setup.py bdist_wheel
@xdever Thanks for the detailed steps! I tried to build ray-0.7.5 with the above steps. But the ray build fails with the following error:
+ /root/ray_build/bazel_build/output/bazel build //:ray_pkg --verbose_failures
INFO: Call stack for the definition of repository 'com_github_jupp0r_prometheus_cpp' which is a http_archive (rule definition at /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/bazel_tools/tools/build_defs/repo/http.bzl:229:16):
- /root/ray_build/ray/bazel/ray_deps_setup.bzl:96:5
- /root/ray_build/ray/WORKSPACE:5:1
ERROR: An error occurred during the fetch of repository 'com_github_jupp0r_prometheus_cpp':
java.io.IOException: Error downloading [https://github.com/jovany-wang/prometheus-cpp/archive/master.zip] to /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/com_github_jupp0r_prometheus_cpp/master.zip: GET returned 404 Not Found
INFO: Call stack for the definition of repository 'build_stack_rules_proto' which is a http_archive (rule definition at /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/bazel_tools/tools/build_defs/repo/http.bzl:229:16):
- /root/ray_build/ray/bazel/ray_deps_setup.bzl:113:5
- /root/ray_build/ray/WORKSPACE:5:1
ERROR: error loading package '': in /root/ray_build/ray/bazel/ray_deps_build_all.bzl: Encountered error while reading extension file 'repositories.bzl': no such package '@com_github_jupp0r_prometheus_cpp//': java.io.IOException: Error downloading [https://github.com/jovany-wang/prometheus-cpp/archive/master.zip] to /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/com_github_jupp0r_prometheus_cpp/master.zip: GET returned 404 Not Found
ERROR: error loading package '': in /root/ray_build/ray/bazel/ray_deps_build_all.bzl: Encountered error while reading extension file 'repositories.bzl': no such package '@com_github_jupp0r_prometheus_cpp//': java.io.IOException: Error downloading [https://github.com/jovany-wang/prometheus-cpp/archive/master.zip] to /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/com_github_jupp0r_prometheus_cpp/master.zip: GET returned 404 Not Found
INFO: Elapsed time: 11.948s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
Do these steps still work for you? Or did you have to make some changes to the script recently?
Also, I did some searches for that error and found this thread: https://github.com/ray-project/ray/issues/6373. So, I tried building ray-0.8.0 instead. That did not complain about the "prometheus-cpp" download failure. But it gave the following error:
+ /root/ray_build/bazel_build/output/bazel build //:ray_pkg --verbose_failures
INFO: Options provided by the client:
Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'build' from /root/ray_build/ray/.bazelrc:
'build' options: --compilation_mode=opt --action_env=BAZEL_LLVM --action_env=BAZEL_SH --action_env=PATH --action_env=PYTHON2_BIN_PATH --action_env=PYTHON3_BIN_PATH --action_env=USE_CLANG_CL=1 --enable_platform_specific_config --per_file_copt=-\.(asm|S)$,-.*/arrow/util/logging\.cc@-Werror --per_file_copt=-\.(asm|S)$,\.pb\.cc$@-w --per_file_copt=-\.(asm|S)$,external/.*@-w --host_copt=-Wno-builtin-macro-redefined --host_copt=-Wno-inconsistent-missing-override --host_copt=-Wno-microsoft-unqualified-friend --per_file_copt=-\.(asm|S)$,external/com_github_grpc_grpc/.*@-DGRPC_BAZEL_BUILD --http_timeout_scaling=5.0 --incompatible_depset_is_not_iterable=false
ERROR: Unrecognized option: --enable_platform_specific_config
Do you think maybe this issue could be due to an old version of bazel? Should I try to build a newer version of bazel?
Same problem here, seems that https://github.com/jovany-wang/prometheus-cpp no longer exists.
@amitsadaphule
The error you are seeing when trying to build ray-0.8.0 is indeed due to Bazel v0.26.1 being too old. The --enable_platform_specific_config option wasn't added to Bazel until v1.0.0, specifically in this commit: https://github.com/bazelbuild/bazel/commit/59755455034a998cdedfb7b086aea3ad78419381
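A small sketch of how a build script could check for this up front, before Bazel rejects the option. The helper names `version_ge` and `supports_platform_specific_config` are hypothetical, and the comparison assumes a `sort` that supports `-V` (GNU coreutils). The 1.0.0 threshold is taken from the comment above.

```shell
#!/bin/sh
# version_ge A B: succeeds when version A is greater than or equal to B.
# Uses `sort -V` (version sort), so it assumes GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: does this Bazel version accept
# --enable_platform_specific_config (threshold: v1.0.0)?
supports_platform_specific_config() {
    version_ge "$1" "1.0.0"
}

for v in 0.26.1 1.0.0 1.2.1; do
    if supports_platform_specific_config "$v"; then
        echo "$v: ok"
    else
        echo "$v: too old"
    fi
done
```

With these inputs the loop reports 0.26.1 as too old and the other two as ok. In a real script the version string would come from the installed Bazel rather than a hard-coded list.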
I edited @xdever's script again to bump the version, but I get a new error:
...
+ popd
~/ray_build/ray/build ~/ray_build/ray/python
+ export PYTHON3_BIN_PATH=/home/kfelker/.conda/envs/frnn/bin/python3
+ PYTHON3_BIN_PATH=/home/kfelker/.conda/envs/frnn/bin/python3
+ export PYTHON2_BIN_PATH=/home/kfelker/.conda/envs/frnn/bin/python3
+ PYTHON2_BIN_PATH=/home/kfelker/.conda/envs/frnn/bin/python3
+ '[' NO == YES ']'
+ '[' YES == YES ']'
+ /home/kfelker/bin/bazel build //:ray_pkg --verbose_failures
/home/kfelker/bin/bazel: line 89: /home/kfelker/.bazel/bin/bazel-real: cannot execute binary file
/home/kfelker/bin/bazel: line 89: /home/kfelker/.bazel/bin/bazel-real: Success
Traceback (most recent call last):
File "setup.py", line 210, in <module>
license="Apache 2.0")
File "/home/kfelker/.conda/envs/frnn/lib/python3.6/site-packages/setuptools/__init__.py", line 145, in setup
return distutils.core.setup(**attrs)
File "/home/kfelker/.conda/envs/frnn/lib/python3.6/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/kfelker/.conda/envs/frnn/lib/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/home/kfelker/.conda/envs/frnn/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/home/kfelker/.conda/envs/frnn/lib/python3.6/site-packages/wheel/bdist_wheel.py", line 192, in run
self.run_command('build')
File "/home/kfelker/.conda/envs/frnn/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/kfelker/.conda/envs/frnn/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/home/kfelker/.conda/envs/frnn/lib/python3.6/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/home/kfelker/.conda/envs/frnn/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/kfelker/.conda/envs/frnn/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "setup.py", line 99, in run
subprocess.check_call(command)
File "/home/kfelker/.conda/envs/frnn/lib/python3.6/subprocess.py", line 311, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['../build.sh', '-p', '/home/kfelker/.conda/envs/frnn/bin/python3']' returned non-zero exit status 1.
@felker I tried installing Bazel from yum, which installed Bazel 1.2.1 on UBI 7.6 along with java-11-openjdk-11.0.6.10-1.el7_7, and then set JAVA_HOME, using the following commands:
wget -O /etc/yum.repos.d/vbatts-bazel-epel-7.repo https://copr.fedorainfracloud.org/coprs/vbatts/bazel/repo/epel-7/vbatts-bazel-epel-7.repo
yum install -y bazel
export JAVA_HOME=$(compgen -G '/usr/lib/jvm/java-11-openjdk-*')
That made the build proceed a bit further and then fail with the following error:
Starting local Bazel server and connecting to it...
ERROR: /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/boost/BUILD.bazel:116:1: Configurable attribute "srcs" doesn't match this configuration (would a default condition help?).
Conditions checked:
@boost//:linux_arm
@boost//:linux_x86_64
@boost//:osx_x86_64
@boost//:windows_x86_64
INFO: Call stack for the definition of repository 'boringssl' which is a http_archive (rule definition at /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/bazel_tools/tools/build_defs/repo/http.bzl:292:16):
- /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/com_github_grpc_grpc/bazel/grpc_deps.bzl:100:9
- /root/ray_build/ray/bazel/ray_deps_build_all.bzl:16:3
- /root/ray_build/ray/WORKSPACE:9:1
ERROR: Analysis of target '//:ray_pkg' failed; build aborted:
/root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/boost/BUILD.bazel:116:1: Configurable attribute "srcs" doesn't match this configuration (would a default condition help?).
Conditions checked:
@boost//:linux_arm
@boost//:linux_x86_64
@boost//:osx_x86_64
@boost//:windows_x86_64
INFO: Elapsed time: 240.991s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (57 packages loaded, 5577 targets configured)
I'll see if I can resolve this error.
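A side note on the JAVA_HOME line in the commands above: `compgen -G` prints every path that matches the pattern, one per line, so JAVA_HOME ends up holding several lines if more than one JDK 11 directory is installed. A minimal sketch that keeps only the first match, using a hypothetical helper `first_glob_match` (not part of any tool mentioned in this thread):

```shell
#!/bin/sh
# first_glob_match PATTERN   (hypothetical helper)
# Prints the first existing path matching the glob PATTERN and succeeds,
# or fails if nothing matches. PATTERN is expanded unquoted on purpose,
# so it must not contain whitespace.
first_glob_match() {
    for candidate in $1; do
        # An unmatched glob stays literal, so skip non-existent paths.
        [ -e "$candidate" ] || continue
        printf '%s\n' "$candidate"
        return 0
    done
    return 1
}

# Possible use (the JVM directory layout is an assumption):
#   JAVA_HOME="$(first_glob_match '/usr/lib/jvm/java-11-openjdk-*')" &&
#       export JAVA_HOME
```

Glob results are sorted, so the first match is the lexically smallest directory name; pick a different rule if a specific JDK build is required.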
I was also able to bootstrap Bazel v1.1.0 on ppc64le by following the instructions here: https://openpower.ic.unicamp.br/post/installing-bazel-power-other-architectures-systems/
and by noticing that user clnperez had fixed the Bazel build process for Power only as of that version: https://github.com/bazelbuild/bazel/pull/9346 via two commits: https://github.com/bazelbuild/bazel/commit/5cff4f1edf8b95bf0612791632255852332f72b5 https://github.com/bazelbuild/bazel/commit/27612bb1f6131cd86b42306c80037946b686c9c7
After a few more hiccups (using an old CMake < v3.x, not having Boost installed for Arrow), I have also gotten as far as you but am stuck again.
@pcmoritz Travis CI now supports ppc64le jobs within LXD containers for open source projects: https://blog.travis-ci.com/2019-11-12-multi-cpu-architecture-ibm-power-ibm-z
Would it be easy for you to add it to your current build matrix? I am trying to set it up on a forked version of https://github.com/ray-project/arrow-build, but it is challenging.
@felker Have you had any luck with the ray build? I tried with different bazel versions. With 1.2.1 and 1.0.0, I have the same observations. Not sure whether this is an issue with bazel or the build environment.
I had misinterpreted the errors seen earlier. They were in Boost's BUILD.bazel file; the issue was not with boringssl. The following changes in /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/boost/BUILD.bazel made the build proceed further:
On line 1603, add the following in "defines = select({": ":linux": [],
On line 99 in BOOST_CTX_ASM_SOURCES, add the following: ":linux": [ "libs/context/src/asm/jump_ppc64_sysv_elf_gas.S", "libs/context/src/asm/make_ppc64_sysv_elf_gas.S", "libs/context/src/asm/ontop_ppc64_sysv_elf_gas.S", ],
Now, there is a compilation error in building plasma as shown below:
ERROR: /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/plasma/BUILD.bazel:70:1: C++ compilation of rule '@plasma//:plasma_client' failed (Exit 1) gcc failed: error executing command
(cd /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/sandbox/processwrapper-sandbox/1699/execroot/com_github_ray_project_ray && \
exec env - \
LD_LIBRARY_PATH=/opt/rh/rh-python36/root/usr/lib64 \
PATH=/root/ray_build/bazel_build/output:/opt/rh/rh-python36/root/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/go/bin \
PWD=/proc/self/cwd \
PYTHON2_BIN_PATH=/opt/rh/rh-python36/root/usr/bin/python3 \
PYTHON3_BIN_PATH=/opt/rh/rh-python36/root/usr/bin/python3 \
USE_CLANG_CL=1 \
/usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++0x' -MD -MF bazel-out/ppc-opt/bin/external/plasma/_objs/plasma_client/client.pic.d '-frandom-seed=bazel-out/ppc-opt/bin/external/plasma/_objs/plasma_client/client.pic.o' -fPIC -DBOOST_FALLTHROUGH -iquote external/plasma -iquote bazel-out/ppc-opt/bin/external/plasma -iquote external/boost -iquote bazel-out/ppc-opt/bin/external/boost -iquote external/com_github_google_glog -iquote bazel-out/ppc-opt/bin/external/com_github_google_glog -iquote external/com_github_gflags_gflags -iquote bazel-out/ppc-opt/bin/external/com_github_gflags_gflags -iquote external/com_github_google_flatbuffers -iquote bazel-out/ppc-opt/bin/external/com_github_google_flatbuffers -Ibazel-out/ppc-opt/bin/external/plasma/_virtual_includes/plasma_client -Ibazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow -Ibazel-out/ppc-opt/bin/external/com_github_google_glog/_virtual_includes/default_glog_headers -Ibazel-out/ppc-opt/bin/external/com_github_gflags_gflags/_virtual_includes/gflags -isystem external/boost -isystem bazel-out/ppc-opt/bin/external/boost -isystem external/boost/boost/filesystem -isystem bazel-out/ppc-opt/bin/external/boost/boost/filesystem -isystem external/boost/boost/config -isystem bazel-out/ppc-opt/bin/external/boost/boost/config -isystem external/boost/boost/version -isystem bazel-out/ppc-opt/bin/external/boost/boost/version -isystem external/boost/boost/functional -isystem bazel-out/ppc-opt/bin/external/boost/boost/functional -isystem external/boost/boost/container_hash -isystem bazel-out/ppc-opt/bin/external/boost/boost/container_hash -isystem external/boost/boost/assert -isystem bazel-out/ppc-opt/bin/external/boost/boost/assert -isystem external/boost/boost/core -isystem bazel-out/ppc-opt/bin/external/boost/boost/core -isystem external/boost/boost/integer 
-isystem bazel-out/ppc-opt/bin/external/boost/boost/integer -isystem external/boost/boost/static_assert -isystem bazel-out/ppc-opt/bin/external/boost/boost/static_assert -isystem external/boost/boost/limits -isystem bazel-out/ppc-opt/bin/external/boost/boost/limits -isystem external/boost/boost/type_traits -isystem bazel-out/ppc-opt/bin/external/boost/boost/type_traits -isystem external/boost/boost/mpl -isystem bazel-out/ppc-opt/bin/external/boost/boost/mpl -isystem external/boost/boost/move -isystem bazel-out/ppc-opt/bin/external/boost/boost/move -isystem external/boost/boost/detail -isystem bazel-out/ppc-opt/bin/external/boost/boost/detail -isystem external/boost/boost/preprocessor -isystem bazel-out/ppc-opt/bin/external/boost/boost/preprocessor -isystem external/boost/boost/io -isystem bazel-out/ppc-opt/bin/external/boost/boost/io -isystem external/boost/boost/iterator -isystem bazel-out/ppc-opt/bin/external/boost/boost/iterator -isystem external/boost/boost/utility -isystem bazel-out/ppc-opt/bin/external/boost/boost/utility -isystem external/boost/boost/swap -isystem bazel-out/ppc-opt/bin/external/boost/boost/swap -isystem external/boost/boost/range -isystem bazel-out/ppc-opt/bin/external/boost/boost/range -isystem external/boost/boost/array -isystem bazel-out/ppc-opt/bin/external/boost/boost/array -isystem external/boost/boost/throw_exception -isystem bazel-out/ppc-opt/bin/external/boost/boost/throw_exception -isystem external/boost/boost/current_function -isystem bazel-out/ppc-opt/bin/external/boost/boost/current_function -isystem external/boost/boost/exception -isystem bazel-out/ppc-opt/bin/external/boost/boost/exception -isystem external/boost/boost/concept_check -isystem bazel-out/ppc-opt/bin/external/boost/boost/concept_check -isystem external/boost/boost/concept -isystem bazel-out/ppc-opt/bin/external/boost/boost/concept -isystem external/boost/boost/concept_archetype -isystem bazel-out/ppc-opt/bin/external/boost/boost/concept_archetype -isystem 
external/boost/boost/noncopyable -isystem bazel-out/ppc-opt/bin/external/boost/boost/noncopyable -isystem external/boost/boost/optional -isystem bazel-out/ppc-opt/bin/external/boost/boost/optional -isystem external/boost/boost/none -isystem bazel-out/ppc-opt/bin/external/boost/boost/none -isystem external/boost/boost/type -isystem bazel-out/ppc-opt/bin/external/boost/boost/type -isystem external/boost/boost/ref -isystem bazel-out/ppc-opt/bin/external/boost/boost/ref -isystem external/boost/boost/regex -isystem bazel-out/ppc-opt/bin/external/boost/boost/regex -isystem external/boost/boost/cstdint -isystem bazel-out/ppc-opt/bin/external/boost/boost/cstdint -isystem external/boost/boost/predef -isystem bazel-out/ppc-opt/bin/external/boost/boost/predef -isystem external/boost/boost/smart_ptr -isystem bazel-out/ppc-opt/bin/external/boost/boost/smart_ptr -isystem external/boost/boost/align -isystem bazel-out/ppc-opt/bin/external/boost/boost/align -isystem external/boost/boost/scoped_array -isystem bazel-out/ppc-opt/bin/external/boost/boost/scoped_array -isystem external/boost/boost/checked_delete -isystem bazel-out/ppc-opt/bin/external/boost/boost/checked_delete -isystem external/boost/boost/scoped_ptr -isystem bazel-out/ppc-opt/bin/external/boost/boost/scoped_ptr -isystem external/boost/boost/shared_array -isystem bazel-out/ppc-opt/bin/external/boost/boost/shared_array -isystem external/boost/boost/shared_ptr -isystem bazel-out/ppc-opt/bin/external/boost/boost/shared_ptr -isystem external/boost/boost/tuple -isystem bazel-out/ppc-opt/bin/external/boost/boost/tuple -isystem external/boost/boost/system -isystem bazel-out/ppc-opt/bin/external/boost/boost/system -isystem external/boost/boost/cerrno -isystem bazel-out/ppc-opt/bin/external/boost/boost/cerrno -isystem external/com_github_google_flatbuffers/include -isystem bazel-out/ppc-opt/bin/external/com_github_google_flatbuffers/include -DARROW_USE_GLOG -Werror -w -fno-canonical-system-headers -Wno-builtin-macro-redefined 
'-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/plasma/cpp/src/plasma/client.cc -o bazel-out/ppc-opt/bin/external/plasma/_objs/plasma_client/client.pic.o)
Execution platform: @local_config_platform//:host
Use --sandbox_debug to see verbose messages from the sandbox
external/plasma/cpp/src/plasma/client.cc: In member function '__vector(4) __bool int plasma::PlasmaClient::Impl::IsInUse(const ObjectID&)':
external/plasma/cpp/src/plasma/client.cc:377:40: error: cannot convert 'bool' to '__vector(4) __bool int' in return
return (elem != objects_in_use_.end());
^
external/plasma/cpp/src/plasma/client.cc: In member function 'arrow::Status plasma::PlasmaClient::Impl::Create(const ObjectID&, int64_t, const uint8_t*, int64_t, std::shared_ptr<arrow::Buffer>*, int)':
external/plasma/cpp/src/plasma/client.cc:476:49: error: no matching function for call to 'plasma::PlasmaClient::Impl::IncrementObjectCount(const ObjectID&, plasma::PlasmaObject*, bool)'
IncrementObjectCount(object_id, &object, false);
^
external/plasma/cpp/src/plasma/client.cc:476:49: note: candidate is:
external/plasma/cpp/src/plasma/client.cc:391:6: note: void plasma::PlasmaClient::Impl::IncrementObjectCount(const ObjectID&, plasma::PlasmaObject*, __vector(4) __bool int)
void PlasmaClient::Impl::IncrementObjectCount(const ObjectID& object_id,
^
external/plasma/cpp/src/plasma/client.cc:391:6: note: no known conversion for argument 3 from 'bool' to '__vector(4) __bool int'
external/plasma/cpp/src/plasma/client.cc:481:49: error: no matching function for call to 'plasma::PlasmaClient::Impl::IncrementObjectCount(const ObjectID&, plasma::PlasmaObject*, bool)'
IncrementObjectCount(object_id, &object, false);
^
external/plasma/cpp/src/plasma/client.cc:481:49: note: candidate is:
external/plasma/cpp/src/plasma/client.cc:391:6: note: void plasma::PlasmaClient::Impl::IncrementObjectCount(const ObjectID&, plasma::PlasmaObject*, __vector(4) __bool int)
void PlasmaClient::Impl::IncrementObjectCount(const ObjectID& object_id,
^
external/plasma/cpp/src/plasma/client.cc:391:6: note: no known conversion for argument 3 from 'bool' to '__vector(4) __bool int'
In file included from /usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../include/c++/4.8.5/bits/atomic_base.h:36:0,
from /usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../include/c++/4.8.5/atomic:41,
from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/memory_pool.h:21,
from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/buffer.h:28,
from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/plasma_client/plasma/client.h:26,
from external/plasma/cpp/src/plasma/client.cc:20:
external/plasma/cpp/src/plasma/client.cc: In member function 'arrow::Status plasma::PlasmaClient::Impl::GetBuffers(const ObjectID*, int64_t, int64_t, const std::function<std::shared_ptr<arrow::Buffer>(const plasma::UniqueID&, const std::shared_ptr<arrow::Buffer>&)>&, plasma::ObjectBuffer*)':
external/plasma/cpp/src/plasma/client.cc:546:22: error: cannot convert 'bool' to '__vector(4) __bool int' in initialization
bool all_present = true;
^
external/plasma/cpp/src/plasma/client.cc:552:19: error: cannot convert 'bool' to '__vector(4) __bool int' in assignment
all_present = false;
^
external/plasma/cpp/src/plasma/client.cc:553:39: error: could not convert 'object_entry.std::__detail::_Node_iterator<_Value, __constant_iterators, __cache>::operator-><std::pair<const plasma::UniqueID, std::unique_ptr<plasma::ObjectInUseEntry> >, false, true>()->std::pair<const plasma::UniqueID, std::unique_ptr<plasma::ObjectInUseEntry> >::second.std::unique_ptr<_Tp, _Dp>::operator-><plasma::ObjectInUseEntry, std::default_delete<plasma::ObjectInUseEntry> >()->plasma::ObjectInUseEntry::is_sealed' from '__vector(4) __bool int' to 'bool'
} else if (!object_entry->second->is_sealed) {
^
external/plasma/cpp/src/plasma/client.cc:553:39: error: in argument to unary !
external/plasma/cpp/src/plasma/client.cc:561:19: error: cannot convert 'bool' to '__vector(4) __bool int' in assignment
all_present = false;
^
external/plasma/cpp/src/plasma/client.cc:588:55: error: no matching function for call to 'plasma::PlasmaClient::Impl::IncrementObjectCount(const ObjectID&, plasma::PlasmaObject*&, bool)'
IncrementObjectCount(object_ids[i], object, true);
^
external/plasma/cpp/src/plasma/client.cc:588:55: note: candidate is:
external/plasma/cpp/src/plasma/client.cc:391:6: note: void plasma::PlasmaClient::Impl::IncrementObjectCount(const ObjectID&, plasma::PlasmaObject*, __vector(4) __bool int)
void PlasmaClient::Impl::IncrementObjectCount(const ObjectID& object_id,
^
external/plasma/cpp/src/plasma/client.cc:391:6: note: no known conversion for argument 3 from 'bool' to '__vector(4) __bool int'
external/plasma/cpp/src/plasma/client.cc:592:18: error: could not convert 'all_present' from '__vector(4) __bool int' to 'bool'
if (all_present) {
^
external/plasma/cpp/src/plasma/client.cc:664:64: error: no matching function for call to 'plasma::PlasmaClient::Impl::IncrementObjectCount(__gnu_cxx::__alloc_traits<std::allocator<plasma::UniqueID> >::value_type&, plasma::PlasmaObject*&, bool)'
IncrementObjectCount(received_object_ids[i], object, true);
^
external/plasma/cpp/src/plasma/client.cc:664:64: note: candidate is:
external/plasma/cpp/src/plasma/client.cc:391:6: note: void plasma::PlasmaClient::Impl::IncrementObjectCount(const ObjectID&, plasma::PlasmaObject*, __vector(4) __bool int)
void PlasmaClient::Impl::IncrementObjectCount(const ObjectID& object_id,
^
external/plasma/cpp/src/plasma/client.cc:391:6: note: no known conversion for argument 3 from 'bool' to '__vector(4) __bool int'
external/plasma/cpp/src/plasma/client.cc: In member function 'arrow::Status plasma::PlasmaClient::Impl::Contains(const ObjectID&, __vector(4) __bool int*)':
external/plasma/cpp/src/plasma/client.cc:751:17: error: cannot convert 'int' to '__vector(4) __bool int' in assignment
*has_object = 1;
^
In file included from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/memory_pool.h:26:0,
from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/buffer.h:28,
from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/plasma_client/plasma/client.h:26,
from external/plasma/cpp/src/plasma/client.cc:20:
external/plasma/cpp/src/plasma/client.cc:761:80: error: cannot convert '__vector(4) __bool int*' to 'bool*' for argument '4' to 'arrow::Status plasma::ReadContainsReply(uint8_t*, size_t, plasma::ObjectID*, bool*)'
ReadContainsReply(buffer.data(), buffer.size(), &object_id2, has_object));
^
bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/status.h:58:28: note: in definition of macro 'ARROW_RETURN_NOT_OK'
::arrow::Status __s = (status); \
^
external/plasma/cpp/src/plasma/client.cc:760:5: note: in expansion of macro 'RETURN_NOT_OK'
RETURN_NOT_OK(
^
In file included from /usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../include/c++/4.8.5/bits/atomic_base.h:36:0,
from /usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../include/c++/4.8.5/atomic:41,
from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/memory_pool.h:21,
from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/buffer.h:28,
from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/plasma_client/plasma/client.h:26,
from external/plasma/cpp/src/plasma/client.cc:20:
external/plasma/cpp/src/plasma/client.cc: In member function '__vector(4) __bool int plasma::PlasmaClient::Impl::ComputeObjectHashParallel(plasma::XXH64_state_t*, const unsigned char*, int64_t)':
external/plasma/cpp/src/plasma/client.cc:814:10: error: cannot convert 'bool' to '__vector(4) __bool int' in return
return true;
^
external/plasma/cpp/src/plasma/client.cc: In member function 'arrow::Status plasma::PlasmaClient::Impl::Seal(const ObjectID&)':
external/plasma/cpp/src/plasma/client.cc:856:38: error: could not convert 'object_entry.std::__detail::_Node_iterator<_Value, __constant_iterators, __cache>::operator-><std::pair<const plasma::UniqueID, std::unique_ptr<plasma::ObjectInUseEntry> >, false, true>()->std::pair<const plasma::UniqueID, std::unique_ptr<plasma::ObjectInUseEntry> >::second.std::unique_ptr<_Tp, _Dp>::operator-><plasma::ObjectInUseEntry, std::default_delete<plasma::ObjectInUseEntry> >()->plasma::ObjectInUseEntry::is_sealed' from '__vector(4) __bool int' to 'bool'
if (object_entry->second->is_sealed) {
^
external/plasma/cpp/src/plasma/client.cc:861:35: error: cannot convert 'bool' to '__vector(4) __bool int' in assignment
object_entry->second->is_sealed = true;
^
In file included from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/util/compare.h:24:0,
from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/status.h:24,
from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/memory_pool.h:26,
from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/buffer.h:28,
from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/plasma_client/plasma/client.h:26,
from external/plasma/cpp/src/plasma/client.cc:20:
external/plasma/cpp/src/plasma/client.cc: In member function 'arrow::Status plasma::PlasmaClient::Impl::Abort(const ObjectID&)':
external/plasma/cpp/src/plasma/client.cc:880:38: error: could not convert 'object_entry.std::__detail::_Node_iterator<_Value, __constant_iterators, __cache>::operator-><std::pair<const plasma::UniqueID, std::unique_ptr<plasma::ObjectInUseEntry> >, false, true>()->std::pair<const plasma::UniqueID, std::unique_ptr<plasma::ObjectInUseEntry> >::second.std::unique_ptr<_Tp, _Dp>::operator-><plasma::ObjectInUseEntry, std::default_delete<plasma::ObjectInUseEntry> >()->plasma::ObjectInUseEntry::is_sealed' from '__vector(4) __bool int' to 'bool'
ARROW_CHECK(!object_entry->second->is_sealed)
^
bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/util/macros.h:49:52: note: in definition of macro 'ARROW_PREDICT_TRUE'
#define ARROW_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
^
external/plasma/cpp/src/plasma/client.cc:880:3: note: in expansion of macro 'ARROW_CHECK'
ARROW_CHECK(!object_entry->second->is_sealed)
^
external/plasma/cpp/src/plasma/client.cc:880:38: error: in argument to unary !
ARROW_CHECK(!object_entry->second->is_sealed)
^
bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/util/macros.h:49:52: note: in definition of macro 'ARROW_PREDICT_TRUE'
#define ARROW_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
^
external/plasma/cpp/src/plasma/client.cc:880:3: note: in expansion of macro 'ARROW_CHECK'
ARROW_CHECK(!object_entry->second->is_sealed)
^
external/plasma/cpp/src/plasma/client.cc: At global scope:
external/plasma/cpp/src/plasma/client.cc:1163:8: error: prototype for 'arrow::Status plasma::PlasmaClient::Contains(const ObjectID&, __vector(4) __bool int*)' does not match any in class 'plasma::PlasmaClient'
Status PlasmaClient::Contains(const ObjectID& object_id, bool* has_object) {
^
In file included from external/plasma/cpp/src/plasma/client.cc:20:0:
bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/plasma_client/plasma/client.h:172:10: error: candidate is: arrow::Status plasma::PlasmaClient::Contains(const ObjectID&, bool*)
Status Contains(const ObjectID& object_id, bool* has_object);
^
external/plasma/cpp/src/plasma/client.cc:1211:6: error: prototype for '__vector(4) __bool int plasma::PlasmaClient::IsInUse(const ObjectID&)' does not match any in class 'plasma::PlasmaClient'
bool PlasmaClient::IsInUse(const ObjectID& object_id) {
^
In file included from external/plasma/cpp/src/plasma/client.cc:20:0:
bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/plasma_client/plasma/client.h:296:8: error: candidate is: bool plasma::PlasmaClient::IsInUse(const ObjectID&)
bool IsInUse(const ObjectID& object_id);
^
Target //:ray_pkg failed to build
INFO: Elapsed time: 921.482s, Critical Path: 44.12s
INFO: 1704 processes: 1704 processwrapper-sandbox.
FAILED: Build did NOT complete successfully
Traceback (most recent call last):
File "setup.py", line 205, in <module>
license="Apache 2.0")
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/core.py", line 148, in setup
dist.run_commands()
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/root/.local/lib/python3.6/site-packages/wheel/bdist_wheel.py", line 202, in run
self.run_command('build')
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "setup.py", line 96, in run
subprocess.check_call(command)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/subprocess.py", line 311, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['../build.sh', '-p', '/opt/rh/rh-python36/root/usr/bin/python3']' returned non-zero exit status 1.
I'm debugging this now.
I'm able to build ray-0.7.7 with bazel-1.1.0 now. Had to make some minor changes in the build script and bzl files in ray. I'm verifying my changes by triggering a fresh build in UBI 7.6 ppc64le. Once I'm done with that, I'll try building ray-0.8.0.
I'm now able to build ray-0.7.7 as well as ray-0.8.1 in a UBI 7.6 ppc64le container. I had to make changes to build.sh and bazel/ray_deps_setup.bzl, and add a ppc-specific patch in thirdparty/patches/. I've validated my changes on x86 too, to make sure they don't break that build.
Thanks @xdever. Your build steps for arrow helped me to get rid of a major hurdle!
Thanks @felker for your inputs on bazel version.
@amitsadaphule that is great news! I independently got as far as you did a few comments ago by modifying
/home/kfelker/.cache/bazel/_bazel_kfelker/5f568a871ffef1dd98938c7174b4baa5/external/boost/BUILD.bazel
I also ended up making the same edit:
BOOST_CTX_ASM_SOURCES = select({
":linux_arm": [
"libs/context/src/asm/jump_arm_aapcs_elf_gas.S",
"libs/context/src/asm/make_arm_aapcs_elf_gas.S",
"libs/context/src/asm/ontop_arm_aapcs_elf_gas.S",
],
":linux_ppc64le": [
"libs/context/src/asm/jump_ppc64_sysv_elf_gas.S",
"libs/context/src/asm/make_ppc64_sysv_elf_gas.S",
"libs/context/src/asm/ontop_ppc64_sysv_elf_gas.S",
],
":linux_x86_64": [
"libs/context/src/asm/jump_x86_64_sysv_elf_gas.S",
"libs/context/src/asm/make_x86_64_sysv_elf_gas.S",
"libs/context/src/asm/ontop_x86_64_sysv_elf_gas.S",
],
":osx_x86_64": [
"libs/context/src/asm/jump_x86_64_sysv_macho_gas.S",
"libs/context/src/asm/make_x86_64_sysv_macho_gas.S",
"libs/context/src/asm/ontop_x86_64_sysv_macho_gas.S",
],
":windows_x86_64": [
"libs/context/src/asm/make_x86_64_ms_pe_masm.S",
"libs/context/src/asm/jump_x86_64_ms_pe_masm.S",
"libs/context/src/asm/ontop_x86_64_ms_pe_masm.S",
],
})
I also added linux_ppc64le entries to four more fields instead of your L1603 edit. This might be due to a different version of Boost that I am using.
Still, it is good to validate that we were both on the same track, and I was able to get to the same plasma build error. How did you fix that @amitsadaphule ? I am still stuck there.
I have installed Pyarrow (tried v0.15.1 and v0.14.1) and the Arrow C++ library from the IBM WMLCE Conda channel
conda install -c https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda pyarrow arrow-cpp
instead of building the .whl from source via @xdever's script. I tried converting the Conda-installed package into a wheel and placing it within the Ray source code, but I could not figure out whether that is possible.
I know very little about Bazel, but it appears that our edits to Boost's BUILD.bazel might no longer be necessary with the latest versions: https://github.com/nelhage/rules_boost/pull/156 (specifically https://github.com/nelhage/rules_boost/commit/ebd1dd7d70d9a9c076a9d4e6d317c3236f303b66#diff-92e0042c94ad14988fb54ffb61047f5c)
@felker I was able to get past the plasma build error by adding the "--cxxopt=-std=gnu++0x" option to the bazel build command. As for nelhage/rules_boost@ebd1dd7, I used a patch with a subset of those changes for the ray 0.7.7 and 0.8.1 builds, because ray pins an old commit of https://github.com/nelhage/rules_boost.
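If the flag should not touch x86 builds, it could be added only on POWER. The following is a minimal sketch, assuming the bazel invocation is assembled in Python; `bazel_build_args` is a hypothetical helper and not part of Ray's build scripts. Only the target and flags quoted in this thread are used. The `__vector(4) __bool int` in the errors suggests that AltiVec's handling of `bool` is involved, but that reading is an assumption.

```python
import platform


def bazel_build_args(machine=None):
    """Return the bazel arguments for building //:ray_pkg.

    Hypothetical helper for illustration; not part of Ray's build scripts.
    """
    if machine is None:
        machine = platform.machine()
    args = ["build", "//:ray_pkg", "--verbose_failures"]
    # On ppc64le, switching to the GNU dialect got past the plasma errors
    # in which `bool` was reported as `__vector(4) __bool int`.
    if machine == "ppc64le":
        args.append("--cxxopt=-std=gnu++0x")
    return args


# Show the command line that would be used on the current machine.
print(" ".join(["bazel"] + bazel_build_args()))
```

Passing the machine name explicitly (for example `bazel_build_args("x86_64")`) makes it easy to check both code paths from one host.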
Great! I will try that out. Can you share the patch?
I added a Travis CI ppc64le build to their repo in https://github.com/nelhage/rules_boost/pull/161, and this basic example could help with building a similar pipeline for this project.
I got it working with Ray 0.8.0 with the following changes:
diff --git a/bazel/ray_deps_setup.bzl b/bazel/ray_deps_setup.bzl
index 4a5fb69c..0531af5c 100644
--- a/bazel/ray_deps_setup.bzl
+++ b/bazel/ray_deps_setup.bzl
@@ -121,9 +121,14 @@ def ray_deps_setup():
github_repository(
name = "com_github_nelhage_rules_boost",
# If you update the Boost version, remember to update the 'boost' rule.
- commit = "df908358c605a7d5b8bbacde07afbaede5ac12cf",
+ commit = "67ddc505bc5ad5a15562f07e4954ac2011177e13",
+ #commit = "df908358c605a7d5b8bbacde07afbaede5ac12cf",
remote = "https://github.com/nelhage/rules_boost",
- sha256 = "3775c5ab217e0c9cc380f56e243a4d75fe6fee8eaee1447899eaa04c5d582cf1",
+ # sha256 = "3775c5ab217e0c9cc380f56e243a4d75fe6fee8eaee1447899eaa04c5d582cf1",
patches = [
"//thirdparty/patches:rules_boost-undefine-boost_fallthrough.patch",
],
diff --git a/build.sh b/build.sh
index 2a061bb3..08f8ae4d 100755
--- a/build.sh
+++ b/build.sh
@@ -123,7 +123,8 @@ if [ "$RAY_BUILD_JAVA" == "YES" ]; then
fi
if [ "$RAY_BUILD_PYTHON" == "YES" ]; then
- "$BAZEL_EXECUTABLE" build //:ray_pkg --verbose_failures
+ "$BAZEL_EXECUTABLE" build //:ray_pkg --verbose_failures --cxxopt=-std=gnu++0x
fi
popd
where I specified the latest master revision of https://github.com/nelhage/rules_boost/commit/67ddc505bc5ad5a15562f07e4954ac2011177e13 (not sure how to update the sha256 field in ray_deps_setup.bzl).
After these changes (and the above-mentioned steps to get Bazel and Pyarrow), python3 setup.py bdist_wheel successfully built a 14M Ray wheel: /home/kfelker/ray_build/ray/python/dist/ray-0.8.0-cp36-cp36m-linux_ppc64le.whl.
Thanks for your help @amitsadaphule !
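On the open question about the sha256 field: such a field normally holds the SHA-256 digest of the archive that gets downloaded for the pinned commit. The sketch below computes that digest for a file that has already been downloaded. The exact archive URL that Ray's github_repository rule fetches is not shown in this thread, so the file name in the usage comment is a placeholder.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (placeholder file name; download the archive for the commit first):
#   print(sha256_of_file("rules_boost-archive.tar.gz"))
```

The printed value would then replace the commented-out sha256 line in the diff above, assuming the rule hashes the same archive.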
That's great @felker! Good that it builds properly with master from https://github.com/nelhage/rules_boost/commit/67ddc505bc5ad5a15562f07e4954ac2011177e13 and that ray no longer needs to apply a patch to it. I'll try that as well when I build ray master.
Also, could you check if you're able to execute the unit tests? For me, it is complaining about a couple of missing packages and most of those (grpc, tensorflow) are not readily available for ppc64le. Here's the log:
[root@1613428f0d4a python]# pytest ray/tests/
========================================================================= test session starts =========================================================================
platform linux -- Python 3.6.9, pytest-5.3.4, py-1.8.1, pluggy-0.13.1
rootdir: /root/ray_build/ray/python
collected 459 items / 4 errors / 455 selected
=============================================================================== ERRORS ================================================================================
______________________________________________________________ ERROR collecting ray/tests/test_cython.py ______________________________________________________________
ImportError while importing test module '/root/ray_build/ray/python/ray/tests/test_cython.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
ray/tests/test_cython.py:9: in <module>
import cython_examples as cyth
E ModuleNotFoundError: No module named 'cython_examples'
________________________________________________________ ERROR collecting ray/tests/test_memory_scheduling.py _________________________________________________________
ImportError while importing test module '/root/ray_build/ray/python/ray/tests/test_memory_scheduling.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
ray/tests/test_memory_scheduling.py:5: in <module>
from ray import tune
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/__init__.py:2: in <module>
from ray.tune.tune import run_experiments, run
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/tune.py:7: in <module>
from ray.tune.analysis import ExperimentAnalysis
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/analysis/__init__.py:1: in <module>
from ray.tune.analysis.experiment_analysis import ExperimentAnalysis, Analysis
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/analysis/experiment_analysis.py:14: in <module>
from ray.tune.trial import Trial
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/trial.py:13: in <module>
from ray.tune.durable_trainable import DurableTrainable
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/durable_trainable.py:3: in <module>
from ray.tune.trainable import Trainable, TrainableUtil
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/trainable.py:9: in <module>
import pandas as pd
E ModuleNotFoundError: No module named 'pandas'
_____________________________________________________________ ERROR collecting ray/tests/test_metrics.py ______________________________________________________________
ImportError while importing test module '/root/ray_build/ray/python/ray/tests/test_metrics.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
ray/tests/test_metrics.py:2: in <module>
import grpc
E ModuleNotFoundError: No module named 'grpc'
____________________________________________________________ ERROR collecting ray/tests/test_tensorflow.py ____________________________________________________________
ImportError while importing test module '/root/ray_build/ray/python/ray/tests/test_tensorflow.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
ray/tests/test_tensorflow.py:2: in <module>
import tensorflow.compat.v1 as tf
E ModuleNotFoundError: No module named 'tensorflow'
========================================================================== warnings summary ===========================================================================
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/mark/structures.py:327
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/mark/structures.py:327: PytestUnknownMarkWarning: Unknown pytest.mark.asyncio - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/latest/mark.html
PytestUnknownMarkWarning,
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/mark/structures.py:327
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/mark/structures.py:327
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/mark/structures.py:327: PytestUnknownMarkWarning: Unknown pytest.mark.timeout - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/latest/mark.html
PytestUnknownMarkWarning,
ray/tests/test_object_manager.py:20
/root/ray_build/ray/python/ray/tests/test_object_manager.py:20: UserWarning: This test must be run on large machines.
warnings.warn("This test must be run on large machines.")
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/mark/structures.py:327
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/mark/structures.py:327: PytestUnknownMarkWarning: Unknown pytest.mark.benchmark - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/latest/mark.html
PytestUnknownMarkWarning,
-- Docs: https://docs.pytest.org/en/latest/warnings.html
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 4 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
==================================================================== 5 warnings, 4 errors in 1.07s ====================================================================
Both TensorFlow and gRPC are available on IBM's Conda channel, e.g.:
I also had to install the Cython examples from the Ray documentation folder, but then I was able to run the (failing) tests:
cd /home/kfelker/ray_build/ray/doc/examples/cython/
python setup.py install
cd ../../..
pytest python/ray/tests
The output I got was:
(frnn) (base) ➜ ray git:(0afe14f2) ✗ pytest python/ray/tests
============================================================================== test session starts ===============================================================================
platform linux -- Python 3.6.8, pytest-4.4.2, py-1.8.1, pluggy-0.13.1
rootdir: /home/kfelker/ray_build/ray/python
plugins: hypothesis-3.59.1
collected 428 items
python/ray/tests/py3_test.py ....(pid=53609) *** Aborted at 1580508782 (unix time) try "date -d @1580508782" if you are using GNU date ***
(pid=53609) PC: @ 0x0 (unknown)
(pid=53609) *** SIGSEGV (@0x7fffa92558c0) received by PID 53609 (TID 0x7fffa8bd4f20) from PID 18446744072252381376; stack trace: ***
(pid=53609) @ 0x7fffa8b704d8 ([vdso]+0x4d7)
(pid=53609) @ 0x0 (unknown)
(pid=53609) @ 0x7fffa140c4b0 _ZN5boost7context6detail11fiber_ontopINS0_5fiberEZNS_6fibers7context6resumeEPS5_EUlOS3_E_EENS1_10transfer_tES9_
(pid=53609) @ 0x7fffa140a5c8 boost::context::detail::fiber_entry<>()
(pid=53609) @ 0x7fffa1410090 make_fcontext
F(pid=55779) *** Aborted at 1580508803 (unix time) try "date -d @1580508803" if you are using GNU date ***
(pid=55779) PC: @ 0x0 (unknown)
(pid=55779) *** SIGSEGV (@0x7fffb23c58c0) received by PID 55779 (TID 0x7fffb1d44f20) from PID 18446744072404883648; stack trace: ***
(pid=55779) @ 0x7fffb1ce04d8 ([vdso]+0x4d7)
(pid=55779) @ 0x0 (unknown)
(pid=55779) @ 0x7fffaa57c4b0 _ZN5boost7context6detail11fiber_ontopINS0_5fiberEZNS_6fibers7context6resumeEPS5_EUlOS3_E_EENS1_10transfer_tES9_
(pid=55779) @ 0x7fffaa57a5c8 boost::context::detail::fiber_entry<>()
(pid=55779) @ 0x7fffaa580090 make_fcontext
FF [ 1%]
python/ray/tests/test_actor.py .s............................sssssss....s..... [ 12%]
python/ray/tests/test_actor_failures.py .FFF.[1] 48213 abort pytest python/ray/tests
Not sure how problematic this is; I have been using Ray successfully in some limited cases over the last day.
@felker Sorry, I just saw your message above! On Python >= 3.6, Ray should be able to run without our custom version of pyarrow, and we are working towards removing it as a built-in dependency, so you shouldn't need to get that working on PowerPC :)
@pcmoritz fortunately, I was able to install pyarrow from IBM's Conda channel and use export SKIP_PYARROW_INSTALL=1 to bypass the custom pyarrow stuff.
Still, Travis CI ppc64le containers could be useful to build and deploy Ray wheels for that architecture.
@felker Great to hear that! You should feel free to create a PR that adds the ppc64le build to the Ray matrix to build the wheels!
Thanks, everyone! I'm glad to see this great progress. However, @felker, be aware that to use ray on a mixed-architecture cluster, every machine must have exactly the same version of pyarrow, including the ".RAY" at the end of the version number. So the IBM Conda version would probably work only if every machine is PPC64. Alternatively, as a hack, one could remove the version checks on the head node (function check_version_info in services.py) and hope that it works.
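To see in advance whether two machines would trip such a check, their version strings can be compared directly. This is only an illustrative sketch: `find_version_mismatches` is a hypothetical helper, it is not Ray's check_version_info, and the "0.14.0.RAY" string is a made-up example of a version with the ".RAY" suffix.

```python
def find_version_mismatches(head_info, node_info):
    """Return {component: (head_version, node_version)} for every difference."""
    mismatches = {}
    for component in sorted(set(head_info) | set(node_info)):
        head_version = head_info.get(component)
        node_version = node_info.get(component)
        # An exact string comparison means a ".RAY" suffix counts as different.
        if head_version != node_version:
            mismatches[component] = (head_version, node_version)
    return mismatches


# Illustrative values only; collect the real ones on each machine.
head = {"ray": "0.8.0", "python": "3.6.8", "pyarrow": "0.14.0.RAY"}
node = {"ray": "0.8.0", "python": "3.6.8", "pyarrow": "0.15.1"}
print(find_version_mismatches(head, node))
```

An empty result means the compared components agree; it says nothing about whether binaries built for different architectures actually interoperate.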
Thanks @felker for the quick reply on unit tests! I'll try it out.
@felker did you face a dependency issue with opencv-python while executing the tests? I couldn't find that through pip3 on UBI 7 ppc64le. I tried installing an rpm from http://mirror.centos.org/altarch/7/os/ppc64le/Packages/opencv-python-2.4.5-3.el7.ppc64le.rpm by resolving the necessary dependencies, but that didn't help either. Currently trying to build from source by following instructions from https://github.com/skvark/opencv-python.
py-opencv and opencv are both available on the WMLCE Conda channel for Python 3.6 and 3.7, e.g.: https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le/py-opencv-3.4.7-py37_725.g92aa195.tar.bz2
Thanks @felker! I'm trying to avoid switching to conda. I'll try to resolve the build issues for opencv-python first. If that doesn't work out, the packages from the conda channels that you suggested will help me get this done anyway.
I was finally able to get opencv-python-headless and tensorflow 2.0.1 built and installed. I executed the tests on UBI 7.6 ppc64le and got the following result:
[root@c03c2e5fb315 ray]# pytest python/ray/tests/
============================================================================= test session starts ==============================================================================
platform linux -- Python 3.6.9, pytest-5.3.5, py-1.8.1, pluggy-0.13.1
rootdir: /root/ray_build/ray/python
collected 480 items
python/ray/tests/py3_test.py ....FFFFE [ 1%]
python/ray/tests/test_actor.py EEEEEEEEF.......................sssss....s....... [ 12%]
python/ray/tests/test_actor_failures.py FF.FFFFFFF....s [ 15%]
python/ray/tests/test_actor_pool.py ...... [ 16%]
python/ray/tests/test_actor_resources.py ...s.......... [ 19%]
python/ray/tests/test_advanced.py ......s...s.. [ 22%]
python/ray/tests/test_advanced_2.py ............... [ 25%]
python/ray/tests/test_advanced_3.py ...s..........s..........Fs [ 30%]
python/ray/tests/test_array.py ... [ 31%]
python/ray/tests/test_autoscaler.py .................................. [ 38%]
python/ray/tests/test_autoscaler_yaml.py F [ 38%]
python/ray/tests/test_basic.py .............s................................. [ 48%]
python/ray/tests/test_component_failures.py F.F. [ 49%]
python/ray/tests/test_component_failures_2.py .... [ 50%]
python/ray/tests/test_component_failures_3.py ... [ 50%]
python/ray/tests/test_cython.py ... [ 51%]
python/ray/tests/test_debug_tools.py F [ 51%]
python/ray/tests/test_dynres.py .............. [ 54%]
python/ray/tests/test_failure.py .............ss...s............Terminated
There are quite a few test failures, and the test execution gets terminated at 54%. To check parity, I built 0.8.1 on UBI 7.6 x86 as well. There too I got similar test failures, and the test execution was killed at 54%. Here's the log:
[root@0ac385b5dd90 ray]# pytest python/ray/tests
============================================================================= test session starts ==============================================================================
platform linux -- Python 3.6.9, pytest-5.3.5, py-1.8.1, pluggy-0.13.1
rootdir: /root/ray_build/ray/python
collected 480 items
python/ray/tests/py3_test.py ........E [ 1%]
python/ray/tests/test_actor.py EEEEEEEEF..................F....sssss....s....... [ 12%]
python/ray/tests/test_actor_failures.py FF.F.FFFFF....s [ 15%]
python/ray/tests/test_actor_pool.py ...... [ 16%]
python/ray/tests/test_actor_resources.py ...s...F...... [ 19%]
python/ray/tests/test_advanced.py ......s...s.. [ 22%]
python/ray/tests/test_advanced_2.py ............... [ 25%]
python/ray/tests/test_advanced_3.py ...s..........s..........Fs [ 30%]
python/ray/tests/test_array.py ... [ 31%]
python/ray/tests/test_autoscaler.py .................................. [ 38%]
python/ray/tests/test_autoscaler_yaml.py F [ 38%]
python/ray/tests/test_basic.py .............s................................. [ 48%]
python/ray/tests/test_component_failures.py FFFF [ 49%]
python/ray/tests/test_component_failures_2.py .... [ 50%]
python/ray/tests/test_component_failures_3.py ... [ 50%]
python/ray/tests/test_cython.py ... [ 51%]
python/ray/tests/test_debug_tools.py F [ 51%]
python/ray/tests/test_dynres.py .............. [ 54%]
python/ray/tests/test_failure.py .............ss...s............Killed
I need to investigate this further.
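To compare the two runs file by file rather than by eye, the progress lines can be parsed and diffed. This is a rough sketch that assumes pytest's default (non-verbose) progress output as shown in the logs above; `summarize` and `diff_runs` are hypothetical helper names, and wrapped continuation lines without a file name are ignored.

```python
import re
from collections import Counter

# A progress line looks like: "<path>.py <status characters> [ NN%]".
PROGRESS_LINE = re.compile(r"^(\S+\.py)\s+([.FEsxX]+)")


def summarize(log_text):
    """Map each test file to a Counter of its pytest status characters."""
    summary = {}
    for line in log_text.splitlines():
        match = PROGRESS_LINE.match(line.strip())
        if match:
            summary[match.group(1)] = Counter(match.group(2))
    return summary


def diff_runs(log_a, log_b):
    """Return {file: (counts_a, counts_b)} for files whose counts differ."""
    a, b = summarize(log_a), summarize(log_b)
    return {
        name: (dict(a.get(name, {})), dict(b.get(name, {})))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Two lines from each log above (ppc64le first, then x86).
ppc_log = """python/ray/tests/py3_test.py ....FFFFE [ 1%]
python/ray/tests/test_actor_pool.py ...... [ 16%]"""
x86_log = """python/ray/tests/py3_test.py ........E [ 1%]
python/ray/tests/test_actor_pool.py ...... [ 16%]"""
print(diff_runs(ppc_log, x86_log))
```

Files that appear only in the diff are the ones worth investigating first, since failures common to both architectures are unlikely to be ppc64le-specific.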
I'm now able to build ray-0.7.7 as well as ray-0.8.1 in a UBI 7.6 ppc64le container. I had to make changes to build.sh and bazel/ray_deps_setup.bzl, and add a ppc-specific patch in thirdparty/patches/. I've validated my changes on x86 too, to make sure they don't break that build.
@amitsadaphule Could you please share your changes with me?
Following up on the work of others in this thread, I was able to install ray on ARM64 (aarch64) and even attempt mixed-architecture (ARM64<->x86_64) distributed computing.
With the latest commits, the PyArrow dependency has been removed, although pyarrow on ARM was never a problem for me, as I've been using it regularly in other projects.
The issue with building Ray on ARM64 was the bazel rules for the boost libraries. This is unfortunate: boost itself has no issues on ARM, but the build rules for ARM64 contain errors, as detailed in https://github.com/ray-project/ray/issues/7184.
I've made a patch to address that and written the procedure to build and install ray on ARM64.
My patch may break the build for ARMv7, as I've not figured out how to include files for both ARM32 and ARM64 under the linux_arm bazel build rule. If anyone can advise me on how to achieve that, I will update the patch and submit a PR.
I've opened an issue about this at nelhage/rules_boost, which is where the file is obtained from during the ray build process.
Update: adding a linux_aarch64 constraint to BUILD.boost fixes the above issues without removing compatibility with linux_arm. A PR has been submitted upstream: https://github.com/nelhage/rules_boost/pull/168.
@JasonWayne please find the build script here. Please note that the script builds ray with python 3.7.3, since that was a specific requirement in my case. You can use python 3.6 instead by installing rh-python36 and replacing all occurrences of python3.7 with python3.6 and those of pip3.7 with pip3.6.
Following up on the test failures: after building ray 0.7.7 on RHEL 7.6 ppc64le and installing all test dependencies, I ran the test cases with pytest -v python/ray/tests/. I am seeing occasional pytest freezes at random test cases anywhere from 8% to 50%, and the run gets terminated at random tests between 57% and 64%. To check parity, I tried the build and test execution on x86 RHEL 7.6 as well, and I'm facing similar issues there too.
@felker did you get around those test case execution issues?
Has anyone else experienced this before? Is there some known solution to these problems?
No, I have not tried to get the tests to pass since https://github.com/ray-project/ray/issues/4309#issuecomment-580940094
I managed to build ray successfully for tag v8.3.0, with the boost patches @heavyinfo provided. For some reason the patches are still needed. I am able to build it for both armv7l and aarch64.
@felker did you face a dependency issue with opencv-python while executing the tests? I couldn't find that through pip3 on UBI 7 ppc64le. I tried installing an rpm from http://mirror.centos.org/altarch/7/os/ppc64le/Packages/opencv-python-2.4.5-3.el7.ppc64le.rpm by resolving the necessary dependencies, but that didn't help either. Currently trying to build from source by following instructions from https://github.com/skvark/opencv-python.
I had the same issue. Did you find a way to install it?
Yes, hope this one helps: https://github.com/ppc64le/build-scripts/tree/master/pip-ray
I suppose I cannot run it without administrator privileges, which I don't have...
I changed the title to be about ppc64le. Is the goal to provide a ppc64le wheel, conda package, or to provide clear instructions how to build for ppc64le?
Yes, a conda package would be perfect
In order to provide ppc64le and aarch64 builds on conda, all the dependencies must be available. The rllib component depends on gym, which in turn depends on pygame (for Box2d) and ale-py (the Arcade Learning Environment). See the conda-forge PR for more information. Either the gym dependency should be made optional for rllib, or someone needs to put in the time to package those two libraries for gym so that gym 0.22+ can be built for conda-forge. Once that happens, migrating the packages to ppc64le and aarch64 should follow.
System information
Describe the problem
The build fails on non-x86 architectures because a binary installation of pyarrow was recently added to build.sh, but the binaries are available only for x86_64.
Source code / logs