ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

Build fails on ppc64le architecture #4309

Open RobertCsordas opened 5 years ago

RobertCsordas commented 5 years ago

System information

Describe the problem

The build fails on non-x86 architectures because a binary installation of pyarrow was recently added to build.sh, and those binaries are only available for x86_64.

Source code / logs

pcmoritz commented 5 years ago

Hey @xdever,

Going forward, this will need some way to build pyarrow wheels for ppc64. The official way to build pyarrow wheels is through crossbow (see https://github.com/apache/arrow/tree/master/dev/tasks). We reuse most of these scripts to build the pyarrow wheels; see https://github.com/pcmoritz/arrow-build/blob/master/.travis.yml#L54.

This infrastructure uses Travis, so it won't work out of the box, but it is easy to run the scripts on a dedicated machine. If you follow the instructions at https://github.com/apache/arrow/tree/master/python/manylinux1 on your ppc machine, it shouldn't be too hard to build the wheels (everything is dockerized). Once you have the wheels, you can replace the pip install in https://github.com/ray-project/ray/blob/dec7c3f8f5cfdd74b6f824307ec60d9e55232a60/build.sh#L121 with a pip install that installs your PowerPC wheels, and you are good to go.
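The last step described here, swapping the binary pip install for locally built wheels, could be sketched roughly as follows. This is only a sketch and not what build.sh actually contains: the function names and the `PYARROW_WHEEL_DIR` variable are hypothetical, and it assumes the locally built wheel has been copied into that directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ray's build.sh): decide where pyarrow
# should come from for a given machine type, as reported by `uname -m`.
pyarrow_source() {
  case "$1" in
    x86_64) echo "prebuilt" ;;      # binary wheels exist for x86_64
    *)      echo "local-wheel" ;;   # e.g. ppc64le: use a wheel built on the machine
  esac
}

# Assumed location of the locally built wheel; adjust to your setup.
PYARROW_WHEEL_DIR="${PYARROW_WHEEL_DIR:-$HOME/ray_build}"

# Print the pip command that would be run, without executing it.
pyarrow_install_command() {
  if [ "$(pyarrow_source "$1")" = "prebuilt" ]; then
    echo "pip install pyarrow"
  else
    echo "pip install $PYARROW_WHEEL_DIR/pyarrow*.whl"
  fi
}

# Example: show what would be installed on this machine.
pyarrow_install_command "$(uname -m)"
```

Printing the command instead of running it keeps the sketch safe to try; a real build script would execute the chosen command at the point where the binary install currently happens.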

Let me know if you have questions about this or run into trouble!

Best wishes, Philipp.

RobertCsordas commented 5 years ago

Hi @pcmoritz,

Wouldn't it make sense for you to provide pyarrow cross-compiled for PPC64? I'm probably not the only one who wants to use Ray on an IBM Minsky, which is a PPC64LE machine. The build process is nontrivial, and I'm afraid it would prevent many people from using it. If you don't want to bother with the PPC binaries and don't want to keep the cmake build script in build.sh, it would be good if it could be moved to a different script or a different repository, so that people are still able to build it without too much effort.

Thank you, Robert

RobertCsordas commented 4 years ago

Hi all,

Any progress on this?

Building PyArrow is the most difficult thing I have ever seen on Linux... Could you at least provide the old script that was used to auto-build it?

Thank you, Robert

felker commented 4 years ago

+1

I am working with 2x IBM AC922 systems, and cannot build Ray from source on them.

RobertCsordas commented 4 years ago

I managed to get Ray 0.7.5 working fine (even with the rest of the cluster which is x86). For a newer version, you should change the versions and commit numbers in the script. It was super difficult and took me a few days to make it work, so I made a script out of it to be able to reproduce it next time.

#!/bin/bash

mkdir ~/ray_build
cd ~/ray_build

mkdir bazel_build
cd bazel_build
wget https://github.com/bazelbuild/bazel/releases/download/0.26.1/bazel-0.26.1-dist.zip
unzip bazel*
env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" bash ./compile.sh
cd output
export PATH=`pwd`:$PATH
cd ../../

git clone --recursive https://github.com/apache/arrow
cd arrow
git checkout 141a213a54f4979ab0b94b94928739359a2ee9ad
#git checkout tags/apache-arrow-0.14.0
git submodule update --recursive
mkdir build
cd build

cmake ../cpp -DCMAKE_BUILD_TYPE:STRING=Release -DCMAKE_INSTALL_PREFIX=~/ray_build/arrow -DCMAKE_C_FLAGS=-O3 -DCMAKE_CXX_FLAGS=-O3 -DARROW_BUILD_TESTS=off -DARROW_HDFS=on -DARROW_BOOST_USE_SHARED=off -DPYTHON_EXECUTABLE:FILEPATH=/usr/bin/python3 -DARROW_PYTHON=on -DARROW_PLASMA=on -DARROW_TENSORFLOW=off -DARROW_JEMALLOC=off -DARROW_WITH_BROTLI=off -DARROW_WITH_LZ4=on -DARROW_WITH_ZSTD=off -DARROW_WITH_THRIFT=ON -DARROW_PARQUET=ON -DARROW_WITH_ZLIB=ON

make -j`nproc`
make install

cd ../python
export PKG_CONFIG_PATH=~/ray_build/arrow/lib/pkgconfig:$PKG_CONFIG_PATH
export PYARROW_BUILD_TYPE='release'
export PYARROW_WITH_ORC=0
export PYARROW_WITH_PARQUET=1
export PYARROW_WITH_PLASMA=1
export PYARROW_BUNDLE_ARROW_CPP=1
#export PYARROW_BUNDLE_BOOST=1
#export PYARROW_BOOST_NAMESPACE=arrow_boost

pip3 install -r requirements-wheel.txt --user
SETUPTOOLS_SCM_PRETEND_VERSION="0.14.0.RAY" python3 setup.py build_ext --inplace
SETUPTOOLS_SCM_PRETEND_VERSION="0.14.0.RAY" python3 setup.py bdist_wheel

cp dist/pyarrow*.whl ~/ray_build

cd ../../

git clone --recursive https://github.com/ray-project/ray
cd ray
git checkout tags/ray-0.7.5
git submodule update --recursive
export SKIP_PYARROW_INSTALL=1
cd python
python3 -m pip install -q --target ray/pyarrow_files ~/ray_build/pyarrow*.whl  --system

python3 setup.py bdist_wheel
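
Since newer releases need different versions and commit hashes, one option is to collect those values in variables at the top of the script. A minimal sketch, assuming the Bazel download URL follows the same pattern for other versions; the variable and function names are hypothetical, and the defaults are simply the values from the script above:

```shell
#!/bin/sh
# Versions used by the build script above; override them via the environment.
BAZEL_VERSION="${BAZEL_VERSION:-0.26.1}"
ARROW_COMMIT="${ARROW_COMMIT:-141a213a54f4979ab0b94b94928739359a2ee9ad}"
RAY_TAG="${RAY_TAG:-ray-0.7.5}"

# Build the URL of the Bazel source distribution for a given version.
bazel_dist_url() {
  echo "https://github.com/bazelbuild/bazel/releases/download/$1/bazel-$1-dist.zip"
}

# Show the values the build would use.
echo "Bazel: $(bazel_dist_url "$BAZEL_VERSION")"
echo "Arrow commit: $ARROW_COMMIT"
echo "Ray tag: tags/$RAY_TAG"
```

The script would then call `wget "$(bazel_dist_url "$BAZEL_VERSION")"`, `git checkout "$ARROW_COMMIT"` and `git checkout "tags/$RAY_TAG"` in place of the hard-coded values.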

amitsadaphule commented 4 years ago

@xdever Thanks for the detailed steps! I tried to build ray-0.7.5 with the above steps. But the ray build fails with the following error:

+ /root/ray_build/bazel_build/output/bazel build //:ray_pkg --verbose_failures
INFO: Call stack for the definition of repository 'com_github_jupp0r_prometheus_cpp' which is a http_archive (rule definition at /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/bazel_tools/tools/build_defs/repo/http.bzl:229:16):
 - /root/ray_build/ray/bazel/ray_deps_setup.bzl:96:5
 - /root/ray_build/ray/WORKSPACE:5:1
ERROR: An error occurred during the fetch of repository 'com_github_jupp0r_prometheus_cpp':
   java.io.IOException: Error downloading [https://github.com/jovany-wang/prometheus-cpp/archive/master.zip] to /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/com_github_jupp0r_prometheus_cpp/master.zip: GET returned 404 Not Found
INFO: Call stack for the definition of repository 'build_stack_rules_proto' which is a http_archive (rule definition at /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/bazel_tools/tools/build_defs/repo/http.bzl:229:16):
 - /root/ray_build/ray/bazel/ray_deps_setup.bzl:113:5
 - /root/ray_build/ray/WORKSPACE:5:1
ERROR: error loading package '': in /root/ray_build/ray/bazel/ray_deps_build_all.bzl: Encountered error while reading extension file 'repositories.bzl': no such package '@com_github_jupp0r_prometheus_cpp//': java.io.IOException: Error downloading [https://github.com/jovany-wang/prometheus-cpp/archive/master.zip] to /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/com_github_jupp0r_prometheus_cpp/master.zip: GET returned 404 Not Found
ERROR: error loading package '': in /root/ray_build/ray/bazel/ray_deps_build_all.bzl: Encountered error while reading extension file 'repositories.bzl': no such package '@com_github_jupp0r_prometheus_cpp//': java.io.IOException: Error downloading [https://github.com/jovany-wang/prometheus-cpp/archive/master.zip] to /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/com_github_jupp0r_prometheus_cpp/master.zip: GET returned 404 Not Found
INFO: Elapsed time: 11.948s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)

Do these steps still work for you? Or did you have to make some changes to the script recently?

Also, I did some searches for that error and found this thread: https://github.com/ray-project/ray/issues/6373. So, I tried building ray-0.8.0 instead. That did not complain about the "prometheus-cpp" download failure. But it gave the following error:

+ /root/ray_build/bazel_build/output/bazel build //:ray_pkg --verbose_failures
    INFO: Options provided by the client:
      Inherited 'common' options: --isatty=0 --terminal_columns=80
    INFO: Reading rc options for 'build' from /root/ray_build/ray/.bazelrc:
      'build' options: --compilation_mode=opt --action_env=BAZEL_LLVM --action_env=BAZEL_SH --action_env=PATH --action_env=PYTHON2_BIN_PATH --action_env=PYTHON3_BIN_PATH --action_env=USE_CLANG_CL=1 --enable_platform_specific_config --per_file_copt=-\.(asm|S)$,-.*/arrow/util/logging\.cc@-Werror --per_file_copt=-\.(asm|S)$,\.pb\.cc$@-w --per_file_copt=-\.(asm|S)$,external/.*@-w --host_copt=-Wno-builtin-macro-redefined --host_copt=-Wno-inconsistent-missing-override --host_copt=-Wno-microsoft-unqualified-friend --per_file_copt=-\.(asm|S)$,external/com_github_grpc_grpc/.*@-DGRPC_BAZEL_BUILD --http_timeout_scaling=5.0 --incompatible_depset_is_not_iterable=false
    ERROR: Unrecognized option: --enable_platform_specific_config

Do you think maybe this issue could be due to an old version of bazel? Should I try to build a newer version of bazel?

felker commented 4 years ago

Same problem here; it seems that https://github.com/jovany-wang/prometheus-cpp no longer exists.

felker commented 4 years ago

@amitsadaphule

The error you are seeing when trying to build ray-0.8.0 is indeed due to Bazel v0.26.1 being too old. The --enable_platform_specific_config option wasn't added to Bazel until v1.0.0, specifically in this commit: https://github.com/bazelbuild/bazel/commit/59755455034a998cdedfb7b086aea3ad78419381
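
A build script could guard against this up front by comparing the Bazel version against the minimum before invoking the build. A minimal sketch, assuming a `sort` that supports `-V` (as GNU coreutils does); `version_at_least` is a hypothetical helper, and the version strings are passed in by hand rather than queried from Bazel:

```shell
#!/bin/sh
# Hypothetical helper: succeed when version $1 is greater than or equal to
# version $2, using version-aware sorting (requires `sort -V`).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Minimum Bazel version for --enable_platform_specific_config,
# as stated in the comment above.
MIN_BAZEL_VERSION="1.0.0"

# Check the versions mentioned in this thread.
for v in 0.26.1 1.0.0 1.2.1; do
  if version_at_least "$v" "$MIN_BAZEL_VERSION"; then
    echo "Bazel $v: ok"
  else
    echo "Bazel $v: too old for --enable_platform_specific_config"
  fi
done
```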

I edited @xdever's script again to bump the version, but I get a new error:

...
+ popd
~/ray_build/ray/build ~/ray_build/ray/python
+ export PYTHON3_BIN_PATH=/home/kfelker/.conda/envs/frnn/bin/python3
+ PYTHON3_BIN_PATH=/home/kfelker/.conda/envs/frnn/bin/python3
+ export PYTHON2_BIN_PATH=/home/kfelker/.conda/envs/frnn/bin/python3
+ PYTHON2_BIN_PATH=/home/kfelker/.conda/envs/frnn/bin/python3
+ '[' NO == YES ']'
+ '[' YES == YES ']'
+ /home/kfelker/bin/bazel build //:ray_pkg --verbose_failures
/home/kfelker/bin/bazel: line 89: /home/kfelker/.bazel/bin/bazel-real: cannot execute binary file
/home/kfelker/bin/bazel: line 89: /home/kfelker/.bazel/bin/bazel-real: Success
Traceback (most recent call last):
  File "setup.py", line 210, in <module>
    license="Apache 2.0")
  File "/home/kfelker/.conda/envs/frnn/lib/python3.6/site-packages/setuptools/__init__.py", line 145, in setup
    return distutils.core.setup(**attrs)
  File "/home/kfelker/.conda/envs/frnn/lib/python3.6/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/home/kfelker/.conda/envs/frnn/lib/python3.6/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/home/kfelker/.conda/envs/frnn/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/home/kfelker/.conda/envs/frnn/lib/python3.6/site-packages/wheel/bdist_wheel.py", line 192, in run
    self.run_command('build')
  File "/home/kfelker/.conda/envs/frnn/lib/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/kfelker/.conda/envs/frnn/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/home/kfelker/.conda/envs/frnn/lib/python3.6/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/home/kfelker/.conda/envs/frnn/lib/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/kfelker/.conda/envs/frnn/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "setup.py", line 99, in run
    subprocess.check_call(command)
  File "/home/kfelker/.conda/envs/frnn/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['../build.sh', '-p', '/home/kfelker/.conda/envs/frnn/bin/python3']' returned non-zero exit status 1.

amitsadaphule commented 4 years ago

@felker I tried installing Bazel from yum, which installed Bazel 1.2.1 on UBI 7.6 along with java-11-openjdk-11.0.6.10-1.el7_7, and then set JAVA_HOME, using the following commands:

wget -O /etc/yum.repos.d/vbatts-bazel-epel-7.repo https://copr.fedorainfracloud.org/coprs/vbatts/bazel/repo/epel-7/vbatts-bazel-epel-7.repo
yum install -y bazel
export JAVA_HOME=$(compgen -G '/usr/lib/jvm/java-11-openjdk-*')

That made the build proceed a bit further and then fail with the following error:

Starting local Bazel server and connecting to it...
ERROR: /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/boost/BUILD.bazel:116:1: Configurable attribute "srcs" doesn't match this configuration (would a default condition help?).
Conditions checked:
 @boost//:linux_arm
 @boost//:linux_x86_64
 @boost//:osx_x86_64
 @boost//:windows_x86_64
INFO: Call stack for the definition of repository 'boringssl' which is a http_archive (rule definition at /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/bazel_tools/tools/build_defs/repo/http.bzl:292:16):
 - /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/com_github_grpc_grpc/bazel/grpc_deps.bzl:100:9
 - /root/ray_build/ray/bazel/ray_deps_build_all.bzl:16:3
 - /root/ray_build/ray/WORKSPACE:9:1
ERROR: Analysis of target '//:ray_pkg' failed; build aborted:

/root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/boost/BUILD.bazel:116:1: Configurable attribute "srcs" doesn't match this configuration (would a default condition help?).
Conditions checked:
 @boost//:linux_arm
 @boost//:linux_x86_64
 @boost//:osx_x86_64
 @boost//:windows_x86_64
INFO: Elapsed time: 240.991s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (57 packages loaded, 5577 targets configured)

I'll see if I can resolve this error.

felker commented 4 years ago

I was also able to bootstrap Bazel v1.1.0 on ppc64le by following the instructions here: https://openpower.ic.unicamp.br/post/installing-bazel-power-other-architectures-systems/

I also noticed that user clnperez fixed the Bazel build process for Power only as of that version, in https://github.com/bazelbuild/bazel/pull/9346, via two commits: https://github.com/bazelbuild/bazel/commit/5cff4f1edf8b95bf0612791632255852332f72b5 https://github.com/bazelbuild/bazel/commit/27612bb1f6131cd86b42306c80037946b686c9c7

After a few more hiccups (using an old CMake < v3.x, not having Boost installed for Arrow), I have also gotten as far as you but am stuck again.

felker commented 4 years ago

@pcmoritz Travis CI now supports ppc64le jobs within LXD containers for open source projects: https://blog.travis-ci.com/2019-11-12-multi-cpu-architecture-ibm-power-ibm-z

Would it be easy for you to add this to your current build matrix? I am trying to set it up on a fork of https://github.com/ray-project/arrow-build, but it is challenging.

amitsadaphule commented 4 years ago

@felker Have you had any luck with the ray build? I tried with different bazel versions. With 1.2.1 and 1.0.0, I have the same observations. Not sure whether this is an issue with bazel or the build environment.

amitsadaphule commented 4 years ago

I had misinterpreted the errors seen earlier. They were in boost's BUILD.bazel file; the issue was not with boringssl. The following changes in /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/boost/BUILD.bazel made the build proceed further:

  1. On line 1603, add the following entry inside "defines = select({":

         ":linux": [],

  2. On line 99, in BOOST_CTX_ASM_SOURCES, add the following entry:

         ":linux": [
             "libs/context/src/asm/jump_ppc64_sysv_elf_gas.S",
             "libs/context/src/asm/make_ppc64_sysv_elf_gas.S",
             "libs/context/src/asm/ontop_ppc64_sysv_elf_gas.S",
         ],

Now, there is a compilation error in building plasma as shown below:

ERROR: /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/plasma/BUILD.bazel:70:1: C++ compilation of rule '@plasma//:plasma_client' failed (Exit 1) gcc failed: error executing command
  (cd /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/sandbox/processwrapper-sandbox/1699/execroot/com_github_ray_project_ray && \
  exec env - \
    LD_LIBRARY_PATH=/opt/rh/rh-python36/root/usr/lib64 \
    PATH=/root/ray_build/bazel_build/output:/opt/rh/rh-python36/root/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/go/bin \
    PWD=/proc/self/cwd \
    PYTHON2_BIN_PATH=/opt/rh/rh-python36/root/usr/bin/python3 \
    PYTHON3_BIN_PATH=/opt/rh/rh-python36/root/usr/bin/python3 \
    USE_CLANG_CL=1 \
  /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++0x' -MD -MF bazel-out/ppc-opt/bin/external/plasma/_objs/plasma_client/client.pic.d '-frandom-seed=bazel-out/ppc-opt/bin/external/plasma/_objs/plasma_client/client.pic.o' -fPIC -DBOOST_FALLTHROUGH -iquote external/plasma -iquote bazel-out/ppc-opt/bin/external/plasma -iquote external/boost -iquote bazel-out/ppc-opt/bin/external/boost -iquote external/com_github_google_glog -iquote bazel-out/ppc-opt/bin/external/com_github_google_glog -iquote external/com_github_gflags_gflags -iquote bazel-out/ppc-opt/bin/external/com_github_gflags_gflags -iquote external/com_github_google_flatbuffers -iquote bazel-out/ppc-opt/bin/external/com_github_google_flatbuffers -Ibazel-out/ppc-opt/bin/external/plasma/_virtual_includes/plasma_client -Ibazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow -Ibazel-out/ppc-opt/bin/external/com_github_google_glog/_virtual_includes/default_glog_headers -Ibazel-out/ppc-opt/bin/external/com_github_gflags_gflags/_virtual_includes/gflags -isystem external/boost -isystem bazel-out/ppc-opt/bin/external/boost -isystem external/boost/boost/filesystem -isystem bazel-out/ppc-opt/bin/external/boost/boost/filesystem -isystem external/boost/boost/config -isystem bazel-out/ppc-opt/bin/external/boost/boost/config -isystem external/boost/boost/version -isystem bazel-out/ppc-opt/bin/external/boost/boost/version -isystem external/boost/boost/functional -isystem bazel-out/ppc-opt/bin/external/boost/boost/functional -isystem external/boost/boost/container_hash -isystem bazel-out/ppc-opt/bin/external/boost/boost/container_hash -isystem external/boost/boost/assert -isystem bazel-out/ppc-opt/bin/external/boost/boost/assert -isystem external/boost/boost/core -isystem bazel-out/ppc-opt/bin/external/boost/boost/core -isystem external/boost/boost/integer 
-isystem bazel-out/ppc-opt/bin/external/boost/boost/integer -isystem external/boost/boost/static_assert -isystem bazel-out/ppc-opt/bin/external/boost/boost/static_assert -isystem external/boost/boost/limits -isystem bazel-out/ppc-opt/bin/external/boost/boost/limits -isystem external/boost/boost/type_traits -isystem bazel-out/ppc-opt/bin/external/boost/boost/type_traits -isystem external/boost/boost/mpl -isystem bazel-out/ppc-opt/bin/external/boost/boost/mpl -isystem external/boost/boost/move -isystem bazel-out/ppc-opt/bin/external/boost/boost/move -isystem external/boost/boost/detail -isystem bazel-out/ppc-opt/bin/external/boost/boost/detail -isystem external/boost/boost/preprocessor -isystem bazel-out/ppc-opt/bin/external/boost/boost/preprocessor -isystem external/boost/boost/io -isystem bazel-out/ppc-opt/bin/external/boost/boost/io -isystem external/boost/boost/iterator -isystem bazel-out/ppc-opt/bin/external/boost/boost/iterator -isystem external/boost/boost/utility -isystem bazel-out/ppc-opt/bin/external/boost/boost/utility -isystem external/boost/boost/swap -isystem bazel-out/ppc-opt/bin/external/boost/boost/swap -isystem external/boost/boost/range -isystem bazel-out/ppc-opt/bin/external/boost/boost/range -isystem external/boost/boost/array -isystem bazel-out/ppc-opt/bin/external/boost/boost/array -isystem external/boost/boost/throw_exception -isystem bazel-out/ppc-opt/bin/external/boost/boost/throw_exception -isystem external/boost/boost/current_function -isystem bazel-out/ppc-opt/bin/external/boost/boost/current_function -isystem external/boost/boost/exception -isystem bazel-out/ppc-opt/bin/external/boost/boost/exception -isystem external/boost/boost/concept_check -isystem bazel-out/ppc-opt/bin/external/boost/boost/concept_check -isystem external/boost/boost/concept -isystem bazel-out/ppc-opt/bin/external/boost/boost/concept -isystem external/boost/boost/concept_archetype -isystem bazel-out/ppc-opt/bin/external/boost/boost/concept_archetype -isystem 
external/boost/boost/noncopyable -isystem bazel-out/ppc-opt/bin/external/boost/boost/noncopyable -isystem external/boost/boost/optional -isystem bazel-out/ppc-opt/bin/external/boost/boost/optional -isystem external/boost/boost/none -isystem bazel-out/ppc-opt/bin/external/boost/boost/none -isystem external/boost/boost/type -isystem bazel-out/ppc-opt/bin/external/boost/boost/type -isystem external/boost/boost/ref -isystem bazel-out/ppc-opt/bin/external/boost/boost/ref -isystem external/boost/boost/regex -isystem bazel-out/ppc-opt/bin/external/boost/boost/regex -isystem external/boost/boost/cstdint -isystem bazel-out/ppc-opt/bin/external/boost/boost/cstdint -isystem external/boost/boost/predef -isystem bazel-out/ppc-opt/bin/external/boost/boost/predef -isystem external/boost/boost/smart_ptr -isystem bazel-out/ppc-opt/bin/external/boost/boost/smart_ptr -isystem external/boost/boost/align -isystem bazel-out/ppc-opt/bin/external/boost/boost/align -isystem external/boost/boost/scoped_array -isystem bazel-out/ppc-opt/bin/external/boost/boost/scoped_array -isystem external/boost/boost/checked_delete -isystem bazel-out/ppc-opt/bin/external/boost/boost/checked_delete -isystem external/boost/boost/scoped_ptr -isystem bazel-out/ppc-opt/bin/external/boost/boost/scoped_ptr -isystem external/boost/boost/shared_array -isystem bazel-out/ppc-opt/bin/external/boost/boost/shared_array -isystem external/boost/boost/shared_ptr -isystem bazel-out/ppc-opt/bin/external/boost/boost/shared_ptr -isystem external/boost/boost/tuple -isystem bazel-out/ppc-opt/bin/external/boost/boost/tuple -isystem external/boost/boost/system -isystem bazel-out/ppc-opt/bin/external/boost/boost/system -isystem external/boost/boost/cerrno -isystem bazel-out/ppc-opt/bin/external/boost/boost/cerrno -isystem external/com_github_google_flatbuffers/include -isystem bazel-out/ppc-opt/bin/external/com_github_google_flatbuffers/include -DARROW_USE_GLOG -Werror -w -fno-canonical-system-headers -Wno-builtin-macro-redefined 
'-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/plasma/cpp/src/plasma/client.cc -o bazel-out/ppc-opt/bin/external/plasma/_objs/plasma_client/client.pic.o)
Execution platform: @local_config_platform//:host

Use --sandbox_debug to see verbose messages from the sandbox
external/plasma/cpp/src/plasma/client.cc: In member function '__vector(4) __bool int plasma::PlasmaClient::Impl::IsInUse(const ObjectID&)':
external/plasma/cpp/src/plasma/client.cc:377:40: error: cannot convert 'bool' to '__vector(4) __bool int' in return
   return (elem != objects_in_use_.end());
                                        ^
external/plasma/cpp/src/plasma/client.cc: In member function 'arrow::Status plasma::PlasmaClient::Impl::Create(const ObjectID&, int64_t, const uint8_t*, int64_t, std::shared_ptr<arrow::Buffer>*, int)':
external/plasma/cpp/src/plasma/client.cc:476:49: error: no matching function for call to 'plasma::PlasmaClient::Impl::IncrementObjectCount(const ObjectID&, plasma::PlasmaObject*, bool)'
   IncrementObjectCount(object_id, &object, false);
                                                 ^
external/plasma/cpp/src/plasma/client.cc:476:49: note: candidate is:
external/plasma/cpp/src/plasma/client.cc:391:6: note: void plasma::PlasmaClient::Impl::IncrementObjectCount(const ObjectID&, plasma::PlasmaObject*, __vector(4) __bool int)
 void PlasmaClient::Impl::IncrementObjectCount(const ObjectID& object_id,
      ^
external/plasma/cpp/src/plasma/client.cc:391:6: note:   no known conversion for argument 3 from 'bool' to '__vector(4) __bool int'
external/plasma/cpp/src/plasma/client.cc:481:49: error: no matching function for call to 'plasma::PlasmaClient::Impl::IncrementObjectCount(const ObjectID&, plasma::PlasmaObject*, bool)'
   IncrementObjectCount(object_id, &object, false);
                                                 ^
external/plasma/cpp/src/plasma/client.cc:481:49: note: candidate is:
external/plasma/cpp/src/plasma/client.cc:391:6: note: void plasma::PlasmaClient::Impl::IncrementObjectCount(const ObjectID&, plasma::PlasmaObject*, __vector(4) __bool int)
 void PlasmaClient::Impl::IncrementObjectCount(const ObjectID& object_id,
      ^
external/plasma/cpp/src/plasma/client.cc:391:6: note:   no known conversion for argument 3 from 'bool' to '__vector(4) __bool int'
In file included from /usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../include/c++/4.8.5/bits/atomic_base.h:36:0,
                 from /usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../include/c++/4.8.5/atomic:41,
                 from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/memory_pool.h:21,
                 from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/buffer.h:28,
                 from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/plasma_client/plasma/client.h:26,
                 from external/plasma/cpp/src/plasma/client.cc:20:
external/plasma/cpp/src/plasma/client.cc: In member function 'arrow::Status plasma::PlasmaClient::Impl::GetBuffers(const ObjectID*, int64_t, int64_t, const std::function<std::shared_ptr<arrow::Buffer>(const plasma::UniqueID&, const std::shared_ptr<arrow::Buffer>&)>&, plasma::ObjectBuffer*)':
external/plasma/cpp/src/plasma/client.cc:546:22: error: cannot convert 'bool' to '__vector(4) __bool int' in initialization
   bool all_present = true;
                      ^
external/plasma/cpp/src/plasma/client.cc:552:19: error: cannot convert 'bool' to '__vector(4) __bool int' in assignment
       all_present = false;
                   ^
external/plasma/cpp/src/plasma/client.cc:553:39: error: could not convert 'object_entry.std::__detail::_Node_iterator<_Value, __constant_iterators, __cache>::operator-><std::pair<const plasma::UniqueID, std::unique_ptr<plasma::ObjectInUseEntry> >, false, true>()->std::pair<const plasma::UniqueID, std::unique_ptr<plasma::ObjectInUseEntry> >::second.std::unique_ptr<_Tp, _Dp>::operator-><plasma::ObjectInUseEntry, std::default_delete<plasma::ObjectInUseEntry> >()->plasma::ObjectInUseEntry::is_sealed' from '__vector(4) __bool int' to 'bool'
     } else if (!object_entry->second->is_sealed) {
                                       ^
external/plasma/cpp/src/plasma/client.cc:553:39: error: in argument to unary !
external/plasma/cpp/src/plasma/client.cc:561:19: error: cannot convert 'bool' to '__vector(4) __bool int' in assignment
       all_present = false;
                   ^
external/plasma/cpp/src/plasma/client.cc:588:55: error: no matching function for call to 'plasma::PlasmaClient::Impl::IncrementObjectCount(const ObjectID&, plasma::PlasmaObject*&, bool)'
       IncrementObjectCount(object_ids[i], object, true);
                                                       ^
external/plasma/cpp/src/plasma/client.cc:588:55: note: candidate is:
external/plasma/cpp/src/plasma/client.cc:391:6: note: void plasma::PlasmaClient::Impl::IncrementObjectCount(const ObjectID&, plasma::PlasmaObject*, __vector(4) __bool int)
 void PlasmaClient::Impl::IncrementObjectCount(const ObjectID& object_id,
      ^
external/plasma/cpp/src/plasma/client.cc:391:6: note:   no known conversion for argument 3 from 'bool' to '__vector(4) __bool int'
external/plasma/cpp/src/plasma/client.cc:592:18: error: could not convert 'all_present' from '__vector(4) __bool int' to 'bool'
   if (all_present) {
                  ^
external/plasma/cpp/src/plasma/client.cc:664:64: error: no matching function for call to 'plasma::PlasmaClient::Impl::IncrementObjectCount(__gnu_cxx::__alloc_traits<std::allocator<plasma::UniqueID> >::value_type&, plasma::PlasmaObject*&, bool)'
       IncrementObjectCount(received_object_ids[i], object, true);
                                                                ^
external/plasma/cpp/src/plasma/client.cc:664:64: note: candidate is:
external/plasma/cpp/src/plasma/client.cc:391:6: note: void plasma::PlasmaClient::Impl::IncrementObjectCount(const ObjectID&, plasma::PlasmaObject*, __vector(4) __bool int)
 void PlasmaClient::Impl::IncrementObjectCount(const ObjectID& object_id,
      ^
external/plasma/cpp/src/plasma/client.cc:391:6: note:   no known conversion for argument 3 from 'bool' to '__vector(4) __bool int'
external/plasma/cpp/src/plasma/client.cc: In member function 'arrow::Status plasma::PlasmaClient::Impl::Contains(const ObjectID&, __vector(4) __bool int*)':
external/plasma/cpp/src/plasma/client.cc:751:17: error: cannot convert 'int' to '__vector(4) __bool int' in assignment
     *has_object = 1;
                 ^
In file included from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/memory_pool.h:26:0,
                 from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/buffer.h:28,
                 from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/plasma_client/plasma/client.h:26,
                 from external/plasma/cpp/src/plasma/client.cc:20:
external/plasma/cpp/src/plasma/client.cc:761:80: error: cannot convert '__vector(4) __bool int*' to 'bool*' for argument '4' to 'arrow::Status plasma::ReadContainsReply(uint8_t*, size_t, plasma::ObjectID*, bool*)'
         ReadContainsReply(buffer.data(), buffer.size(), &object_id2, has_object));
                                                                                ^
bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/status.h:58:28: note: in definition of macro 'ARROW_RETURN_NOT_OK'
     ::arrow::Status __s = (status);                            \
                            ^
external/plasma/cpp/src/plasma/client.cc:760:5: note: in expansion of macro 'RETURN_NOT_OK'
     RETURN_NOT_OK(
     ^
In file included from /usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../include/c++/4.8.5/bits/atomic_base.h:36:0,
                 from /usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../include/c++/4.8.5/atomic:41,
                 from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/memory_pool.h:21,
                 from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/buffer.h:28,
                 from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/plasma_client/plasma/client.h:26,
                 from external/plasma/cpp/src/plasma/client.cc:20:
external/plasma/cpp/src/plasma/client.cc: In member function '__vector(4) __bool int plasma::PlasmaClient::Impl::ComputeObjectHashParallel(plasma::XXH64_state_t*, const unsigned char*, int64_t)':
external/plasma/cpp/src/plasma/client.cc:814:10: error: cannot convert 'bool' to '__vector(4) __bool int' in return
   return true;
          ^
external/plasma/cpp/src/plasma/client.cc: In member function 'arrow::Status plasma::PlasmaClient::Impl::Seal(const ObjectID&)':
external/plasma/cpp/src/plasma/client.cc:856:38: error: could not convert 'object_entry.std::__detail::_Node_iterator<_Value, __constant_iterators, __cache>::operator-><std::pair<const plasma::UniqueID, std::unique_ptr<plasma::ObjectInUseEntry> >, false, true>()->std::pair<const plasma::UniqueID, std::unique_ptr<plasma::ObjectInUseEntry> >::second.std::unique_ptr<_Tp, _Dp>::operator-><plasma::ObjectInUseEntry, std::default_delete<plasma::ObjectInUseEntry> >()->plasma::ObjectInUseEntry::is_sealed' from '__vector(4) __bool int' to 'bool'
   if (object_entry->second->is_sealed) {
                                      ^
external/plasma/cpp/src/plasma/client.cc:861:35: error: cannot convert 'bool' to '__vector(4) __bool int' in assignment
   object_entry->second->is_sealed = true;
                                   ^
In file included from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/util/compare.h:24:0,
                 from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/status.h:24,
                 from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/memory_pool.h:26,
                 from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/buffer.h:28,
                 from bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/plasma_client/plasma/client.h:26,
                 from external/plasma/cpp/src/plasma/client.cc:20:
external/plasma/cpp/src/plasma/client.cc: In member function 'arrow::Status plasma::PlasmaClient::Impl::Abort(const ObjectID&)':
external/plasma/cpp/src/plasma/client.cc:880:38: error: could not convert 'object_entry.std::__detail::_Node_iterator<_Value, __constant_iterators, __cache>::operator-><std::pair<const plasma::UniqueID, std::unique_ptr<plasma::ObjectInUseEntry> >, false, true>()->std::pair<const plasma::UniqueID, std::unique_ptr<plasma::ObjectInUseEntry> >::second.std::unique_ptr<_Tp, _Dp>::operator-><plasma::ObjectInUseEntry, std::default_delete<plasma::ObjectInUseEntry> >()->plasma::ObjectInUseEntry::is_sealed' from '__vector(4) __bool int' to 'bool'
   ARROW_CHECK(!object_entry->second->is_sealed)
                                      ^
bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/util/macros.h:49:52: note: in definition of macro 'ARROW_PREDICT_TRUE'
 #define ARROW_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
                                                    ^
external/plasma/cpp/src/plasma/client.cc:880:3: note: in expansion of macro 'ARROW_CHECK'
   ARROW_CHECK(!object_entry->second->is_sealed)
   ^
external/plasma/cpp/src/plasma/client.cc:880:38: error: in argument to unary !
   ARROW_CHECK(!object_entry->second->is_sealed)
                                      ^
bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/arrow/arrow/util/macros.h:49:52: note: in definition of macro 'ARROW_PREDICT_TRUE'
 #define ARROW_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
                                                    ^
external/plasma/cpp/src/plasma/client.cc:880:3: note: in expansion of macro 'ARROW_CHECK'
   ARROW_CHECK(!object_entry->second->is_sealed)
   ^
external/plasma/cpp/src/plasma/client.cc: At global scope:
external/plasma/cpp/src/plasma/client.cc:1163:8: error: prototype for 'arrow::Status plasma::PlasmaClient::Contains(const ObjectID&, __vector(4) __bool int*)' does not match any in class 'plasma::PlasmaClient'
 Status PlasmaClient::Contains(const ObjectID& object_id, bool* has_object) {
        ^
In file included from external/plasma/cpp/src/plasma/client.cc:20:0:
bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/plasma_client/plasma/client.h:172:10: error: candidate is: arrow::Status plasma::PlasmaClient::Contains(const ObjectID&, bool*)
   Status Contains(const ObjectID& object_id, bool* has_object);
          ^
external/plasma/cpp/src/plasma/client.cc:1211:6: error: prototype for '__vector(4) __bool int plasma::PlasmaClient::IsInUse(const ObjectID&)' does not match any in class 'plasma::PlasmaClient'
 bool PlasmaClient::IsInUse(const ObjectID& object_id) {
      ^
In file included from external/plasma/cpp/src/plasma/client.cc:20:0:
bazel-out/ppc-opt/bin/external/plasma/_virtual_includes/plasma_client/plasma/client.h:296:8: error: candidate is: bool plasma::PlasmaClient::IsInUse(const ObjectID&)
   bool IsInUse(const ObjectID& object_id);
        ^
Target //:ray_pkg failed to build
INFO: Elapsed time: 921.482s, Critical Path: 44.12s
INFO: 1704 processes: 1704 processwrapper-sandbox.
FAILED: Build did NOT complete successfully
Traceback (most recent call last):
  File "setup.py", line 205, in <module>
    license="Apache 2.0")
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/root/.local/lib/python3.6/site-packages/wheel/bdist_wheel.py", line 202, in run
    self.run_command('build')
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "setup.py", line 96, in run
    subprocess.check_call(command)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['../build.sh', '-p', '/opt/rh/rh-python36/root/usr/bin/python3']' returned non-zero exit status 1.

I'm debugging this now.

amitsadaphule commented 4 years ago

I'm able to build ray-0.7.7 with bazel-1.1.0 now. Had to make some minor changes in the build script and bzl files in ray. I'm verifying my changes by triggering a fresh build in UBI 7.6 ppc64le. Once I'm done with that, I'll try building ray-0.8.0.

amitsadaphule commented 4 years ago

I'm now able to build ray-0.7.7 as well as ray-0.8.1 in a UBI 7.6 ppc64le container. I had to make changes to build.sh and bazel/ray_deps_setup.bzl, and add a ppc-specific patch in thirdparty/patches/. I've also validated my changes on x86 to make sure they do not break that build.

Thanks @xdever. Your build steps for arrow helped me to get rid of a major hurdle!

Thanks @felker for your inputs on bazel version.

felker commented 4 years ago

@amitsadaphule that is great news! I independently got as far as you did a few comments ago by modifying /home/kfelker/.cache/bazel/_bazel_kfelker/5f568a871ffef1dd98938c7174b4baa5/external/boost/BUILD.bazel

I ended up making the same edit there:

BOOST_CTX_ASM_SOURCES = select({
    ":linux_arm": [
        "libs/context/src/asm/jump_arm_aapcs_elf_gas.S",
        "libs/context/src/asm/make_arm_aapcs_elf_gas.S",
        "libs/context/src/asm/ontop_arm_aapcs_elf_gas.S",
    ],
    ":linux_ppc64le": [
        "libs/context/src/asm/jump_ppc64_sysv_elf_gas.S",
        "libs/context/src/asm/make_ppc64_sysv_elf_gas.S",
        "libs/context/src/asm/ontop_ppc64_sysv_elf_gas.S",
    ],
    ":linux_x86_64": [
        "libs/context/src/asm/jump_x86_64_sysv_elf_gas.S",
        "libs/context/src/asm/make_x86_64_sysv_elf_gas.S",
        "libs/context/src/asm/ontop_x86_64_sysv_elf_gas.S",
    ],
    ":osx_x86_64": [
        "libs/context/src/asm/jump_x86_64_sysv_macho_gas.S",
        "libs/context/src/asm/make_x86_64_sysv_macho_gas.S",
        "libs/context/src/asm/ontop_x86_64_sysv_macho_gas.S",
    ],
    ":windows_x86_64": [
        "libs/context/src/asm/make_x86_64_ms_pe_masm.S",
        "libs/context/src/asm/jump_x86_64_ms_pe_masm.S",
        "libs/context/src/asm/ontop_x86_64_ms_pe_masm.S",
    ],
})

I also added linux_ppc64le entries to 4x more fields instead of your L1603 edit. This might be due to a different version of Boost that I am using.
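
For readers less familiar with Bazel: the select() above picks one list of assembly sources per platform, and Bazel fails when no branch matches. The same lookup can be sketched in plain Python. The function name and the (system, machine) keys are assumptions made for illustration and are not part of rules_boost; only the file names are taken from the snippet above.

```python
# Plain-Python analogue of the select() above. The dictionary keys and the
# function name are hypothetical; only the file names come from the snippet.
ABI_SUFFIX = {
    ("linux", "arm"): "arm_aapcs_elf_gas.S",
    ("linux", "ppc64le"): "ppc64_sysv_elf_gas.S",
    ("linux", "x86_64"): "x86_64_sysv_elf_gas.S",
    ("osx", "x86_64"): "x86_64_sysv_macho_gas.S",
    ("windows", "x86_64"): "x86_64_ms_pe_masm.S",
}


def boost_context_asm_sources(system, machine):
    """Return the three Boost.Context assembly sources for a platform."""
    try:
        suffix = ABI_SUFFIX[(system, machine)]
    except KeyError:
        # No entry for this platform: report it instead of guessing.
        raise ValueError(f"no assembly sources listed for {system}_{machine}")
    return [
        f"libs/context/src/asm/{op}_{suffix}" for op in ("jump", "make", "ontop")
    ]
```

Without the ":linux_ppc64le" entry the lookup has nothing to return for a ppc64le host, which is why the edit was needed.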

Still, it is good to validate that we were both on the same track, and I was able to get to the same plasma build error. How did you fix that @amitsadaphule ? I am still stuck there.

I have installed Pyarrow (tried v0.15.1 and v0.14.1) and the Arrow C++ library from the IBM WMLCE Conda channel

conda install -c https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda pyarrow arrow-cpp

instead of building the .whl from the source via @xdever's script. I tried converting the Conda installed package into a wheel and placing it within the Ray source code, but I could not figure out if it is possible.

felker commented 4 years ago

I know very little about Bazel, but it appears that our edits to Boost's BUILD.bazel might no longer be necessary with the latest versions: https://github.com/nelhage/rules_boost/pull/156 (specifically https://github.com/nelhage/rules_boost/commit/ebd1dd7d70d9a9c076a9d4e6d317c3236f303b66#diff-92e0042c94ad14988fb54ffb61047f5c)

amitsadaphule commented 4 years ago

@felker I was able to get past the plasma build error by adding the "--cxxopt=-std=gnu++0x" option to the bazel build command. As for nelhage/rules_boost@ebd1dd7, I used a patch with a subset of those changes for the ray 0.7.7 and 0.8.1 builds, because ray pins an old commit of the https://github.com/nelhage/rules_boost repo.
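
A plausible explanation, not confirmed in this thread, is that older GCC AltiVec headers redefine `bool` when the compiler runs in a strict ISO C++ mode, and that a GNU dialect avoids this. Whatever the cause, the flag only needs to be added on POWER. Below is a minimal sketch of a wrapper that does that; the helper name and the choice to key on `platform.machine()` are assumptions of this sketch, not something build.sh does.

```python
import platform


def bazel_build_command(bazel="bazel", machine=None):
    """Build the argument list for the ray_pkg build.

    `machine` defaults to the host architecture. The helper name and the
    decision to add the flag only on ppc64le are assumptions of this sketch.
    """
    machine = machine or platform.machine()
    cmd = [bazel, "build", "//:ray_pkg", "--verbose_failures"]
    if machine == "ppc64le":
        # Flag taken from the comment above; it selects a GNU C++ dialect.
        cmd.append("--cxxopt=-std=gnu++0x")
    return cmd
```

The resulting list can be passed to `subprocess.check_call`, which leaves the x86 command line unchanged.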

felker commented 4 years ago

Great! I will try that out. Can you share the patch?

I added a Travis CI ppc64le build to their repo in https://github.com/nelhage/rules_boost/pull/161 and this basic example could help for building a similar pipeline for this project.

felker commented 4 years ago

I got it working with Ray 0.8.0 with the following changes:

diff --git a/bazel/ray_deps_setup.bzl b/bazel/ray_deps_setup.bzl
index 4a5fb69c..0531af5c 100644
--- a/bazel/ray_deps_setup.bzl
+++ b/bazel/ray_deps_setup.bzl
@@ -121,9 +121,14 @@ def ray_deps_setup():
     github_repository(
         name = "com_github_nelhage_rules_boost",
         # If you update the Boost version, remember to update the 'boost' rule.
-        commit = "df908358c605a7d5b8bbacde07afbaede5ac12cf",
+       commit = "67ddc505bc5ad5a15562f07e4954ac2011177e13",
+       #commit = "df908358c605a7d5b8bbacde07afbaede5ac12cf",
         remote = "https://github.com/nelhage/rules_boost",
-        sha256 = "3775c5ab217e0c9cc380f56e243a4d75fe6fee8eaee1447899eaa04c5d582cf1",
+       #        sha256 = "3775c5ab217e0c9cc380f56e243a4d75fe6fee8eaee1447899eaa04c5d582cf1",
         patches = [
             "//thirdparty/patches:rules_boost-undefine-boost_fallthrough.patch",
         ],
diff --git a/build.sh b/build.sh
index 2a061bb3..08f8ae4d 100755
--- a/build.sh
+++ b/build.sh
@@ -123,7 +123,8 @@ if [ "$RAY_BUILD_JAVA" == "YES" ]; then
 fi

 if [ "$RAY_BUILD_PYTHON" == "YES" ]; then
-  "$BAZEL_EXECUTABLE" build //:ray_pkg --verbose_failures
+  "$BAZEL_EXECUTABLE" build //:ray_pkg --verbose_failures --cxxopt=-std=gnu++0x
 fi

 popd

where I specified the latest master revision of https://github.com/nelhage/rules_boost/commit/67ddc505bc5ad5a15562f07e4954ac2011177e13 (not sure how to update the sha256 field in ray_deps_setup.bzl).
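
For the sha256 field, one approach is to save the archive that the rule fetches for the new commit and hash it locally. A minimal sketch, assuming the archive is already on disk (which URL github_repository actually downloads is not stated in this thread, so that step is left out); the function name is illustrative.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned hex string is the form used by the sha256 value shown in the diff above.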

After these changes (and the above mentioned steps to get Bazel and Pyarrow), python3 setup.py bdist_wheel successfully built a 14M Ray wheel /home/kfelker/ray_build/ray/python/dist/ray-0.8.0-cp36-cp36m-linux_ppc64le.whl.

Thanks for your help @amitsadaphule !

amitsadaphule commented 4 years ago

That's great @felker ! Good that it's building properly with master from https://github.com/nelhage/rules_boost/commit/67ddc505bc5ad5a15562f07e4954ac2011177e13 and no patch is needed to be applied to it from ray. I'll try that as well when I try to build ray master.

Also, could you check if you're able to execute the unit tests? For me, it is complaining about a couple of missing packages and most of those (grpc, tensorflow) are not readily available for ppc64le. Here's the log:

[root@1613428f0d4a python]# pytest ray/tests/
========================================================================= test session starts =========================================================================
platform linux -- Python 3.6.9, pytest-5.3.4, py-1.8.1, pluggy-0.13.1
rootdir: /root/ray_build/ray/python
collected 459 items / 4 errors / 455 selected

=============================================================================== ERRORS ================================================================================
______________________________________________________________ ERROR collecting ray/tests/test_cython.py ______________________________________________________________
ImportError while importing test module '/root/ray_build/ray/python/ray/tests/test_cython.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
ray/tests/test_cython.py:9: in <module>
    import cython_examples as cyth
E   ModuleNotFoundError: No module named 'cython_examples'
________________________________________________________ ERROR collecting ray/tests/test_memory_scheduling.py _________________________________________________________
ImportError while importing test module '/root/ray_build/ray/python/ray/tests/test_memory_scheduling.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
ray/tests/test_memory_scheduling.py:5: in <module>
    from ray import tune
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/__init__.py:2: in <module>
    from ray.tune.tune import run_experiments, run
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/tune.py:7: in <module>
    from ray.tune.analysis import ExperimentAnalysis
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/analysis/__init__.py:1: in <module>
    from ray.tune.analysis.experiment_analysis import ExperimentAnalysis, Analysis
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/analysis/experiment_analysis.py:14: in <module>
    from ray.tune.trial import Trial
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/trial.py:13: in <module>
    from ray.tune.durable_trainable import DurableTrainable
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/durable_trainable.py:3: in <module>
    from ray.tune.trainable import Trainable, TrainableUtil
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/trainable.py:9: in <module>
    import pandas as pd
E   ModuleNotFoundError: No module named 'pandas'
_____________________________________________________________ ERROR collecting ray/tests/test_metrics.py ______________________________________________________________
ImportError while importing test module '/root/ray_build/ray/python/ray/tests/test_metrics.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
ray/tests/test_metrics.py:2: in <module>
    import grpc
E   ModuleNotFoundError: No module named 'grpc'
____________________________________________________________ ERROR collecting ray/tests/test_tensorflow.py ____________________________________________________________
ImportError while importing test module '/root/ray_build/ray/python/ray/tests/test_tensorflow.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
ray/tests/test_tensorflow.py:2: in <module>
    import tensorflow.compat.v1 as tf
E   ModuleNotFoundError: No module named 'tensorflow'
========================================================================== warnings summary ===========================================================================
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/mark/structures.py:327
  /opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/mark/structures.py:327: PytestUnknownMarkWarning: Unknown pytest.mark.asyncio - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/latest/mark.html
    PytestUnknownMarkWarning,

/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/mark/structures.py:327
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/mark/structures.py:327
  /opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/mark/structures.py:327: PytestUnknownMarkWarning: Unknown pytest.mark.timeout - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/latest/mark.html
    PytestUnknownMarkWarning,

ray/tests/test_object_manager.py:20
  /root/ray_build/ray/python/ray/tests/test_object_manager.py:20: UserWarning: This test must be run on large machines.
    warnings.warn("This test must be run on large machines.")

/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/mark/structures.py:327
  /opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/mark/structures.py:327: PytestUnknownMarkWarning: Unknown pytest.mark.benchmark - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/latest/mark.html
    PytestUnknownMarkWarning,

-- Docs: https://docs.pytest.org/en/latest/warnings.html
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 4 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
==================================================================== 5 warnings, 4 errors in 1.07s ====================================================================
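
Since collection stops at the first missing imports, it can save time to check for the required modules before starting pytest. A small sketch using only the standard library; the helper name is made up, and the module list is just the four names from the errors above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Modules named in the collection errors above.
    needed = ["cython_examples", "pandas", "grpc", "tensorflow"]
    print("missing:", missing_modules(needed))
```

The check only looks at top-level names, so it says nothing about whether an installed package actually works on ppc64le.
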

felker commented 4 years ago

Both TensorFlow and gRPC are available on IBM's Conda channel, e.g.:

I also had to install the Cython examples from the Ray documentation folder, but then I was able to run the (failing) tests:

cd /home/kfelker/ray_build/ray/doc/examples/cython/
python setup.py install
cd ../../..
pytest python/ray/tests

The output I got was:

(frnn) (base) ➜  ray git:(0afe14f2) ✗ pytest python/ray/tests
============================================================================== test session starts ===============================================================================
platform linux -- Python 3.6.8, pytest-4.4.2, py-1.8.1, pluggy-0.13.1
rootdir: /home/kfelker/ray_build/ray/python
plugins: hypothesis-3.59.1
collected 428 items

python/ray/tests/py3_test.py ....(pid=53609) *** Aborted at 1580508782 (unix time) try "date -d @1580508782" if you are using GNU date ***
(pid=53609) PC: @                0x0 (unknown)
(pid=53609) *** SIGSEGV (@0x7fffa92558c0) received by PID 53609 (TID 0x7fffa8bd4f20) from PID 18446744072252381376; stack trace: ***
(pid=53609)     @     0x7fffa8b704d8 ([vdso]+0x4d7)
(pid=53609)     @                0x0 (unknown)
(pid=53609)     @     0x7fffa140c4b0 _ZN5boost7context6detail11fiber_ontopINS0_5fiberEZNS_6fibers7context6resumeEPS5_EUlOS3_E_EENS1_10transfer_tES9_
(pid=53609)     @     0x7fffa140a5c8 boost::context::detail::fiber_entry<>()
(pid=53609)     @     0x7fffa1410090 make_fcontext
F(pid=55779) *** Aborted at 1580508803 (unix time) try "date -d @1580508803" if you are using GNU date ***
(pid=55779) PC: @                0x0 (unknown)
(pid=55779) *** SIGSEGV (@0x7fffb23c58c0) received by PID 55779 (TID 0x7fffb1d44f20) from PID 18446744072404883648; stack trace: ***
(pid=55779)     @     0x7fffb1ce04d8 ([vdso]+0x4d7)
(pid=55779)     @                0x0 (unknown)
(pid=55779)     @     0x7fffaa57c4b0 _ZN5boost7context6detail11fiber_ontopINS0_5fiberEZNS_6fibers7context6resumeEPS5_EUlOS3_E_EENS1_10transfer_tES9_
(pid=55779)     @     0x7fffaa57a5c8 boost::context::detail::fiber_entry<>()
(pid=55779)     @     0x7fffaa580090 make_fcontext
FF                                                                                                                                       [  1%]
python/ray/tests/test_actor.py .s............................sssssss....s.....                                                                                             [ 12%]
python/ray/tests/test_actor_failures.py .FFF.[1]    48213 abort      pytest python/ray/tests

Not sure how problematic this is; I have been using Ray successfully in some limited cases over the last day.

pcmoritz commented 4 years ago

@felker Sorry, I just saw your message from above! On Python >= 3.6, Ray should be able to run without our custom version of pyarrow, and we are working towards removing that as a built-in dependency, so you shouldn't need to get that working on power pc :)

felker commented 4 years ago

@pcmoritz fortunately, I was able to install pyarrow from IBM's Conda channel and use export SKIP_PYARROW_INSTALL=1 to bypass the custom pyarrow stuff.
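
When the build is driven from a script rather than an interactive shell, the same variable can be set on a copy of the environment. A minimal sketch; the helper name is hypothetical, and only the variable name comes from the comment above.

```python
import os


def build_environment(base=None):
    """Copy an environment and set SKIP_PYARROW_INSTALL=1 in the copy."""
    env = dict(os.environ if base is None else base)
    env["SKIP_PYARROW_INSTALL"] = "1"
    return env
```

The result can be passed as `env=build_environment()` to `subprocess.check_call` when invoking `setup.py bdist_wheel`, so the calling shell's environment is left untouched.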

Still, Travis CI ppc64le containers could be useful to build and deploy Ray wheels for that architecture.

pcmoritz commented 4 years ago

@felker Great to hear that! You should feel free to create a PR that adds the ppc64le build to the Ray matrix to build the wheels!

RobertCsordas commented 4 years ago

Thanks, everyone! I'm glad to see this great progress. However, @felker, be aware that to use ray on a mixed-architecture cluster, every node must have exactly the same version of pyarrow, including the ".RAY" suffix at the end of the version number. So the IBM Conda version would probably work only if every machine is PPC64. Alternatively, as a hack, one could remove the version checks on the head node (function check_version_info in services.py) and hope that it works.
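
The kind of exact-match check described here can be sketched as follows. This is not Ray's check_version_info implementation; the dictionary layout, the function name, and the version strings in the usage note are made up for illustration.

```python
def version_mismatches(head, worker):
    """Return {component: (head_version, worker_version)} for every difference."""
    keys = sorted(set(head) | set(worker))
    return {
        key: (head.get(key), worker.get(key))
        for key in keys
        if head.get(key) != worker.get(key)
    }
```

With a head node reporting a hypothetical pyarrow "0.14.0.RAY" and a worker reporting "0.14.1", the function returns a single pyarrow entry, which is the situation a strict check would reject.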

amitsadaphule commented 4 years ago

Thanks @felker for the quick reply on unit tests! I'll try it out.

amitsadaphule commented 4 years ago

@felker did you face a dependency issue with opencv-python while executing the tests? I couldn't find that through pip3 on UBI 7 ppc64le. I tried installing an rpm from http://mirror.centos.org/altarch/7/os/ppc64le/Packages/opencv-python-2.4.5-3.el7.ppc64le.rpm by resolving the necessary dependencies, but that didn't help either. Currently trying to build from source by following instructions from https://github.com/skvark/opencv-python.

felker commented 4 years ago

py-opencv and opencv are both available on WMLCE Conda channel for Python 3.6 and 3.7, e.g.: https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le/py-opencv-3.4.7-py37_725.g92aa195.tar.bz2

amitsadaphule commented 4 years ago

Thanks @felker! I'm trying to avoid switching to conda. I'll try to resolve the build issues for opencv-python first. If that doesn't work out, the packages from the conda channels that you suggested will help me get this done anyway.

amitsadaphule commented 4 years ago

I was finally able to get opencv-python-headless and tensorflow 2.0.1 built and installed. I executed the tests on UBI 7.6 ppc64le and got the following result:

[root@c03c2e5fb315 ray]# pytest python/ray/tests/
============================================================================= test session starts ==============================================================================
platform linux -- Python 3.6.9, pytest-5.3.5, py-1.8.1, pluggy-0.13.1
rootdir: /root/ray_build/ray/python
collected 480 items

python/ray/tests/py3_test.py ....FFFFE                                                                                                                                   [  1%]
python/ray/tests/test_actor.py EEEEEEEEF.......................sssss....s.......                                                                                         [ 12%]
python/ray/tests/test_actor_failures.py FF.FFFFFFF....s                                                                                                                  [ 15%]
python/ray/tests/test_actor_pool.py ......                                                                                                                               [ 16%]
python/ray/tests/test_actor_resources.py ...s..........                                                                                                                  [ 19%]
python/ray/tests/test_advanced.py ......s...s..                                                                                                                          [ 22%]
python/ray/tests/test_advanced_2.py ...............                                                                                                                      [ 25%]
python/ray/tests/test_advanced_3.py ...s..........s..........Fs                                                                                                          [ 30%]
python/ray/tests/test_array.py ...                                                                                                                                       [ 31%]
python/ray/tests/test_autoscaler.py ..................................                                                                                                   [ 38%]
python/ray/tests/test_autoscaler_yaml.py F                                                                                                                               [ 38%]
python/ray/tests/test_basic.py .............s.................................                                                                                           [ 48%]
python/ray/tests/test_component_failures.py F.F.                                                                                                                         [ 49%]
python/ray/tests/test_component_failures_2.py ....                                                                                                                       [ 50%]
python/ray/tests/test_component_failures_3.py ...                                                                                                                        [ 50%]
python/ray/tests/test_cython.py ...                                                                                                                                      [ 51%]
python/ray/tests/test_debug_tools.py F                                                                                                                                   [ 51%]
python/ray/tests/test_dynres.py ..............                                                                                                                           [ 54%]
python/ray/tests/test_failure.py .............ss...s............Terminated

There are quite a few test failures, and the test execution gets terminated at 54%. Just to check parity, I built the code for 0.8.1 on UBI 7.6 x86 as well. There too I got a similar result in terms of test failures, and the test execution was killed at 54%. Here's the log:

[root@0ac385b5dd90 ray]# pytest python/ray/tests
============================================================================= test session starts ==============================================================================
platform linux -- Python 3.6.9, pytest-5.3.5, py-1.8.1, pluggy-0.13.1
rootdir: /root/ray_build/ray/python
collected 480 items

python/ray/tests/py3_test.py ........E                                                                                                                                   [  1%]
python/ray/tests/test_actor.py EEEEEEEEF..................F....sssss....s.......                                                                                         [ 12%]
python/ray/tests/test_actor_failures.py FF.F.FFFFF....s                                                                                                                  [ 15%]
python/ray/tests/test_actor_pool.py ......                                                                                                                               [ 16%]
python/ray/tests/test_actor_resources.py ...s...F......                                                                                                                  [ 19%]
python/ray/tests/test_advanced.py ......s...s..                                                                                                                          [ 22%]
python/ray/tests/test_advanced_2.py ...............                                                                                                                      [ 25%]
python/ray/tests/test_advanced_3.py ...s..........s..........Fs                                                                                                          [ 30%]
python/ray/tests/test_array.py ...                                                                                                                                       [ 31%]
python/ray/tests/test_autoscaler.py ..................................                                                                                                   [ 38%]
python/ray/tests/test_autoscaler_yaml.py F                                                                                                                               [ 38%]
python/ray/tests/test_basic.py .............s.................................                                                                                           [ 48%]
python/ray/tests/test_component_failures.py FFFF                                                                                                                         [ 49%]
python/ray/tests/test_component_failures_2.py ....                                                                                                                       [ 50%]
python/ray/tests/test_component_failures_3.py ...                                                                                                                        [ 50%]
python/ray/tests/test_cython.py ...                                                                                                                                      [ 51%]
python/ray/tests/test_debug_tools.py F                                                                                                                                   [ 51%]
python/ray/tests/test_dynres.py ..............                                                                                                                           [ 54%]
python/ray/tests/test_failure.py .............ss...s............Killed

I need to investigate this further.

JasonWayne commented 4 years ago

I'm able to build ray-0.7.7 as well as ray-0.8.1 in UBI 7.6 ppc64le container now. Had to changes to the build.sh file, bazel/ray_deps_setup.bzl file. And add a ppc specific patch in thirdparty/patches/. I've validated my changes on x86 too to make sure that my changes do not break that.

@amitsadaphule Could you please share your changes with me?

abishekmuthian commented 4 years ago

Following up on the work of others in this thread, I was able to install ray on ARM64 (aarch64) and even attempt mixed-architecture (ARM64<->x86_64) distributed computing.

With the latest commits, the PyArrow dependency has been removed, although pyarrow on ARM was not a problem for me, as I've been using it regularly with other projects.

The issue with building Ray on ARM64 was with bazel rules for boost libraries, which is unfortunate as boost for ARM doesn't have any issues as such, but the build rules for ARM64 contains errors as detailed in https://github.com/ray-project/ray/issues/7184.

I've made a patch to address that and written the procedure to build and install ray on ARM64.

My patch may break dependency for ARMv7 as I've not figured out how to include files for both ARM32 and ARM64 under linux_arm bazel build rule. If anyone can advise me on how to achieve that, I will update the patch and submit a PR.

I've opened an issue on the same at nelhage/rules_boost from where the file is obtained during the ray build process.

Update: linux_aarch64 constraint added to the BUILD.boost can fix the above issues without removing compatibility with linux_arm. PR has been submitted to upstream here - https://github.com/nelhage/rules_boost/pull/168.

amitsadaphule commented 4 years ago

@JasonWayne please find the build script here. Please note that the script builds ray with python 3.7.3, since that was a specific requirement in my case. You can use python 3.6 instead by installing rh-python36 and replacing all occurrences of python3.7 with python3.6 and those of pip3.7 with pip3.6.

amitsadaphule commented 4 years ago

Following up on the test failures: after building ray 0.7.7 on RHEL 7.6 ppc64le and installing all the test dependencies, I ran the test cases with pytest -v python/ray/tests/. pytest occasionally freezes at random test cases anywhere from 8% to 50% of the way through, and the run gets terminated at random tests between 57% and 64%. Just to check parity, I tried the build and test execution on x86 RHEL 7.6 as well, and I'm facing similar issues there too.
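
One way to narrow down freezes like these is to run each test file in its own process with a timeout, so a hang in one file does not block the rest. A minimal sketch using only the standard library; the function names and the 600-second default are arbitrary choices for illustration, not part of Ray's test tooling.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run `cmd`; return 'passed', 'failed', or 'timeout'."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        return "timeout"
    return "passed" if result.returncode == 0 else "failed"


def run_test_files(paths, timeout_s=600):
    """Run pytest on each file separately and collect one status per file."""
    return {
        path: run_with_timeout(
            [sys.executable, "-m", "pytest", "-v", path], timeout_s
        )
        for path in paths
    }
```

Only the direct pytest process is killed on timeout; any worker processes it started may need to be cleaned up separately.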

@felker did you get around those test case execution issues?

Has anyone else experienced this before? Is there some known solution to these problems?

felker commented 4 years ago

No, I have not tried to get the tests to pass since https://github.com/ray-project/ray/issues/4309#issuecomment-580940094

yunqu commented 4 years ago

I managed to build ray successfully for tag v8.3.0 with the boost patches that @heavyinfo provided. For some reason the patches are still needed. I am able to build it for both armv7l and aarch64.

LukeIreland1 commented 4 years ago

https://github.com/ray-project/ray/issues/4548#issuecomment-624690466

AlessandroZavoli commented 4 years ago

> @felker did you face a dependency issue with opencv-python while executing the tests? I couldn't find that through pip3 on UBI 7 ppc64le. I tried installing an rpm from http://mirror.centos.org/altarch/7/os/ppc64le/Packages/opencv-python-2.4.5-3.el7.ppc64le.rpm by resolving the necessary dependencies, but that didn't help either. Currently trying to build from source by following instructions from https://github.com/skvark/opencv-python.

I had the same issue. Did you find a way to install it?

amitsadaphule commented 4 years ago

> I had the same issue. Did you find a way to install it?

Yes, hope this one helps: https://github.com/ppc64le/build-scripts/tree/master/pip-ray

AlessandroZavoli commented 4 years ago

I suppose I cannot run it without administrator privileges, which I don't have...

mattip commented 1 year ago

I changed the title to be about ppc64le. Is the goal to provide a ppc64le wheel or conda package, or to provide clear instructions on how to build for ppc64le?

AlessandroZavoli commented 1 year ago

Yes, a conda package would be perfect

mattip commented 1 year ago

In order to provide ppc64le and aarch64 builds on conda, all the dependencies must be available. The rllib component depends on gym, which in turn depends on pygame (for Box2d) and ale-py (the Arcade Learning Environment). See the conda-forge PR for more information. Either the gym dependency should be made optional for rllib, or someone needs to put in the time to package those two libraries for gym so that gym 0.22+ can be built for conda-forge. Once that happens, migrating the packages to ppc64le and aarch64 should follow.