ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Core] In C++, there are D_GLIBCXX_USE_CXX11_ABI settings conflicts when both Ray and Arrow are used. #24566

Open infzo opened 2 years ago

infzo commented 2 years ago

What happened + What you expected to happen

I expected to be able to use both Ray and Arrow in one C++ project, but a build conflict prevents it.

Problem Scene

A C++ project uses both Ray and Arrow. The code is built with CMake and links against Ray's libray_api.so, Arrow's libarrow.so, and so on.

Problem Description

Cannot set D_GLIBCXX_USE_CXX11_ABI to both 0 and 1 in CMakeLists.txt.

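The problem is caused by a libstdc++ ABI conflict, not by a C++ standard conflict. The setting involved is the _GLIBCXX_USE_CXX11_ABI macro, passed in CMakeLists.txt as -D_GLIBCXX_USE_CXX11_ABI (written D_GLIBCXX_USE_CXX11_ABI throughout this issue).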
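
To see which setting a given translation unit actually ends up with, a small check along these lines may help. It is only a sketch: the function name glibcxx_abi_in_effect is made up for illustration, and it assumes the macro is visible after including any libstdc++ header (it is not defined by other standard libraries).

```cpp
#include <string>  // any libstdc++ header makes _GLIBCXX_USE_CXX11_ABI visible

// Hypothetical helper (not part of Ray or Arrow).
// Returns 1 when this translation unit uses the new (C++11) libstdc++ ABI,
// 0 when it uses the old one, and -1 when the macro is not defined at all
// (for example, when the standard library is not libstdc++).
constexpr int glibcxx_abi_in_effect() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
  return _GLIBCXX_USE_CXX11_ABI ? 1 : 0;
#else
  return -1;
#endif
}
```

Printing this value from main() in each build configuration shows whether the add_definitions line really took effect for that target.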

Ray

When libray_api.so is used, D_GLIBCXX_USE_CXX11_ABI must be set to 0.

# CMakeLists.txt
add_definitions(-D_GLIBCXX_USE_CXX11_ABI=0)

If D_GLIBCXX_USE_CXX11_ABI is set to 1, the program aborts at run time with the following output:

free(): invalid pointer
*** SIGABRT received at time=1651890091 on cpu 3 ***
PC: @     0x7fc3d04ea03b  (unknown)  raise
    @     0x7fc3d04ea0c0  1379971456  (unknown)
    @     0x7fc3d053c32c        320  (unknown)
    @           0x40597c       1952  main
    @     0x7fc3d04cb0b3  (unknown)  __libc_start_main
[2022-05-07 10:21:31,829 E 2888918 2888918] logging.cc:325: *** SIGABRT received at time=1651890091 on cpu 3 ***
[2022-05-07 10:21:31,829 E 2888918 2888918] logging.cc:325: PC: @     0x7fc3d04ea03b  (unknown)  raise
[2022-05-07 10:21:31,830 E 2888918 2888918] logging.cc:325:     @     0x7fc3d04ea0c0  1379971456  (unknown)
[2022-05-07 10:21:31,830 E 2888918 2888918] logging.cc:325:     @     0x7fc3d053c32c        320  (unknown)
[2022-05-07 10:21:31,830 E 2888918 2888918] logging.cc:325:     @           0x40597c       1952  main
[2022-05-07 10:21:31,830 E 2888918 2888918] logging.cc:325:     @     0x7fc3d04cb0b3  (unknown)  __libc_start_main
Aborted (core dumped)
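
The thread does not diagnose this crash. One plausible contributor, stated here as an assumption, is that the two ABI settings give std::string different layouts, so an object created on one side of the library boundary is misread or freed incorrectly on the other. With libstdc++ on x86-64 the old-ABI std::string is typically a single pointer (8 bytes) while the new-ABI one is typically 32 bytes. The sketch below (helper names are hypothetical) reports what the current translation unit sees:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only.

// Size in bytes of std::string as seen by this translation unit.
constexpr std::size_t string_size_here() { return sizeof(std::string); }

// True when std::string is exactly one pointer wide, which is what the old
// (copy-on-write) libstdc++ string looks like. Other standard libraries may
// use different layouts, so treat the result as a hint, not a proof.
constexpr bool string_is_pointer_sized() {
  return sizeof(std::string) == sizeof(void*);
}
```

Comparing the two values between a build with the flag set to 0 and one with it set to 1 makes the layout difference visible.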

Arrow

When libarrow*.so is used, D_GLIBCXX_USE_CXX11_ABI must be set to 1.

# CMakeLists.txt
add_definitions(-D_GLIBCXX_USE_CXX11_ABI=1)

If D_GLIBCXX_USE_CXX11_ABI is set to 0, linking fails with the following errors:

/usr/bin/ld: CMakeFiles/arrow_demo.dir/arrow_demo.cpp.o: in function `CreateTable()':
arrow_demo.cpp:(.text+0xc1): undefined reference to `arrow::field(std::string, std::shared_ptr<arrow::DataType>, bool, std::shared_ptr<arrow::KeyValueMetadata const>)'
/usr/bin/ld: arrow_demo.cpp:(.text+0x13a): undefined reference to `arrow::field(std::string, std::shared_ptr<arrow::DataType>, bool, std::shared_ptr<arrow::KeyValueMetadata const>)'
/usr/bin/ld: arrow_demo.cpp:(.text+0x1b3): undefined reference to `arrow::field(std::string, std::shared_ptr<arrow::DataType>, bool, std::shared_ptr<arrow::KeyValueMetadata const>)'
/usr/bin/ld: arrow_demo.cpp:(.text+0x22c): undefined reference to `arrow::field(std::string, std::shared_ptr<arrow::DataType>, bool, std::shared_ptr<arrow::KeyValueMetadata const>)'
/usr/bin/ld: CMakeFiles/arrow_demo.dir/arrow_demo.cpp.o: in function `arrow::Result<std::shared_ptr<arrow::Buffer> >::Result(arrow::Status const&)':
...
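
The undefined references are consistent with how libstdc++ names the two string types: under the new ABI std::string lives in the inline namespace std::__cxx11, so functions that take it get different mangled symbol names. The errors above ask for arrow::field(std::string, ...), the old-ABI spelling, which a libarrow.so built with the new ABI would not export. As a rough illustration (the helper name is hypothetical, and typeid(...).name() returning a mangled name is a GCC/Clang behaviour, not something the standard guarantees):

```cpp
#include <cstring>
#include <string>
#include <typeinfo>

// Hypothetical helper for illustration only.
// Returns true when the implementation-defined type name of std::string
// mentions the __cxx11 inline namespace, i.e. this translation unit was
// compiled against the new libstdc++ ABI.
inline bool string_is_cxx11_tagged() {
  return std::strstr(typeid(std::string).name(), "__cxx11") != nullptr;
}
```

If this returns false in the application while the library was built with the new ABI (or the reverse), symbols involving std::string cannot match at link time.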

References

D_GLIBCXX_USE_CXX11_ABI : https://developers.redhat.com/blog/2015/02/05/gcc5-and-the-c11-abi

Versions / Dependencies

os         ubuntu 20.04.2 LTS

gcc        11.2.0
g++        11.2.0
cmake      3.22.3

ray        1.12.0
ray-cpp    1.12.0
pyarrow    5.0.0

Reproduction script

# CMakeLists.txt

# cmake version
cmake_minimum_required(VERSION 3.22.3)

# D_GLIBCXX_USE_CXX11_ABI
# add_definitions(-D_GLIBCXX_USE_CXX11_ABI=0)
add_definitions(-D_GLIBCXX_USE_CXX11_ABI=1)

project(ray_and_arrow_demo)

link_directories(***/lib)
include_directories(***/include)

# ray_and_arrow_demo
add_executable(ray_and_arrow_demo ray_and_arrow_demo.cc)
add_library(ray_and_arrow_demo_code SHARED ray_and_arrow_demo.cc)

target_link_libraries(ray_and_arrow_demo -larrow -lray_api)
target_link_libraries(ray_and_arrow_demo_code -larrow -lray_api)

Issue Severity

High: It blocks me from completing my task.

scv119 commented 2 years ago

cc @qicosmos @mwtian Do you know why we can't build ray with D_GLIBCXX_USE_CXX11_ABI=1?

duburcqa commented 2 years ago

I don't know if it is related, but the manylinux2014 image can't build with D_GLIBCXX_USE_CXX11_ABI=1 because the provided GCC version is too old.

SongGuyang commented 2 years ago

@scv119 I think it is related to the environment in which the Ray wheels are compiled, but I don't know the real reason. A workaround is for users to build the Ray wheels themselves from source, but we should find a better way to support this.

duburcqa commented 2 years ago

AFAIK, manylinux2014 is precisely the environment in which the Ray wheels are compiled on Linux. manylinux_2_28 is currently in its testing phase. Migrating to this new environment would fix the issue, but it would leave people on Ubuntu 18 without wheels, since it supports at most manylinux_2_27 (unfortunately there is no plan to release this image). I'm afraid going for a custom build is the only way.

SongGuyang commented 2 years ago

@duburcqa I ran a container using the image quay.io/pypa/manylinux2014_x86_64:2021-11-07-28723f3 and got the gcc version:

# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-10/root/usr/libexec/gcc/x86_64-redhat-linux/10/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/opt/rh/devtoolset-10/root/usr --mandir=/opt/rh/devtoolset-10/root/usr/share/man --infodir=/opt/rh/devtoolset-10/root/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --with-default-libstdcxx-abi=gcc4-compatible --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-10.2.1-20210130/obj-x86_64-redhat-linux/isl-install --disable-libmpx --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.1 20210130 (Red Hat 10.2.1-11) (GCC)

The gcc is located in:

# which gcc
/opt/rh/devtoolset-10/root/usr/bin/gcc

It seems the gcc version is not old?

SongGuyang commented 2 years ago

Does TensorFlow also not support ABI=1?

https://www.tensorflow.org/guide/create_op#compile_the_op_using_your_system_compiler_tensorflow_binary_installation

Note on gcc version >=5: gcc uses the new C++ ABI since version 5. The binary pip packages available on the TensorFlow website are built with gcc4 that uses the older ABI. If you compile your op library with gcc>=5, add -D_GLIBCXX_USE_CXX11_ABI=0 to the command line to make the library compatible with the older abi.

duburcqa commented 2 years ago

Ok, my bad, GCC is only one of the criteria. Here the cause is not that gcc is too old but that the environment relies on an old version of the standard library that does not support the new ABI anyway, or something like that (more information can be found here). So this flag would be ignored. The problem is reversed for manylinux_2_24: the environment is more recent but the gcc version is much older (4.7), so it does not support it either...

infzo commented 2 years ago

Thanks for your attention. I expected to use both Ray and Arrow in a C++ project, but that seems difficult for me 😭. I now have three questions:

  1. If I want to build a Ray wheel with D_GLIBCXX_USE_CXX11_ABI=1, how do I modify Ray's code and build configuration? How do I build and compile the wheel? Could you provide step-by-step help?
  2. Or can the community provide a Ray wheel built with ABI 1, as a temporary release?
  3. Or is there another way to make Ray and Arrow compatible?

For the first question, adding the line build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1" to .../ray/.bazelrc (as line 12) seems feasible. The build commands are as follows:

# default
python setup.py bdist_wheel

# cpp
export RAY_INSTALL_CPP=1
python setup.py bdist_wheel

mwtian commented 2 years ago

@infzo Did building your own wheels work? You may have found them already, but here are the instructions for installing dependencies and building Ray from source: https://docs.ray.io/en/latest/ray-contribute/development.html#building-ray-on-linux-macos-full

infzo commented 2 years ago

@mwtian Thank you. The method above works. But I ran into a new issue: ArrowTable does not support msgpack serialization. Is there an elegant way to pass an ArrowTable to Ray Tasks and Actors?

SongGuyang commented 2 years ago

@infzo Hi, we only support msgpack serialization for now. I'm sorry that we don't have good documentation describing the details of serialization. Can you create a separate issue about this and show your code and error message? We'd like to make some enhancements to serialization in C++.

infzo commented 2 years ago

OK, the new issue I created is: https://github.com/ray-project/ray/issues/24643