google / tensorstore

Library for reading and writing large multi-dimensional arrays.
https://google.github.io/tensorstore/

Build fails on Linux with GCC 9, 10, and 12 #63

Closed wookayin closed 5 months ago

wookayin commented 1 year ago

https://google.github.io/tensorstore/installation.html says Build dependencies are GCC 9 or later, but it seems that tensorstore cannot be built using GCC 9 or 10. What is the minimum required version of GCC/G++?

Environment: Ubuntu 20.04 Focal LTS
tensorstore: 0.1.28 (08-11-2022)

GCC 9.x: ./tensorstore/box.h:214:44: error: expected template-name before '<' token

```
Use --sandbox_debug to see verbose messages from the sandbox
In file included from ./tensorstore/box.h:30,
                 from ./tensorstore/internal/box_difference.h:20,
                 from tensorstore/internal/box_difference.cc:15:
./tensorstore/internal/multi_vector.h: In substitution of 'template using MultiVectorStorage = tensorstore::internal::MultiVectorStorageImpl [with long int Extent = Rank; Ts = {long int, long int}]':
./tensorstore/box.h:137:67:   required from here
./tensorstore/internal/multi_vector.h:105:58: error: taking address of rvalue [-fpermissive]
  105 |     MultiVectorStorageImpl;
      |     ^
In file included from ./tensorstore/internal/multi_vector.h:29,
                 from ./tensorstore/box.h:30,
                 from ./tensorstore/internal/box_difference.h:20,
                 from tensorstore/internal/box_difference.cc:15:
./tensorstore/rank.h:127:13: note: candidate: 'constexpr tensorstore::RankConstraint::operator tensorstore::DimensionIndex() const'
  127 |   constexpr operator DimensionIndex() const { return rank; }
      |             ^~~~~~~~
./tensorstore/rank.h:127:13: note:   candidate expects 0 arguments, 1 provided
In file included from ./tensorstore/internal/box_difference.h:20,
                 from tensorstore/internal/box_difference.cc:15:
./tensorstore/box.h:214:44: error: expected template-name before '<' token
  214 | class Box : public internal_box::BoxStorage {
      |                                            ^
./tensorstore/box.h:214:44: error: expected '{' before '<' token
cc1plus: warning: unrecognized command line option '-Wno-unknown-warning-option'
Target //python/tensorstore:_tensorstore__shared_objects failed to build
```

GCC 10.x: python_headers/object.h:136:30: error: lvalue required as left operand of assignment

```
ERROR: /tmp/pip-install-ltft0rab/tensorstore_0bc800fcb17142a7ba2120da8a360a3f/python/tensorstore/BUILD:381:20: Compiling python/tensorstore/bfloat16.cc failed: (Exit 1): gcc-10 failed: error executing command
  (cd $HOME/.cache/bazel/_bazel_$USER/4694eb6f528e602d9a898e0775c25c1f/sandbox/linux-sandbox/1851/execroot/com_google_tensorstore && \
  exec env - \
    PATH=/bin:/usr/bin:/usr/local/bin \
    PWD=/proc/self/cwd \
    (... omitted...)
In file included from bazel-out/k8-opt/bin/external/local_config_python/_virtual_includes/python_headers/Python.h:44,
                 from bazel-out/k8-opt/bin/external/com_github_pybind_pybind11/_virtual_includes/pybind11/pybind11/detail/../detail/common.h:208,
                 from bazel-out/k8-opt/bin/external/com_github_pybind_pybind11/_virtual_includes/pybind11/pybind11/detail/../attr.h:13,
                 from bazel-out/k8-opt/bin/external/com_github_pybind_pybind11/_virtual_includes/pybind11/pybind11/detail/class.h:12,
                 from bazel-out/k8-opt/bin/external/com_github_pybind_pybind11/_virtual_includes/pybind11/pybind11/pybind11.h:13,
                 from ./python/tensorstore/numpy.h:35,
                 from python/tensorstore/bfloat16.cc:15:
python/tensorstore/bfloat16.cc: In function 'bool tensorstore::internal_python::{anonymous}::Initialize()':
bazel-out/k8-opt/bin/external/local_config_python/_virtual_includes/python_headers/object.h:136:30: error: lvalue required as left operand of assignment
  136 | #  define Py_TYPE(ob) Py_TYPE(_PyObject_CAST(ob))
      |                       ~~~~~~~^~~~~~~~~~~~~~~~~~~~
python/tensorstore/bfloat16.cc:774:3: note: in expansion of macro 'Py_TYPE'
  774 |   Py_TYPE(&NPyBfloat16_Descr) = &PyArrayDescr_Type;
      |   ^~~~~~~
At global scope:
cc1plus: note: unrecognized command-line option '-Wno-unknown-warning-option' may have been intended to silence earlier diagnostics
Target //python/tensorstore:_tensorstore__shared_objects failed to build
```

GCC 12.x: com_google_boringssl/src/crypto/refcount_c11.c:29:15: error: expected declaration specifiers or '...'

```
In file included from external/com_google_boringssl/src/crypto/refcount_c11.c:15:
external/com_google_boringssl/src/crypto/internal.h: In function 'CRYPTO_load_word_be':
external/com_google_boringssl/src/crypto/internal.h:923:3: warning: implicit declaration of function 'static_assert' [-Wimplicit-function-declaration]
  923 |   static_assert(sizeof(v) == 8, "crypto_word_t has unexpected size");
      |   ^~~~~~~~~~~~~
In file included from external/com_google_boringssl/src/crypto/internal.h:130:
external/com_google_boringssl/src/crypto/refcount_c11.c: At top level:
external/com_google_boringssl/src/crypto/refcount_c11.c:29:15: error: expected declaration specifiers or '...' before '_Alignof'
   29 | static_assert(alignof(CRYPTO_refcount_t) == alignof(_Atomic CRYPTO_refcount_t),
      |               ^~~~~~~
external/com_google_boringssl/src/crypto/refcount_c11.c:30:15: error: expected declaration specifiers or '...' before string constant
   30 |               "_Atomic alters the needed alignment of a reference count");
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
external/com_google_boringssl/src/crypto/refcount_c11.c:31:15: error: expected declaration specifiers or '...' before 'sizeof'
   31 | static_assert(sizeof(CRYPTO_refcount_t) == sizeof(_Atomic CRYPTO_refcount_t),
      |               ^~~~~~
external/com_google_boringssl/src/crypto/refcount_c11.c:32:15: error: expected declaration specifiers or '...' before string constant
   32 |               "_Atomic alters the size of a reference count");
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
external/com_google_boringssl/src/crypto/refcount_c11.c:34:15: error: expected declaration specifiers or '...' before '(' token
   34 | static_assert((CRYPTO_refcount_t)-1 == CRYPTO_REFCOUNT_MAX,
      |               ^
external/com_google_boringssl/src/crypto/refcount_c11.c:35:15: error: expected declaration specifiers or '...' before string constant
   35 |               "CRYPTO_REFCOUNT_MAX is incorrect");
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: note: unrecognized command-line option '-Wno-unknown-warning-option' may have been intended to silence earlier diagnostics
Target //python/tensorstore:_tensorstore__shared_objects failed to build
INFO: Elapsed time: 41.252s, Critical Path: 27.78s
INFO: 1243 processes: 162 internal, 1081 linux-sandbox.
FAILED: Build did NOT complete successfully
```

Command I ran in the above: pip install tensorstore

laramiel commented 1 year ago

I think that our current build machine uses gcc 12.2.0, as does my workspace (gcc (Debian 12.2.0-3) 12.2.0). Can you run the following in each build environment and tell me what it says:

./bazelisk.py build tensorstore/...
`./bazelisk.py info output_base`/external/local_config_cc/cc_wrapper.sh --version

Were these errors from pip install tensorstore? What's your OS, and what is the rest of your environment like? I ask because we should have prebuilt packages from GitHub CI.

wookayin commented 1 year ago

cc_wrapper.sh --version prints the same version as the gcc --version found on the $PATH.

(on the master branch)

GCC 10.3 (The system g++ installed on this Linux machine):

❯❯❯ which -a gcc g++
gcc: aliased to nocorrect gcc
/bin/gcc
/usr/bin/gcc
/bin/g++
/usr/bin/g++

❯❯❯ $(./bazelisk.py info output_base)/external/local_config_cc/cc_wrapper.sh --version
gcc (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

GCC 12.2 (conda gcc/g++)

❯❯❯ which g++
$HOME/.miniforge3/bin/g++         # g++ (conda-forge gcc 12.2.0-19) 12.2.0

❯❯❯ $(./bazelisk.py info output_base)/external/local_config_cc/cc_wrapper.sh --version 

x86_64-conda-linux-gnu-cc (conda-forge gcc 12.2.0-19) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

With those bazel commands I somehow run into a different error:

ERROR: .../tensorstore/tensorstore/serialization/BUILD:64:20: Linking tensorstore/serialization/function_test failed: (Exit 1): x86_64-conda-linux-gnu-cc failed: error executing command $CONDA_PREFIX/bin/x86_64-conda-linux-gnu-cc @bazel-out/k8-fastbuild/bin/tensorstore/serialization/function_test-2.params

Use --sandbox_debug to see verbose messages from the sandbox
bazel-out/k8-fastbuild/bin/_solib_k8/libexternal_Scom_Ugoogle_Uabsl_Sabsl_Stime_Slibtime.so: error: undefined reference to 'clock_gettime'

This particular error seems related to the -lrt flag: glibc 2.17 and later provide clock_gettime in libc itself, while older glibc versions require linking with -lrt.
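
As a rough illustration of the glibc remark above, here is a minimal Python sketch that reports whether a given glibc version predates 2.17 and would therefore need -lrt for clock_gettime. The helper names are hypothetical. Note also that platform.libc_ver() reports the glibc the Python interpreter itself is linked against, which may differ from the sysroot a conda compiler links against, so treat the result as a hint only.

```python
import platform


def parse_version(text):
    """Turn "2.17" into (2, 17); skip any non-numeric components."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def clock_gettime_needs_lrt(glibc_version):
    """glibc releases older than 2.17 keep clock_gettime in librt."""
    return parse_version(glibc_version) < (2, 17)


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        needs = clock_gettime_needs_lrt(version)
        print(f"glibc {version}: clock_gettime needs -lrt = {needs}")
    else:
        print("glibc version not detected")
```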

One note: the error messages in the OP were produced during installation via pip install . (which invokes bazel like python -u bazelisk.py build -c opt //python/tensorstore:_tensorstore__shared_objects --verbose_failures --copt=-fvisibility=hidden). In this case the same gcc and g++ on the $PATH were used.

jbms commented 1 year ago

The gcc 9 failures are known and expected --- we just need to change the documentation.

The other issues ideally can be solved. Which Python version?

wookayin commented 1 year ago

I see, the documentation could be updated. Having GCC 12 as the standard, recommended version sounds good, but because of GCC's incompatibility with CUDA (https://stackoverflow.com/questions/6622454/cuda-incompatible-with-my-gcc-version) I hope that at least GCC 10.x can be supported. I think the conda-shipped gcc 12.2 should work, but the Linux system I'm using has a somewhat old libstdc++ and glibc.

Python: it fails on both Python 3.10 and 3.11. I was actually trying to build tensorstore for Python 3.11 because there is no prebuilt wheel available (it would also be great if official py311 support could be added!), but the build also fails on Python 3.10 in the same environment. In this issue, I am mostly hoping that the C++ build commands and/or documentation can be improved so that the bazel build works across different environment configurations.

laramiel commented 1 year ago

The clock_gettime failure is from Abseil; maybe we should raise a bug there. I don't see any -lrt conditional flags in the build, and I haven't looked into which gcc / glibc versions require it.

I should try to set up an image to repro this. What does uname -a print? You're running Ubuntu, and how did you install gcc?

wookayin commented 1 year ago

Yes, my env is Ubuntu 20.04 LTS (focal), which ships with GCC 10.

```
$ uname -a
Linux HOSTNAME 5.4.0-131-generic #147-Ubuntu SMP Fri Oct 14 17:07:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.5 LTS"

$ sudo apt install gcc-10 g++-10

# (Optional)
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 10
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-10 10
```

sameeul commented 1 year ago

Just wanted to report that with GCC 10.4+ and GCC 11, the build process (both python setup.py develop and building executables from the examples) fails because boringssl fails to build. I can build cleanly with GCC 10.2 and GCC 10.3. I know this is not exactly a tensorstore issue but rather one coming from one of its dependencies. I was just curious whether any of you have come across this, other than the author of this issue (the GCC 12.x log).

jbms commented 1 year ago

@sameeul Are you also using gcc from conda?

It appears that the conda gcc lacks proper C11 support, which boringssl requires. For example, after installing gcc from conda-forge:

echo -e "#include <assert.h>\nstatic_assert(1);" > t.c
gcc -std=c11 -c t.c

Fails with

t.c:2:15: error: expected declaration specifiers or '...' before numeric constant
    2 | static_assert(1);
      |               ^

In particular, if you look at mambaforge/x86_64-conda-linux-gnu/sysroot/usr/include/assert.h you will see that it is a very old version that lacks a #define static_assert line.
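
The same check can be scripted. Below is a minimal Python sketch, not part of tensorstore, that probes whether a compiler's assert.h provides the C11 static_assert macro. The function name and probe file name are hypothetical, and the probe uses the two-argument form of static_assert because that is the form C11 strictly defines.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Strictly valid C11: static_assert with a message, via the <assert.h> macro.
PROBE_SOURCE = '#include <assert.h>\nstatic_assert(1, "probe");\n'


def assert_h_provides_static_assert(cc="gcc"):
    """Return True/False for the probe result, or None if `cc` is not on PATH."""
    if shutil.which(cc) is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "probe.c"
        src.write_text(PROBE_SOURCE)
        # -fsyntax-only: check the file without producing an object file.
        result = subprocess.run(
            [cc, "-std=c11", "-fsyntax-only", str(src)],
            capture_output=True,
            text=True,
        )
    return result.returncode == 0


if __name__ == "__main__":
    for cc in ("gcc", "cc"):
        print(cc, assert_h_provides_static_assert(cc))
```

A False result for a compiler that otherwise works would be consistent with the old sysroot assert.h described above; None only means the compiler was not found on the PATH.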

jbms commented 1 year ago

I filed upstream issue: https://github.com/conda-forge/linux-sysroot-feedstock/issues/44

sameeul commented 1 year ago

@jbms: Yes, I was using conda gcc in the cases where my build failed. The cases where it worked used compilers that came directly from the debian-bullseye repo. Thanks for looking into it.

alcarazolabs commented 11 months ago

Can someone help me fix this problem? I'm trying to install tensorstore on an NVIDIA Jetson Orin Nano (arm64):

(.myenv) orin@ubuntu:~/Documents/tensorstore$ pip install tensorstore
Collecting tensorstore
  Using cached tensorstore-0.1.51.tar.gz (6.4 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: numpy>=1.16.0 in /home/orin/Documents/whisper/whisperjax/.myenv/lib/python3.9/site-packages (from tensorstore) (1.26.2)
Collecting ml-dtypes>=0.3.1 (from tensorstore)
  Using cached ml_dtypes-0.3.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (20 kB)
Using cached ml_dtypes-0.3.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (202 kB)
Building wheels for collected packages: tensorstore
  Building wheel for tensorstore (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for tensorstore (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [3245 lines of output]
      WARNING setuptools_scm._integration.setuptools pyproject.toml does not contain a tool.setuptools_scm section
      WARNING setuptools_scm.pyproject_reading toml section missing 'pyproject.toml does not contain a tool.setuptools_scm section'
      running bdist_wheel
      running build
      running build_py
      creating /tmp/tmpdqjypyim/lib.linux-aarch64-cpython-39
      creating /tmp/tmpdqjypyim/lib.linux-aarch64-cpython-39/tensorstore
      copying python/tensorstore/__init__.py -> /tmp/tmpdqjypyim/lib.linux-aarch64-cpython-39/tensorstore
      running build_ext
      /home/orin/Documents/whisper/whisperjax/.myenv/bin/python3.9 -u bazelisk.py build -c opt //python/tensorstore:_tensorstore__shared_objects --verbose_failures --action_env=PATH=/home/orin/Documents/whisper/whisperjax/.myenv/bin:/usr/local/cuda-11.4/bin:/home/orin/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin --copt=-fvisibility=hidden
      Starting local Bazel server and connecting to it...
      WARNING: ignoring LD_PRELOAD in environment.
      Loading:
      Loading:
      Loading:
      Loading: 0 packages loaded
      Analyzing: target //python/tensorstore:_tensorstore__shared_objects (1 packages loaded, 0 targets configured)
      Analyzing: target //python/tensorstore:_tensorstore__shared_objects (31 packages loaded, 9 targets configured)
      Analyzing: target //python/tensorstore:_tensorstore__shared_objects (34 packages loaded, 127 targets configured)
      Analyzing: target //python/tensorstore:_tensorstore__shared_objects (83 packages loaded, 290 targets configured)
      Analyzing: target //python/tensorstore:_tensorstore__shared_objects (131 packages loaded, 372 targets configured)
      Analyzing: target //python/tensorstore:_tensorstore__shared_objects (140 packages loaded, 1436 targets configured)
      Analyzing: target //python/tensorstore:_tensorstore__shared_objects (149 packages loaded, 2013 targets configured)
      Analyzing: target //python/tensorstore:_tensorstore__shared_objects (155 packages loaded, 2359 targets configured)
      Analyzing: target //python/tensorstore:_tensorstore__shared_objects (163 packages loaded, 3649 targets configured)
      Analyzing: target //python/tensorstore:_tensorstore__shared_objects (169 packages loaded, 4790 targets configured)
      Analyzing: target //python/tensorstore:_tensorstore__shared_objects (194 packages loaded, 6966 targets configured)
      Analyzing: target //python/tensorstore:_tensorstore__shared_objects (249 packages loaded, 7865 targets configured)
      INFO: Analyzed target //python/tensorstore:_tensorstore__shared_objects (250 packages loaded, 8109 targets configured).
       checking cached actions
......

      ./tensorstore/box.h:676:78: error: no matching function for call to 'tensorstore::RankConstraint::operator tensorstore::DimensionIndex(tensorstore::RankConstraint*)'
        676 | BoxView(const Box<Rank>& box) -> BoxView<RankConstraint::FromInlineRank(Rank)>;
            |                                                                              ^
      In file included from ./tensorstore/json_serialization_options_base.h:19,
                       from ./tensorstore/internal/json_binding/bindable.h:23,
                       from ./tensorstore/context_impl_base.h:37,
                       from ./tensorstore/context.h:29,
                       from ./tensorstore/kvstore/s3/s3_resource.h:25,
                       from tensorstore/kvstore/s3/s3_resource.cc:15:
      ./tensorstore/rank.h:126:13: note: candidate: 'constexpr tensorstore::RankConstraint::operator tensorstore::DimensionIndex() const'
        126 |   constexpr operator DimensionIndex() const { return rank; }
            |             ^~~~~~~~
      ./tensorstore/rank.h:126:13: note:   candidate expects 0 arguments, 1 provided
      cc1plus: warning: unrecognized command line option '-Wno-unknown-warning-option'
      Target //python/tensorstore:_tensorstore__shared_objects failed to build
      INFO: Elapsed time: 686.598s, Critical Path: 38.57s
      INFO: 2701 processes: 531 internal, 2166 linux-sandbox, 4 local.
      FAILED: Build did NOT complete successfully
      error: command '/home/orin/Documents/whisper/whisperjax/.myenv/bin/python3.9' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for tensorstore
Failed to build tensorstore
ERROR: Could not build wheels for tensorstore, which is required to install pyproject.toml-based projects
(.myenv) orin@ubuntu:~/Documents/tensorstore$ 

I would appreciate any suggestions. Thanks so much.

jbms commented 11 months ago

What is your compiler version?

alcarazolabs commented 11 months ago

> What is your compiler version?

gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0

A few minutes ago I updated gcc to 10.5, but now 9.4.0 shows up again.

I did:

sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-10

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 10

jbms commented 11 months ago

It should work with GCC 10, but it will build using whatever is the default version of gcc and g++ in your path. You may need to do update-alternatives for both gcc and g++, and confirm their versions via gcc -v and g++ -v before trying pip install again.
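
For anyone who wants to automate that check before running pip install, here is a minimal Python sketch. It assumes gcc and g++ both accept -dumpversion, which may print either "10" or "10.5.0" depending on how the distribution configured GCC. The helper names and the MIN_MAJOR threshold are illustrative and taken from this thread, not from the tensorstore build.

```python
import re
import shutil
import subprocess

MIN_MAJOR = 10  # assumption from this thread: GCC 9 fails, GCC 10 should work


def major_version(dumpversion_output):
    """Parse the leading integer of `-dumpversion` output ("10" or "10.5.0")."""
    match = re.match(r"\s*(\d+)", dumpversion_output)
    return int(match.group(1)) if match else None


def installed_major(tool):
    """Return the major version of `tool` on PATH, or None if unavailable."""
    if shutil.which(tool) is None:
        return None
    out = subprocess.run([tool, "-dumpversion"], capture_output=True, text=True)
    return major_version(out.stdout)


def toolchain_ok(gcc_major, gxx_major, minimum=MIN_MAJOR):
    """True only if both compilers exist, match, and meet the minimum."""
    if gcc_major is None or gxx_major is None:
        return False
    return gcc_major == gxx_major and gcc_major >= minimum


if __name__ == "__main__":
    gcc, gxx = installed_major("gcc"), installed_major("g++")
    print(f"gcc={gcc} g++={gxx} ok={toolchain_ok(gcc, gxx)}")
```

A mismatch such as gcc 10 with g++ 9 makes toolchain_ok return False, which is the situation update-alternatives for both tools is meant to resolve.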

alcarazolabs commented 11 months ago

> It should work with GCC 10, but it will build using whatever is the default version of gcc and g++ in your path. You may need to do update-alternatives for both gcc and g++, and confirm their versions via gcc -v and g++ -v before trying pip install again.

Thanks for the reply. Should I have the same version for gcc and g++?

Now I'm getting this error:

 # Configuration: 2e75659f83b112ea910d43e089378d23c32e14fbc15d5f986826762c5f40b8be
      # Execution platform: @local_config_platform//:host

      Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
      gcc: fatal error: cannot execute 'cc1plus': execvp: No such file or directory
      compilation terminated.
      Target //python/tensorstore:_tensorstore__shared_objects failed to build
      INFO: Elapsed time: 18.191s, Critical Path: 0.20s
      INFO: 440 processes: 436 internal, 1 linux-sandbox, 3 local.
      FAILED: Build did NOT complete successfully
      error: command '/home/orin/Documents/whisper/whisperjax/.myenv/bin/python3.9' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for tensorstore

Now I have updated the gcc version, and the change is permanent:

gcc (Ubuntu 10.5.0-1ubuntu1~20.04) 10.5.0

However, g++ has this version:

g++ (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0

Should I update g++ as well? Thanks.

jbms commented 11 months ago

Yes, you will need to upgrade g++ as well to the same version.

alcarazolabs commented 11 months ago

> Yes, you will need to upgrade g++ as well to the same version.

Thanks so much, now I was able to install tensorstore.