tensorflow / addons

Useful extra functionality for TensorFlow 2.x maintained by SIG-addons
Apache License 2.0

Build failure with Tensorflow addons 0.20 #2828

Open npanpaliya opened 1 year ago

npanpaliya commented 1 year ago

System information

Describe the bug

While building TF Addons 0.20 with TF 2.12, CUDA 11.8, and cuDNN 8.8.1, I'm seeing the following build failure:

In file included from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/system/cuda/config.h:33,
                 from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/execution_policy.h:35,
                 from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/iterator/detail/device_system_tag.h:23,
                 from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/iterator/detail/iterator_facade_category.h:22,
                 from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/iterator/iterator_facade.h:37,
                 from external/cub_archive/cub/device/../iterator/arg_index_input_iterator.cuh:48,
                 from external/cub_archive/cub/device/device_reduce.cuh:41,
                 from tensorflow_addons/custom_ops/layers/cc/kernels/correlation_cost_op_gpu.cu.cc:20:
/usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/cub/util_namespace.cuh:46:2: error: #error CUB requires a definition of CUB_NS_QUALIFIER when CUB_NS_PREFIX/POSTFIX are defined.
   46 | #error CUB requires a definition of CUB_NS_QUALIFIER when CUB_NS_PREFIX/POSTFIX are defined.

My .bazelrc looks like

build --action_env TF_HEADER_DIR="/opt/conda/envs/testaddons/lib/python3.10/site-packages/tensorflow/include"
build --action_env TF_SHARED_LIBRARY_DIR="/opt/conda/envs/testaddons/lib/python3.10/site-packages/tensorflow"
build --action_env TF_SHARED_LIBRARY_NAME="libtensorflow_framework.so.2"
build --action_env TF_CXX11_ABI_FLAG="1"
build --action_env TF_CPLUSPLUS_VER="c++17"
build --spawn_strategy=standalone
build --strategy=Genrule=standalone
build --experimental_repo_remote_exec
build -c opt
build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1"
build --copt=-mavx
build --cxxopt=-std=c++17
build --host_cxxopt=-std=c++17
build --action_env TF_NEED_CUDA="1"
build --action_env CUDA_TOOLKIT_PATH="/usr/local/cuda,/opt/conda/envs/testaddons,/usr/include"
build --action_env CUDNN_INSTALL_PATH="/opt/conda/envs/testaddons"
build --action_env TF_CUDA_VERSION="11"
build --action_env TF_CUDNN_VERSION="8.8"
test --config=cuda
build --config=cuda
build:cuda --define=using_cuda=true --define=using_cuda_nvcc=true
build:cuda --crosstool_top=@ubuntu20.04-gcc9_manylinux2014-cuda11.8-cudnn8.6-tensorrt8.4_config_cuda//crosstool:toolchain
build --action_env PYTHON_BIN_PATH="/opt/conda/envs/testaddons/bin/python"
build --action_env PYTHON_LIB_PATH="/opt/conda/envs/testaddons/lib/python3.10/site-packages"
build --python_path="/opt/conda/envs/testaddons/bin/python"
build --action_env GCC_HOST_COMPILER_PATH="/opt/conda/envs/testaddons/bin/x86_64-conda-linux-gnu-cc"

Code to reproduce the issue

Build command: bazel build -s --enable_runfiles build_pip_pkg

Please provide some help to get rid of this build error.


npanpaliya commented 1 year ago

@seanpmorgan - Could you please provide some pointers?

npanpaliya commented 1 year ago

Does anyone have any pointers to fix this issue?

bhack commented 1 year ago

It seems similar to https://github.com/dmlc/xgboost/issues/7378, which was fixed by https://github.com/dmlc/xgboost/pull/7379
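
As a rough illustration of what the error message is checking: it says that once CUB_NS_PREFIX/CUB_NS_POSTFIX are defined, CUB_NS_QUALIFIER has to be defined too. The sketch below mimics that rule in plain C++, with no CUDA needed. It is not the actual CUB source and not the xgboost patch; the guard is reconstructed from the wording of the error, and the names `wrapped`, `marker`, and `qualified_marker` are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-ins: wrap a "cub" namespace inside an outer namespace,
// the way the PREFIX/POSTFIX macros are meant to be used.
#define CUB_NS_PREFIX namespace wrapped {
#define CUB_NS_POSTFIX }
// Comment this line out to reproduce the same kind of #error as in the log.
#define CUB_NS_QUALIFIER ::wrapped::cub

// Guard reconstructed from the error text (an assumption, not copied from CUB).
#if (defined(CUB_NS_PREFIX) || defined(CUB_NS_POSTFIX)) && !defined(CUB_NS_QUALIFIER)
#error "CUB_NS_QUALIFIER must be defined when CUB_NS_PREFIX/POSTFIX are defined"
#endif

CUB_NS_PREFIX
namespace cub {
// Placeholder function standing in for real library symbols.
inline int marker() { return 42; }
}  // namespace cub
CUB_NS_POSTFIX

// Callers reach the wrapped namespace through the qualifier macro,
// which here expands to ::wrapped::cub::marker().
inline int qualified_marker() {
  int value = CUB_NS_QUALIFIER::marker();
  assert(value == 42);
  return value;
}
```

In the log above, the include chain mixes headers from external/cub_archive with the toolkit's own thrust and cub headers under /usr/local/cuda-11.8, so checking which of the two defines the prefix/postfix macros without the qualifier may be a reasonable place to start.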

npanpaliya commented 1 year ago

Okay, thanks @bhack. I'll give this a try.

MrAta commented 10 months ago

I'm running into the same issue when building TF Addons 0.19 with CUDA 11.8. What config should be used in this case? In my case, removing cub from WORKSPACE, similar to #2821, works. @seanpmorgan, may I ask what the reason was for the cub removal in that PR?

854768750 commented 9 months ago

I have this issue in another project. I tried CUDA 10.1 and 12.3 and hit the same issue, but there is no error with CUDA 11.4.

fuhailin commented 9 months ago

Same issue with CUDA 10.8