tensorflow / tensorflow


Can't build tensorflow with cuda_clang ubuntu20 #56324

Closed eduardinjo closed 2 years ago

eduardinjo commented 2 years ago
### Issue Type

Bug

### Source

source

### Tensorflow Version

tf2.10

### Custom Code

No

### OS Platform and Distribution

Ubuntu 20.04 (from devel-gpu docker)

### Mobile device

_No response_

### Python version

3.8

### Bazel version

5.1.1

### GCC/Compiler version

9.4.0

### CUDA/cuDNN version

11.2

### GPU model and memory

RTX2080Ti

### Current Behaviour?

When using clang as the compiler, with its path set as:

```shell
build --action_env CLANG_CUDA_COMPILER_PATH="/usr/bin/clang-14"
```

and running the command

```
bazel build --config=cuda --config=opt //tensorflow/tools/pip_package:build_pip_package
```

I got the following error:

```
ERROR: /mnt/Documents/eduardss/tensorflow/tensorflow/lite/python/BUILD:68:10 Middleman _middlemen/tensorflow_Slite_Spython_Stflite_Uconvert-runfiles failed: undeclared inclusion(s) in rule '@com_google_absl//absl/base:log_severity':
this rule is missing dependency declarations for the following files included by 'absl/base/log_severity.cc':
  '/usr/lib/clang/14.0.5/include/stddef.h'
  '/usr/lib/clang/14.0.5/include/__stddef_max_align_t.h'
  '/usr/lib/clang/14.0.5/include/stdarg.h'
  '/usr/lib/clang/14.0.5/include/stdint.h'
  '/usr/lib/clang/14.0.5/include/limits.h'
INFO: Elapsed time: 8.262s, Critical Path: 2.47s
INFO: 71 processes: 54 internal, 17 local.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
```

This seems to be a problem with installing clang from apt, and it has been reported before with no solution other than building clang from source. When I built clang-12 from source the error did not occur, but that version does not support sm75+ architectures. Downloading a prebuilt clang was also not an option, as the Chromium builds are not built with the `nvptx` target and produce an error when building for sm75+.

How do I properly set up clang 13/14 as the CUDA compiler to build TensorFlow?

### Standalone code to reproduce the issue

```shell
Building from source at master branch and in devel-gpu docker image.
Using instructions from https://apt.llvm.org/ to get clang-14.
Adding to .tf_configure.bazelrc:
--action_env CLANG_CUDA_COMPILER_PATH="/usr/bin/clang-14"
```

### Relevant log output

```shell
WARNING: The following configs were expanded more than once: [cuda]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'build' from /mnt/Documents/eduardss/tensorflow/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /mnt/Documents/eduardss/tensorflow/.bazelrc:
  'build' options: --define framework_shared_object=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=true
INFO: Reading rc options for 'build' from /mnt/Documents/eduardss/tensorflow/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/usr/bin/python3 --action_env PYTHON_LIB_PATH=/usr/lib/python3/dist-packages --python_path=/usr/bin/python3 --config=tensorrt --action_env TF_CUDA_VERSION=11.2 --action_env TF_CUDNN_VERSION=8 --action_env CUDA_TOOLKIT_PATH=/usr/local/cuda-11.2 --action_env TF_CUDA_COMPUTE_CAPABILITIES=7.5,8.6 --action_env LD_LIBRARY_PATH=/usr/local/cuda-11.0/targets/x86_64-linux/lib:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/usr/include/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64/stubs:/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.2/lib64 --config=cuda_clang --action_env CLANG_CUDA_COMPILER_PATH=/usr/bin/clang-14
INFO: Reading rc options for 'build' from /mnt/Documents/eduardss/tensorflow/.bazelrc:
  'build' options: --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/ir,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_jitrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/graph_executor,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
INFO: Found applicable config definition build:short_logs in file /mnt/Documents/eduardss/tensorflow/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /mnt/Documents/eduardss/tensorflow/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:tensorrt in file /mnt/Documents/eduardss/tensorflow/.bazelrc: --repo_env TF_NEED_TENSORRT=1
INFO: Found applicable config definition build:cuda_clang in file /mnt/Documents/eduardss/tensorflow/.bazelrc: --config=cuda --repo_env TF_CUDA_CLANG=1 --@local_config_cuda//:cuda_compiler=clang
INFO: Found applicable config definition build:cuda in file /mnt/Documents/eduardss/tensorflow/.bazelrc: --repo_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain --@local_config_cuda//:enable_cuda
INFO: Found applicable config definition build:cuda in file /mnt/Documents/eduardss/tensorflow/.bazelrc: --repo_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain --@local_config_cuda//:enable_cuda
INFO: Found applicable config definition build:opt in file /mnt/Documents/eduardss/tensorflow/.tf_configure.bazelrc: --copt=-Wno-sign-compare --host_copt=-Wno-sign-compare
INFO: Found applicable config definition build:linux in file /mnt/Documents/eduardss/tensorflow/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /mnt/Documents/eduardss/tensorflow/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/tensorflow/runtime/archive/75569959891573459f33b5ff530388e15b55c5c8.tar.gz failed: class java.io.FileNotFoundException GET returned 404 Not Found
Loading:
Loading: 0 packages loaded
Analyzing: target //tensorflow/tools/pip_package:build_pip_package (0 packages loaded, 0 targets configured)
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/google/XNNPACK/archive/da533e0114f2bf730f17853ae10556d84a3d1e89.zip failed: class java.io.FileNotFoundException GET returned 404 Not Found
INFO: Analyzed target //tensorflow/tools/pip_package:build_pip_package (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
[1 / 1] checking cached actions
[32 / 3,620] Compiling src/google/protobuf/compiler/main.cc; 0s local ... (8 actions, 7 running)
[48 / 3,676] Compiling src/google/protobuf/compiler/main.cc; 2s local ... (24 actions running)
ERROR: /root/.cache/bazel/_bazel_root/f5d1fdb6f5ab96afbd54f50def1c6c1a/external/com_google_absl/absl/base/BUILD.bazel:53:11: Compiling absl/base/log_severity.cc failed: undeclared inclusion(s) in rule '@com_google_absl//absl/base:log_severity':
this rule is missing dependency declarations for the following files included by 'absl/base/log_severity.cc':
  '/usr/lib/clang/14.0.5/include/stddef.h'
  '/usr/lib/clang/14.0.5/include/__stddef_max_align_t.h'
  '/usr/lib/clang/14.0.5/include/stdarg.h'
  '/usr/lib/clang/14.0.5/include/stdint.h'
  '/usr/lib/clang/14.0.5/include/limits.h'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: /mnt/Documents/eduardss/tensorflow/tensorflow/lite/python/BUILD:68:10 Middleman _middlemen/tensorflow_Slite_Spython_Stflite_Uconvert-runfiles failed: undeclared inclusion(s) in rule '@com_google_absl//absl/base:log_severity':
this rule is missing dependency declarations for the following files included by 'absl/base/log_severity.cc':
  '/usr/lib/clang/14.0.5/include/stddef.h'
  '/usr/lib/clang/14.0.5/include/__stddef_max_align_t.h'
  '/usr/lib/clang/14.0.5/include/stdarg.h'
  '/usr/lib/clang/14.0.5/include/stdint.h'
  '/usr/lib/clang/14.0.5/include/limits.h'
INFO: Elapsed time: 8.262s, Critical Path: 2.47s
INFO: 71 processes: 54 internal, 17 local.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
```
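
A minimal sketch for checking where the undeclared headers come from, assuming a bash shell with standard `grep`, `tr`, `sed`, and `sort`. The function name `undeclared_include_dirs` is hypothetical and not part of Bazel or TensorFlow. It reads a Bazel log on stdin and prints the unique directories of the headers quoted in the "undeclared inclusion(s)" error, so they can be compared with the resource directory that clang itself reports. Whether a mismatch between the two explains the failure is an assumption to verify, not a confirmed diagnosis.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read a Bazel build log on stdin and print the unique
# directories of the absolute-path headers that appear in single quotes,
# as they do in the "undeclared inclusion(s)" error above.
undeclared_include_dirs() {
  grep -o "'/[^']*\.h'" |   # pick out quoted absolute paths ending in .h
    tr -d "'" |             # drop the surrounding quotes
    sed 's|/[^/]*$||' |     # strip the file name, keep the directory
    sort -u                 # one line per distinct directory
}

# Example usage (commented out; needs bazel and clang-14 to be installed):
#   bazel build --config=cuda --config=opt \
#     //tensorflow/tools/pip_package:build_pip_package 2>&1 | undeclared_include_dirs
#   /usr/bin/clang-14 -print-resource-dir
```

For the log above the helper should report a single directory under `/usr/lib/clang/14.0.5`; `-print-resource-dir` is the clang option that prints the directory clang uses for its own builtin headers.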
sushreebarsa commented 2 years ago

@eduardinjo Could you try the latest stable TF version, 2.9.0, and refer to the link for the tested build configurations? Please let us know if it helps. Thank you!

google-ml-butler[bot] commented 2 years ago

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] commented 2 years ago

Closing as stale. Please reopen if you'd like to work on this further.
