tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

Unable to build TensorFlow with CUDA support using Bazel inside a Docker container #67145

Open PriyajeetGoswami opened 1 week ago

PriyajeetGoswami commented 1 week ago

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.7.0

Custom code

No

OS platform and distribution

Linux Ubuntu 20.04

Mobile device

No response

Python version

3.8.10

Bazel version

6.2.1

GCC/compiler version

9.3.0

CUDA/cuDNN version

11.2

GPU model and memory

No response

Current behavior?

I followed the instructions at https://www.tensorflow.org/install/source to create a TensorFlow Docker container with GPU support, but I am unable to build TensorFlow with Bazel using the following command:

bazel build //tensorflow/tools/pip_package:wheel --repo_env=WHEEL_NAME=tensorflow --config=cuda

Standalone code to reproduce the issue

I want to create a Docker container in which I can build a TFLite model in C++ that can access the NVIDIA GPU.

Relevant log output

root@b7bdee9743e8:/tensorflow/tensorflow# bazel build //tensorflow/tools/pip_package:wheel --repo_env=WHEEL_NAME=tensorflow --config=cuda
Starting local Bazel server and connecting to it...
WARNING: Option 'java_toolchain' is deprecated
WARNING: Option 'host_java_toolchain' is deprecated
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=191
INFO: Reading rc options for 'build' from /tensorflow/tensorflow/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /tensorflow/tensorflow/.bazelrc:
  'build' options: --define framework_shared_object=true --java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --host_java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/fallback,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
INFO: Found applicable config definition build:short_logs in file /tensorflow/tensorflow/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /tensorflow/tensorflow/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:cuda in file /tensorflow/tensorflow/.bazelrc: --repo_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain --@local_config_cuda//:enable_cuda
INFO: Found applicable config definition build:linux in file /tensorflow/tensorflow/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /tensorflow/tensorflow/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
WARNING: Option 'java_toolchain' is deprecated
WARNING: Option 'host_java_toolchain' is deprecated
INFO: Repository local_config_cuda instantiated at:
  /tensorflow/tensorflow/WORKSPACE:15:14: in <toplevel>
  /tensorflow/tensorflow/tensorflow/workspace2.bzl:1079:19: in workspace
  /tensorflow/tensorflow/tensorflow/workspace2.bzl:94:19: in _tf_toolchains
Repository rule cuda_configure defined at:
  /tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl:1448:33: in <toplevel>
ERROR: An error occurred during the fetch of repository 'local_config_cuda':
   Traceback (most recent call last):
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 1401, column 38, in _cuda_autoconf_impl
                _create_local_cuda_repository(repository_ctx)
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 978, column 35, in _create_local_cuda_repository
                cuda_config = _get_cuda_config(repository_ctx, find_cuda_config_script)
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 666, column 30, in _get_cuda_config
                config = find_cuda_config(repository_ctx, find_cuda_config_script, ["cuda", "cudnn"])
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 643, column 41, in find_cuda_config
                exec_result = _exec_find_cuda_config(repository_ctx, script_path, cuda_libraries)
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 637, column 19, in _exec_find_cuda_config
                return execute(repository_ctx, [python_bin, "-c", decompress_and_execute_cmd])
        File "/tensorflow/tensorflow/third_party/remote_config/common.bzl", line 230, column 13, in execute
                fail(
Error in fail: Repository command failed
Could not find any cublas_api.h matching version '' in any subdirectory:
        ''
        'include'
        'include/cuda'
        'include/*-linux-gnu'
        'extras/CUPTI/include'
        'include/cuda/CUPTI'
of:
        '/usr'
        '/usr/lib/x86_64-linux-gnu'
        '/usr/local/cuda'
        '/usr/local/cuda/lib64/stubs'
        '/usr/local/cuda/targets/x86_64-linux/lib'
ERROR: /tensorflow/tensorflow/WORKSPACE:15:14: fetching cuda_configure rule //external:local_config_cuda: Traceback (most recent call last):
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 1401, column 38, in _cuda_autoconf_impl
                _create_local_cuda_repository(repository_ctx)
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 978, column 35, in _create_local_cuda_repository
                cuda_config = _get_cuda_config(repository_ctx, find_cuda_config_script)
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 666, column 30, in _get_cuda_config
                config = find_cuda_config(repository_ctx, find_cuda_config_script, ["cuda", "cudnn"])
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 643, column 41, in find_cuda_config
                exec_result = _exec_find_cuda_config(repository_ctx, script_path, cuda_libraries)
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 637, column 19, in _exec_find_cuda_config
                return execute(repository_ctx, [python_bin, "-c", decompress_and_execute_cmd])
        File "/tensorflow/tensorflow/third_party/remote_config/common.bzl", line 230, column 13, in execute
                fail(
Error in fail: Repository command failed
Could not find any cublas_api.h matching version '' in any subdirectory:
        ''
        'include'
        'include/cuda'
        'include/*-linux-gnu'
        'extras/CUPTI/include'
        'include/cuda/CUPTI'
of:
        '/usr'
        '/usr/lib/x86_64-linux-gnu'
        '/usr/local/cuda'
        '/usr/local/cuda/lib64/stubs'
        '/usr/local/cuda/targets/x86_64-linux/lib'
ERROR: @local_config_cuda//:enable_cuda :: Error loading option @local_config_cuda//:enable_cuda: Repository command failed
Could not find any cublas_api.h matching version '' in any subdirectory:
        ''
        'include'
        'include/cuda'
        'include/*-linux-gnu'
        'extras/CUPTI/include'
        'include/cuda/CUPTI'
of:
        '/usr'
        '/usr/lib/x86_64-linux-gnu'
        '/usr/local/cuda'
        '/usr/local/cuda/lib64/stubs'
        '/usr/local/cuda/targets/x86_64-linux/lib'
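The error says the CUDA configuration step could not find the cuBLAS header `cublas_api.h` under any of the listed base paths and subdirectories. As a rough diagnostic, the Python sketch below repeats that directory search inside the container, expanding the `*` wildcard, and prints any matches. `find_header`, `BASE_PATHS` and `SUBDIRS` are hypothetical names; this is not TensorFlow's own `find_cuda_config` script and it does not reproduce that script's version matching, only the header lookup over the paths copied from the log.

```python
import glob
import os

# Base paths and subdirectories copied from the error message in the log.
BASE_PATHS = [
    "/usr",
    "/usr/lib/x86_64-linux-gnu",
    "/usr/local/cuda",
    "/usr/local/cuda/lib64/stubs",
    "/usr/local/cuda/targets/x86_64-linux/lib",
]
SUBDIRS = [
    "",
    "include",
    "include/cuda",
    "include/*-linux-gnu",
    "extras/CUPTI/include",
    "include/cuda/CUPTI",
]


def find_header(name, base_paths=BASE_PATHS, subdirs=SUBDIRS):
    """Return every existing <base>/<subdir>/<name>, expanding '*' wildcards."""
    hits = []
    for base in base_paths:
        for sub in subdirs:
            # os.path.join(base, "", name) collapses to base/name.
            pattern = os.path.join(base, sub, name)
            hits.extend(sorted(glob.glob(pattern)))
    return hits


if __name__ == "__main__":
    found = find_header("cublas_api.h")
    print("\n".join(found) if found else "cublas_api.h not found in any searched path")
```

An empty result inside the container suggests the cuBLAS development headers are not present at any of these locations.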
tilakrayal commented 1 week ago

@PriyajeetGoswami,

I was able to clone the TensorFlow repository on Ubuntu without any problem. I noticed that you are using Bazel 6.2 and GCC 9.3, which are incompatible with TF v2.7.0 (its tested build configuration uses Bazel 3.7.2 and GCC 7.3.1). Also, TF v2.7.0 is a fairly old version; please try installing the latest stable version.
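One way to spot this kind of mismatch is to compare the local toolchain against the tested configuration for the target release. The Python sketch below does that for the versions reported in this issue. The `TESTED_TF_2_7_0` values come from the `tensorflow-2.7.0` row of the tested build configurations table on tensorflow.org; the helper names and the rule of comparing only major.minor versions are illustrative assumptions, not an official TensorFlow check.

```python
# Tested Linux GPU build configuration for tensorflow-2.7.0
# (from the tested build configurations table on tensorflow.org).
TESTED_TF_2_7_0 = {"bazel": "3.7.2", "gcc": "7.3.1", "cuda": "11.2", "cudnn": "8.1"}


def parse_version(text):
    """Turn a plain dotted version such as '9.3.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def mismatches(local, tested=TESTED_TF_2_7_0):
    """Return {tool: (local, tested)} for tools whose major.minor differs.

    Tools missing from `local` are skipped rather than reported.
    """
    out = {}
    for tool, want in tested.items():
        have = local.get(tool)
        if have is None:
            continue
        if parse_version(have)[:2] != parse_version(want)[:2]:
            out[tool] = (have, want)
    return out


# Versions reported in this issue (cuDNN was not specified).
reported = {"bazel": "6.2.1", "gcc": "9.3.0", "cuda": "11.2"}
print(mismatches(reported))
# {'bazel': ('6.2.1', '3.7.2'), 'gcc': ('9.3.0', '7.3.1')}
```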

Could you please create a virtual environment and try installing TensorFlow as described in the official documentation, and also check the tested build configurations? Please see the attached screenshot for reference.
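The virtual-environment route suggested here can be scripted with Python's standard `venv` module. The sketch below is roughly equivalent to running `python3 -m venv <dir>` and then `pip install --upgrade tensorflow` inside that environment. The helper names, the environment directory, and the Linux `bin/` layout are assumptions; for GPU support on Linux, recent official pip instructions install the `tensorflow[and-cuda]` extra instead of plain `tensorflow`.

```python
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment, roughly like `python3 -m venv env_dir`.

    Returns the path to the environment's interpreter (Linux/macOS layout).
    """
    # symlinks=True matches the `python3 -m venv` default on POSIX systems;
    # with_pip=True bootstraps pip into the environment via ensurepip.
    venv.create(env_dir, symlinks=True, with_pip=with_pip)
    return Path(env_dir) / "bin" / "python"


def install_command(python, package="tensorflow"):
    """Build the pip command that installs `package` into the environment."""
    return [str(python), "-m", "pip", "install", "--upgrade", package]
```

Usage (downloads from PyPI): `subprocess.run(install_command(create_env("tf-env")), check=True)`. Returning the command as a list keeps the network step separate from environment creation, so the command can be inspected before it runs.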

[Screenshot 2023-07-21 3:47:25 PM]

Is there any specific reason to install TensorFlow v2.7? As mentioned above, v2.7 is a fairly old version, and it is unlikely to receive any bug fixes other than security patches. There is a good chance this has been fixed in later TF versions. Thank you!

PriyajeetGoswami commented 4 days ago

I was trying TF 2.7 because of a project requirement, but I have moved on from that. I used the latest version of TF and it worked. Thanks!

tilakrayal commented 3 days ago

@PriyajeetGoswami, glad the issue is resolved. Please feel free to close this issue. Thank you!