tensorflow / custom-op

Guide for building custom op for TensorFlow
Apache License 2.0

CUDA version 10.1 is hardcoded #95

Rocketknight1 closed this issue 2 years ago.

Rocketknight1 commented 3 years ago

I tried using the tensorflow:2.4.0-custom-op-gpu-ubuntu16 image to compile an op for TF 2.4, but I got the error below. The build seems to look for a hardcoded CUDA 10.1, even though the TF 2.4 in the image is compiled with CUDA 11.0.

Is there any workaround? I'm not even sure where to begin patching the code.

```
ERROR: An error occurred during the fetch of repository 'local_config_cuda':
   Traceback (most recent call last):
    File "/custom-op/gpu/cuda_configure.bzl", line 1254
        _create_local_cuda_repository(<1 more arguments>)
    File "/custom-op/gpu/cuda_configure.bzl", line 985, in _create_local_cuda_repository
        _get_cuda_config(repository_ctx)
    File "/custom-op/gpu/cuda_configure.bzl", line 714, in _get_cuda_config
        find_cuda_config(repository_ctx, <1 more arguments>)
    File "/custom-op/gpu/cuda_configure.bzl", line 694, in find_cuda_config
        auto_configure_fail(<1 more arguments>)
    File "/custom-op/gpu/cuda_configure.bzl", line 325, in auto_configure_fail
        fail(<1 more arguments>)

Cuda Configuration Error: Failed to run find_cuda_config.py: Could not find any cuda.h matching version '10.1' in any subdirectory:
        ''
        'include'
        'include/cuda'
        'include/*-linux-gnu'
        'extras/CUPTI/include'
        'include/cuda/CUPTI'
of:
        '/usr/local/cuda'
```
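To confirm which CUDA version the container actually ships, the failing check can be roughly approximated in Python. This is a hypothetical helper written for illustration, not the real find_cuda_config.py logic. It walks the subdirectories listed in the error under /usr/local/cuda and decodes the CUDA_VERSION macro in cuda.h. The sketch assumes NVIDIA's encoding of major * 1000 + minor * 10, so 11000 means 11.0 and 10010 means 10.1.

```python
import glob
import os
import re
from typing import Optional, Tuple

# Subdirectories listed in the error message, searched in the same order.
SUBDIRS = ["", "include", "include/cuda", "include/*-linux-gnu",
           "extras/CUPTI/include", "include/cuda/CUPTI"]


def parse_cuda_version(header_text: str) -> Optional[str]:
    """Decode '#define CUDA_VERSION 11000' into '11.0' (assumes major*1000 + minor*10)."""
    match = re.search(r"#define\s+CUDA_VERSION\s+(\d+)", header_text)
    if match is None:
        return None
    value = int(match.group(1))
    return f"{value // 1000}.{(value % 1000) // 10}"


def find_cuda_h(base: str = "/usr/local/cuda") -> Optional[Tuple[str, Optional[str]]]:
    """Return (path, version) for the first cuda.h found under `base`, or None."""
    for subdir in SUBDIRS:
        # glob expands wildcard entries such as include/*-linux-gnu.
        for directory in sorted(glob.glob(os.path.join(base, subdir))):
            path = os.path.join(directory, "cuda.h")
            if os.path.isfile(path):
                with open(path, encoding="utf-8", errors="replace") as f:
                    return path, parse_cuda_version(f.read())
    return None


if __name__ == "__main__":
    found = find_cuda_h()
    if found is None:
        print("no cuda.h found under /usr/local/cuda")
    else:
        path, version = found
        print(f"{path}: CUDA {version}")
```

If this prints CUDA 11.0 inside the 2.4.0 image, the toolkit is present and the problem is only the 10.1 that the build configuration asks for.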
TirelessDev commented 3 years ago

Hi, I ran into this as well while trying to get TF3D to work with CUDA 11 and an RTX 3090.

I got it compiling by manually editing the .bazelrc file that the configure.sh script generates.

For reference, I changed mine to:

```
build:cuda --define=using_cuda=true --define=using_cuda_nvcc=true
build:manylinux2010cuda11 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010-nvcc-cuda11:toolchain
build --spawn_strategy=standalone
build --strategy=Genrule=standalone
build -c opt
build --action_env TF_HEADER_DIR="/usr/local/lib/python3.6/dist-packages/tensorflow/include"
build --action_env TF_SHARED_LIBRARY_DIR="/usr/local/lib/python3.6/dist-packages/tensorflow"
build --action_env TF_SHARED_LIBRARY_NAME="libtensorflow_framework.so.2"
build --action_env TF_NEED_CUDA="1"
build --action_env TF_CUDA_VERSION="11.0"
build --action_env TF_CUDNN_VERSION="8"
build --action_env CUDNN_INSTALL_PATH="/usr/lib/x86_64-linux-gnu"
build --action_env CUDA_TOOLKIT_PATH="/usr/local/cuda"
build --config=cuda
test --config=cuda
build --config=manylinux2010cuda11
test --config=manylinux2010cuda11
```

Note that the second line switches the toolchain to the cuda11 one, TF_CUDA_VERSION is set to 11.0, and TF_CUDNN_VERSION is set to 8.
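The three edits described above can be scripted so they don't have to be redone by hand. The following is a minimal Python sketch, not part of the custom-op repo. The function names are hypothetical, and it assumes the generated file uses the same line shapes as the config above and that the `nvcc-cuda11` toolchain package referenced there exists in your checkout. In TF 2.3 and later, `tf.sysconfig.get_build_info()` reports the `cuda_version` and `cudnn_version` TensorFlow was built with, which is a reasonable source for the two version arguments.

```python
import re


def patch_bazelrc(text: str, cuda_version: str, cudnn_version: str,
                  config_suffix: str = "cuda11") -> str:
    """Rewrite the CUDA-related lines of a configure.sh-generated .bazelrc."""
    # Pin the versions that cuda_configure.bzl receives via --action_env.
    text = re.sub(r'TF_CUDA_VERSION="[^"]*"',
                  f'TF_CUDA_VERSION="{cuda_version}"', text)
    text = re.sub(r'TF_CUDNN_VERSION="[^"]*"',
                  f'TF_CUDNN_VERSION="{cudnn_version}"', text)
    # Rename the manylinux2010cuda* config (its definition and every --config use).
    text = re.sub(r"manylinux2010cuda\d[\d.]*",
                  f"manylinux2010{config_suffix}", text)
    # Point the crosstool at the matching nvcc toolchain (assumed to exist).
    text = re.sub(r"nvcc-cuda\d[\d.]*:toolchain",
                  f"nvcc-{config_suffix}:toolchain", text)
    return text


def patch_file(path: str = ".bazelrc", **kwargs) -> None:
    """Apply patch_bazelrc to a file in place."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(patch_bazelrc(text, **kwargs))


if __name__ == "__main__":
    import os
    # Hypothetical usage: run from the custom-op checkout after ./configure.sh.
    if os.path.exists(".bazelrc"):
        patch_file(".bazelrc", cuda_version="11.0", cudnn_version="8")
```

Because configure.sh writes .bazelrc itself, the patch presumably has to be reapplied after every configure run. The regexes are written to be idempotent, so running the script twice is harmless.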

I haven't tested it thoroughly yet, but I managed to build a wheel and it imports correctly in Python.