tensorflow / addons

Useful extra functionality for TensorFlow 2.x maintained by SIG-addons
Apache License 2.0

Unable to build from source with GPU support #2712

Open shkarupa-alex opened 2 years ago

shkarupa-alex commented 2 years ago

System information

Describe the bug

I downloaded TF 2.9.1 and built it from source with GPU support, with no errors. But building tensorflow_addons (0.17.0) from source fails with this error:

DEBUG: /home/alex/.cache/bazel/_bazel_alex/f67f41e413892adc9e99d88ee1f21ae3/external/org_tensorflow/third_party/repo.bzl:124:14: 
Warning: skipping import of repository 'cub_archive' because it already exists.
DEBUG: /home/alex/.cache/bazel/_bazel_alex/f67f41e413892adc9e99d88ee1f21ae3/external/bazel_tools/tools/cpp/lib_cc_configure.bzl:118:10: 
Auto-Configuration Warning: 'TMP' environment variable is not set, using 'C:\Windows\Temp' as default
DEBUG: Rule 'io_bazel_rules_docker' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1596824487 -0400"
DEBUG: Repository io_bazel_rules_docker instantiated at:
  /home/alex/jupyter/build/addons/WORKSPACE:45:14: in <toplevel>
  /home/alex/.cache/bazel/_bazel_alex/f67f41e413892adc9e99d88ee1f21ae3/external/org_tensorflow/tensorflow/workspace0.bzl:107:34: in workspace
  /home/alex/.cache/bazel/_bazel_alex/f67f41e413892adc9e99d88ee1f21ae3/external/bazel_toolchains/repositories/repositories.bzl:35:23: in repositories
Repository rule git_repository defined at:
  /home/alex/.cache/bazel/_bazel_alex/f67f41e413892adc9e99d88ee1f21ae3/external/bazel_tools/tools/build_defs/repo/git.bzl:199:33: in <toplevel>
WARNING: /home/alex/.cache/bazel/_bazel_alex/f67f41e413892adc9e99d88ee1f21ae3/external/local_config_tf/BUILD:13345:8: target 'libtensorflow_framework.so.2' is both a rule and a file; please choose another name for the rule
INFO: Analyzed target //:build_pip_pkg (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /home/alex/jupyter/build/addons/tensorflow_addons/custom_ops/image/BUILD:7:18: Compiling tensorflow_addons/custom_ops/image/cc/kernels/adjust_hsv_in_yiq_op_gpu.cu.cc failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command external/ubuntu20.04-gcc9_manylinux2014-cuda11.2-cudnn8.1-tensorrt7.2_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF ... (remaining 60 arguments skipped)
/dt9/usr/bin/gcc: No such file or directory
nvcc fatal   : Failed to preprocess host compiler properties.
Target //:build_pip_pkg failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 0.429s, Critical Path: 0.12s
INFO: 25 processes: 25 internal.
FAILED: Build did NOT complete successfully

Code to reproduce the issue

git clone https://github.com/tensorflow/addons.git
cd addons

export TF_NEED_CUDA="1"

python3 ./configure.py

bazel clean --expunge
bazel build build_pip_pkg

shkarupa-alex commented 2 years ago

But if I manually replace the crosstool_top value in .bazelrc with "@local_config_cuda//crosstool:toolchain", the build continues, and then another error occurs:

ERROR: /home/alex/jupyter/build/addons/tensorflow_addons/custom_ops/seq2seq/BUILD:7:18: Compiling tensorflow_addons/custom_ops/seq2seq/cc/kernels/beam_search_ops_gpu.cu.cc failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF ... (remaining 61 arguments skipped)
Traceback (most recent call last):
  File "external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc", line 269, in <module>
    sys.exit(main())
  File "external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc", line 256, in main
    return InvokeNvcc(leftover, log=args.cuda_log)
  File "external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc", line 207, in InvokeNvcc
    nvccopts += r'-gencode=arch=compute_%s,\"code=sm_%s\" ' % (
TypeError: not all arguments converted during string formatting

It can be fixed by removing the last ", capability" argument here: https://github.com/tensorflow/addons/blob/master/build_deps/toolchains/gpu/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl#L208

After that, the build succeeds.
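The TypeError in the traceback is ordinary Python %-formatting behavior: the format string has two %s placeholders but is given a three-element tuple. The following is a minimal sketch that reproduces it outside the build. The capability values and the helper name gencode_flag are made up for illustration; only the format string is taken from the traceback.

```python
# Illustrative values only; the real list comes from the CUDA configuration.
capabilities = ["35", "75"]

# Format string as shown in the traceback: two %s placeholders.
template = r'-gencode=arch=compute_%s,\"code=sm_%s\" '


def gencode_flag(capability, extra_arg=False):
    """Format one -gencode flag (hypothetical helper).

    extra_arg=True mimics the failing call, which passes three values
    to a format string that only consumes two.
    """
    if extra_arg:
        args = (capability, capability, capability)
    else:
        args = (capability, capability)
    return template % args


try:
    gencode_flag(capabilities[0], extra_arg=True)
except TypeError as err:
    print("three arguments:", err)

print("two arguments:", gencode_flag(capabilities[0]))
```

In the template file that line sits inside a loop over supported_cuda_compute_capabilities[:-1], so it is presumably only reached when more than one compute capability is configured.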

bhack commented 2 years ago

Can you try to submit a PR?

bhack commented 2 years ago

@seanpmorgan Do we still need these build_deps with the new 2.9 toolchain?

bermeitinger-b commented 2 years ago

This is not a proper PR, so I'll just put in a patch. I don't know if this has any side effects. I guess it will fail in the official manylinux build process.

From 2f32601be926472f142bffbe820a28d05682219a Mon Sep 17 00:00:00 2001
From: Bernhard Bermeitinger <bernhard.bermeitinger@unisg.ch>
Date: Fri, 27 May 2022 10:52:13 +0200
Subject: [PATCH] fix compilation on cuda

Signed-off-by: Bernhard Bermeitinger <bernhard.bermeitinger@unisg.ch>
---
 .../crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl | 2 +-
 configure.py                                                    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/build_deps/toolchains/gpu/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl b/build_deps/toolchains/gpu/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl
index affc0be..3b5fd82 100644
--- a/build_deps/toolchains/gpu/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl
+++ b/build_deps/toolchains/gpu/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl
@@ -205,7 +205,7 @@ def InvokeNvcc(argv, log=False):
       x.replace(".", "") for x in supported_cuda_compute_capabilities])
   for capability in supported_cuda_compute_capabilities[:-1]:
     nvccopts += r'-gencode=arch=compute_%s,\"code=sm_%s\" ' % (
-        capability, capability, capability)
+        capability, capability)
   if supported_cuda_compute_capabilities:
     capability = supported_cuda_compute_capabilities[-1]
     nvccopts += r'-gencode=arch=compute_%s,code=\"sm_%s,compute_%s\" ' % (
diff --git a/configure.py b/configure.py
index 0d65e88..24fd2d5 100644
--- a/configure.py
+++ b/configure.py
@@ -185,7 +185,7 @@ def configure_cuda():
     write("build --config=cuda")
     write("build:cuda --define=using_cuda=true --define=using_cuda_nvcc=true")
     write(
-        "build:cuda --crosstool_top=@ubuntu20.04-gcc9_manylinux2014-cuda11.2-cudnn8.1-tensorrt7.2_config_cuda//crosstool:toolchain"
+        "build:cuda --crosstool_top=@local_config_cuda//crosstool:toolchain"
     )

-- 
2.36.1

Save it as fix_cuda.patch and apply it with patch -p1 -i fix_cuda.patch.

atuleu commented 2 years ago

Ugly fix on Ubuntu 20.04:

sudo mkdir -p /dt9/usr
sudo ln -s /usr/bin /dt9/usr/bin

Is @ubuntu20.04-gcc9_manylinux2014-cuda11.2-cudnn8.1-tensorrt7.2_config_cuda only intended for building from a Docker image?
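Before creating the symlink, it may help to confirm that the missing path from the error log is really the problem on a given machine. This is a small diagnostic sketch, not part of the addons build. The function name host_compiler_report is hypothetical, and the default path is the one from the "No such file or directory" message earlier in this thread.

```python
import os
import shutil


def host_compiler_report(expected="/dt9/usr/bin/gcc"):
    """Report whether the expected host compiler path exists,
    and where gcc is found on PATH (None if it is not found)."""
    return {
        "expected": expected,
        "expected_exists": os.path.exists(expected),
        "gcc_on_path": shutil.which("gcc"),
    }


for key, value in host_compiler_report().items():
    print(f"{key}: {value}")
```

If expected_exists is False while gcc_on_path points somewhere like /usr/bin/gcc, the build is looking for a compiler at a location that does not exist on the host, which matches the error reported above.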

bhack commented 2 years ago

Is @ubuntu20.04-gcc9_manylinux2014-cuda11.2-cudnn8.1-tensorrt7.2_config_cuda only intended for building from a Docker image?

It is mainly for producing manylinux2014-compatible wheels. But as we don't want to maintain too many build configs, we rely on this.

shkarupa-alex commented 2 years ago

@bhack, this issue is still not resolved. It is still necessary to manually replace the crosstool_top value in .bazelrc with "@local_config_cuda//crosstool:toolchain". I think it should either be set automatically when building outside Docker, or be specified via arguments to the "configure" command and documented in the README.
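A rough sketch of how such a switch could look, not the project's actual configure.py: choose the toolchain label from an environment variable when writing the CUDA lines. The variable name TF_ADDONS_LOCAL_CUDA and the function cuda_bazelrc_lines are hypothetical. The two toolchain labels and the build lines are the ones quoted earlier in this thread.

```python
import os

# Both labels appear in this thread; which one works depends on the build host.
DOCKER_CROSSTOOL = (
    "@ubuntu20.04-gcc9_manylinux2014-cuda11.2-cudnn8.1-tensorrt7.2_config_cuda"
    "//crosstool:toolchain"
)
LOCAL_CROSSTOOL = "@local_config_cuda//crosstool:toolchain"


def cuda_bazelrc_lines(env=None):
    """Return the CUDA-related .bazelrc lines for an environment mapping.

    TF_ADDONS_LOCAL_CUDA is a hypothetical switch, not an existing option.
    """
    env = os.environ if env is None else env
    use_local = env.get("TF_ADDONS_LOCAL_CUDA", "0") == "1"
    crosstool = LOCAL_CROSSTOOL if use_local else DOCKER_CROSSTOOL
    return [
        "build --config=cuda",
        "build:cuda --define=using_cuda=true --define=using_cuda_nvcc=true",
        "build:cuda --crosstool_top=" + crosstool,
    ]


# Example: what would be written when building outside Docker.
print("\n".join(cuda_bazelrc_lines({"TF_ADDONS_LOCAL_CUDA": "1"})))
```

Keeping the manylinux label as the default would leave the official wheel build unchanged, while a local build opts in explicitly.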

bhack commented 2 years ago

@shkarupa-alex It was closed automatically because GitHub's "magic" keywords linked it to your PR.