r2.7 not recognizing old GPU with compute capability 3.0 (GTX 770) at runtime, while everything seems fine during build and installation.

AlessandroFlati commented 2 years ago

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No, the only lines I'm trying are

import tensorflow as tf
print("TensorFlow version:", tf.__version__)

tf.config.list_physical_devices('GPU')
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

OS Platform and Distribution: Linux Mint 20.02
TensorFlow installed from: Source
TensorFlow version: r2.7 (latest official release branch, up to date)
Python version: 3.8
Bazel version: 3.7.2
GCC/Compiler version: GCC 9.3.0
CUDA/cuDNN version: 10.2 / 8.2.4.15 (all samples for both work flawlessly, except the half-precision samples, which obviously fail since the compute capability is 3.0). NVIDIA driver version: 440.33.01
GPU model and memory: GeForce GTX 770, 2 GB

Describe the problem

As described in the documentation, after installing Bazel 3.7.2 I ran ./configure with CUDA support enabled. It went like this:

You have bazel 3.7.2 installed.
Please specify the location of python. [Default is /home/alessandro/tensorflow/bin/python3]: 

Found possible Python library paths:
  /home/alessandro/tensorflow/lib/python3.8/site-packages
Please input the desired Python library path to use.  Default is [/home/alessandro/tensorflow/lib/python3.8/site-packages]

Do you wish to build TensorFlow with ROCm support? [y/N]: 
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Do you wish to build TensorFlow with TensorRT support? [y/N]: 
No TensorRT support will be enabled for TensorFlow.

Found CUDA 10.2 in:
    /usr/local/cuda-10.2/targets/x86_64-linux/lib
    /usr/local/cuda-10.2/targets/x86_64-linux/include
Found cuDNN 8 in:
    /usr/local/cuda-10.2/targets/x86_64-linux/lib
    /usr/local/cuda-10.2/targets/x86_64-linux/include

Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Each capability can be specified as "x.y" or "compute_xy" to include both virtual and binary GPU code, or as "sm_xy" to only include the binary code.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.5,7.0]: 3.0

WARNING: XLA does not support CUDA compute capabilities lower than 3.5. Disable XLA when running on older GPUs.
Do you want to use clang as CUDA compiler? [y/N]: 
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -Wno-sign-compare]: 

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: 
Not configuring the WORKSPACE for Android builds.
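The compute-capability prompt above accepts three spellings. As a hedged illustration, the hypothetical parser below (not TensorFlow's configure code) turns an answer such as 3.0 or 3.5,7.0 into (major, minor) pairs, following the prompt's own description: "x.y" and "compute_xy" request both virtual and binary GPU code, while "sm_xy" requests binary code only. It also flags entries below the 3.5 minimum stated by the same prompt, which is exactly the case of the 3.0 answer given here.

```python
import re
from typing import List, NamedTuple


class CapabilitySpec(NamedTuple):
    major: int
    minor: int
    includes_ptx: bool  # True when virtual (PTX) code is requested as well


def parse_capability(spec: str) -> CapabilitySpec:
    """Parse one entry written as "x.y", "compute_xy" or "sm_xy"."""
    spec = spec.strip()
    m = re.fullmatch(r"(\d+)\.(\d)", spec)
    if m:
        # "x.y": both virtual and binary GPU code
        return CapabilitySpec(int(m.group(1)), int(m.group(2)), True)
    m = re.fullmatch(r"(compute|sm)_(\d+)(\d)", spec)
    if m:
        # "compute_xy": virtual + binary code; "sm_xy": binary code only
        return CapabilitySpec(int(m.group(2)), int(m.group(3)), m.group(1) == "compute")
    raise ValueError(f"unrecognized compute capability: {spec!r}")


def parse_capability_list(answer: str) -> List[CapabilitySpec]:
    """Split the comma-separated answer given at the configure prompt."""
    return [parse_capability(part) for part in answer.split(",") if part.strip()]


# Minimum stated by the r2.7 configure prompt
# ("TensorFlow only supports compute capabilities >= 3.5").
DOCUMENTED_MINIMUM = (3, 5)

for cap in parse_capability_list("3.0"):
    if (cap.major, cap.minor) < DOCUMENTED_MINIMUM:
        print(f"{cap.major}.{cap.minor} is below the documented minimum of 3.5")
```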

After this, following a very similar issue I had found, I always add build:opt --copt=-DTF_EXTRA_CUDA_CAPABILITIES=3.0 to .tf_configure.bazelrc (even though the capability already appears there in a build --action_env TF_CUDA_COMPUTE_CAPABILITIES line), plus two lines to disable XLA: build --define=with_xla_support=false and build --action_env TF_ENABLE_XLA=0.

My .tf_configure.bazelrc looks like this:

build --action_env PYTHON_BIN_PATH="/home/alessandro/tensorflow/bin/python3"
build --action_env PYTHON_LIB_PATH="/home/alessandro/tensorflow/lib/python3.8/site-packages"
build --python_path="/home/alessandro/tensorflow/bin/python3"
build --action_env CUDA_TOOLKIT_PATH="/usr/local/cuda-10.2"
build --action_env TF_CUDA_COMPUTE_CAPABILITIES="3.0"
build --action_env GCC_HOST_COMPILER_PATH="/usr/bin/x86_64-linux-gnu-gcc-9"
build --config=cuda

build --define=with_xla_support=false
build --action_env TF_ENABLE_XLA=0
build:opt --copt=-DTF_EXTRA_CUDA_CAPABILITIES=3.0

build:opt --copt=-Wno-sign-compare
build:opt --host_copt=-Wno-sign-compare
test --flaky_test_attempts=3
test --test_size_filters=small,medium
test --test_env=LD_LIBRARY_PATH
test:v1 --test_tag_filters=-benchmark-test,-no_oss,-no_gpu,-oss_serial
test:v1 --build_tag_filters=-benchmark-test,-no_oss,-no_gpu
test:v2 --test_tag_filters=-benchmark-test,-no_oss,-no_gpu,-oss_serial,-v1only
test:v2 --build_tag_filters=-benchmark-test,-no_oss,-no_gpu,-v1only
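Bazel treats the two kinds of lines in this file differently: in rc-file syntax, a plain build line applies to every bazel build, while a build:opt line is only applied when the opt config is active, i.e. when --config=opt is given on the command line or expanded from another config. The function below is a deliberately simplified, hypothetical model of that rule (it ignores test inheriting build options, imports and nested config expansion). It can help check which of the flags above a given command line actually picks up.

```python
import shlex
from typing import Dict, List


def effective_build_flags(rc_text: str, cli_configs: List[str]) -> List[str]:
    """Return the flags a `bazel build` would pick up from rc_text (simplified)."""
    unconditional: List[str] = []
    per_config: Dict[str, List[str]] = {}
    for raw in rc_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        command, _, rest = line.partition(" ")
        args = shlex.split(rest)
        if command == "build":
            unconditional.extend(args)  # applies to every build
        elif command.startswith("build:"):
            # applies only when --config=<name> is active
            per_config.setdefault(command.split(":", 1)[1], []).extend(args)
        # "test" / "test:<name>" lines are ignored in this sketch
    # Active configs come from the command line and from "build --config=..."
    # lines; expansion nested inside a config is not modelled.
    active = list(cli_configs) + [
        a.split("=", 1)[1] for a in unconditional if a.startswith("--config=")
    ]
    flags = [a for a in unconditional if not a.startswith("--config=")]
    for name in active:
        flags.extend(per_config.get(name, []))
    return flags


RC = """\
build --action_env TF_CUDA_COMPUTE_CAPABILITIES="3.0"
build --config=cuda
build:opt --copt=-DTF_EXTRA_CUDA_CAPABILITIES=3.0
build:opt --copt=-Wno-sign-compare
"""

print(effective_build_flags(RC, []))       # models: bazel build <target>
print(effective_build_flags(RC, ["opt"]))  # models: bazel build --config=opt <target>
```

In this model, the -DTF_EXTRA_CUDA_CAPABILITIES=3.0 copt only appears in the second call, i.e. when --config=opt is passed.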

Then I simply ran bazel build //tensorflow/tools/pip_package:build_pip_package, which completed without any errors or even warnings.

Finally, I ran ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

and, despite some warnings, the wheel was built:

Mon 15 Nov 2021 11:01:36 AM CET : === Preparing sources in dir: /tmp/tmp.AZIEudvT8k
~/Downloads/tensorflow ~/Downloads/tensorflow
~/Downloads/tensorflow
~/Downloads/tensorflow/bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/org_tensorflow ~/Downloads/tensorflow
~/Downloads/tensorflow
/tmp/tmp.AZIEudvT8k/tensorflow/include ~/Downloads/tensorflow
~/Downloads/tensorflow
Mon 15 Nov 2021 11:01:52 AM CET : === Building wheel
warning: no files found matching 'README'
warning: no files found matching '*.pyd' under directory '*'
warning: no files found matching '*.pyi' under directory '*'
warning: no files found matching '*.pd' under directory '*'
warning: no files found matching '*.dylib' under directory '*'
warning: no files found matching '*.dll' under directory '*'
warning: no files found matching '*.lib' under directory '*'
warning: no files found matching '*.csv' under directory '*'
warning: no files found matching '*.h' under directory 'tensorflow/include/tensorflow'
warning: no files found matching '*.proto' under directory 'tensorflow/include/tensorflow'
warning: no files found matching '*' under directory 'tensorflow/include/third_party'
/home/alessandro/tensorflow/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
Mon 15 Nov 2021 11:02:19 AM CET : === Output wheel file is in: /tmp/tensorflow_pkg

and pip install /tmp/tensorflow_pkg/tensorflow-2.7.0-cp38-cp38-linux_x86_64.whl went through without a flaw:

Processing /tmp/tensorflow_pkg/tensorflow-2.7.0-cp38-cp38-linux_x86_64.whl
Requirement already satisfied: flatbuffers<3.0,>=1.12 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (2.0)
Requirement already satisfied: libclang>=9.0.1 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (12.0.0)
Requirement already satisfied: keras<2.8,>=2.7.0rc0 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (2.7.0)
Requirement already satisfied: grpcio<2.0,>=1.24.3 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (1.41.1)
Requirement already satisfied: typing-extensions>=3.6.6 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (3.10.0.2)
Requirement already satisfied: h5py>=2.9.0 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (3.5.0)
Requirement already satisfied: keras-preprocessing>=1.1.1 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (1.1.2)
Requirement already satisfied: termcolor>=1.1.0 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (1.1.0)
Requirement already satisfied: wrapt>=1.11.0 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (1.13.3)
Requirement already satisfied: tensorflow-estimator<2.8,~=2.7.0rc0 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (2.7.0)
Requirement already satisfied: google-pasta>=0.1.1 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (0.2.0)
Requirement already satisfied: protobuf>=3.9.2 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (3.19.1)
Requirement already satisfied: tensorboard~=2.6 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (2.7.0)
Requirement already satisfied: absl-py>=0.4.0 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (1.0.0)
Requirement already satisfied: opt-einsum>=2.3.2 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (3.3.0)
Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.21.0 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (0.22.0)
Requirement already satisfied: numpy>=1.14.5 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (1.21.4)
Requirement already satisfied: gast<0.5.0,>=0.2.1 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (0.4.0)
Requirement already satisfied: astunparse>=1.6.0 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (1.6.3)
Requirement already satisfied: six>=1.12.0 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (1.16.0)
Requirement already satisfied: wheel<1.0,>=0.32.0 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorflow==2.7.0) (0.37.0)
Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorboard~=2.6->tensorflow==2.7.0) (0.4.6)
Requirement already satisfied: google-auth<3,>=1.6.3 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorboard~=2.6->tensorflow==2.7.0) (2.3.3)
Requirement already satisfied: setuptools>=41.0.0 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorboard~=2.6->tensorflow==2.7.0) (58.3.0)
Requirement already satisfied: werkzeug>=0.11.15 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorboard~=2.6->tensorflow==2.7.0) (2.0.2)
Requirement already satisfied: requests<3,>=2.21.0 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorboard~=2.6->tensorflow==2.7.0) (2.26.0)
Requirement already satisfied: markdown>=2.6.8 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorboard~=2.6->tensorflow==2.7.0) (3.3.4)
Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorboard~=2.6->tensorflow==2.7.0) (1.8.0)
Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from tensorboard~=2.6->tensorflow==2.7.0) (0.6.1)
Requirement already satisfied: rsa<5,>=3.1.4 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard~=2.6->tensorflow==2.7.0) (4.7.2)
Requirement already satisfied: cachetools<5.0,>=2.0.0 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard~=2.6->tensorflow==2.7.0) (4.2.4)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard~=2.6->tensorflow==2.7.0) (0.2.8)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard~=2.6->tensorflow==2.7.0) (1.3.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard~=2.6->tensorflow==2.7.0) (2021.10.8)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard~=2.6->tensorflow==2.7.0) (1.26.7)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard~=2.6->tensorflow==2.7.0) (2.0.7)
Requirement already satisfied: idna<4,>=2.5 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard~=2.6->tensorflow==2.7.0) (3.3)
Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard~=2.6->tensorflow==2.7.0) (0.4.8)
Requirement already satisfied: oauthlib>=3.0.0 in /home/alessandro/tensorflow/lib/python3.8/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard~=2.6->tensorflow==2.7.0) (3.1.1)
Installing collected packages: tensorflow
Successfully installed tensorflow-2.7.0

Finally, I ran my Python test:

import tensorflow as tf
print("TensorFlow version:", tf.__version__)

tf.config.list_physical_devices('GPU')
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

And the result is:

TensorFlow version: 2.7.0
Num GPUs Available:  0
2021-11-15 11:07:33.547632: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2021-11-15 11:07:33.547653: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: alessandro-MS-7B79
2021-11-15 11:07:33.547656: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: alessandro-MS-7B79
2021-11-15 11:07:33.547715: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 440.33.1
2021-11-15 11:07:33.547728: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 440.33.1
2021-11-15 11:07:33.547732: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 440.33.1

I'm switching to TensorFlow because PyTorch can't seem to handle compute capability 3.0, while TensorFlow can. But can it?

mohantym commented 2 years ago

Hi @AlessandroFlati! As per the GPU documentation, the minimum compute capability is 3.5. Thanks!

AlessandroFlati commented 2 years ago

Hi @mohantym, and thanks for the quick reply. As far as I understood, the >= 3.5 requirement applies to the prebuilt binaries, not to building from source. The same link I attached as the "very similar issue" discusses 3.0. Maybe you meant that I have to downgrade TensorFlow, or CUDA/NVIDIA drivers, but I wouldn't expect a perfectly working, CUDA/cuDNN-capable GPU to suddenly work with NO version of TensorFlow at all.

mohantym commented 2 years ago

OK @AlessandroFlati! Could you try again with CUDA 11.2, cuDNN 8.1 and GCC 7.3.1, as this document suggests? Thanks!

AlessandroFlati commented 2 years ago

Considerations:

GCC 7.3.1 is nearly impossible to install on an Ubuntu-based system (it seems easy on Red Hat with devtoolset-7, but I can't try that right now), so I went with GCC 7.5.0.
After installing CUDA and the NVIDIA drivers, the CUDA samples compiled fine but produced runtime errors such as CUDA error at particleSystem_cuda.cu:121 code=13(cudaErrorInvalidSymbol) "cudaMemcpyToSymbol(params, hostParams, sizeof(SimParams))" or BlackScholes.cu(160) : getLastCudaError() CUDA error : BlackScholesGPU() execution failed : (209) no kernel image is available for execution on the device. (Before, all of them ran smoothly.) Given this situation, I didn't even compile TensorFlow; I believe these CUDA/cuDNN versions are simply too recent for my GeForce GTX 770. Since 10.2 / 8.2.4.15 were working well, maybe, as your document suggests, I should go for TensorFlow 2.3 with GCC 7.3.1 | Bazel 3.1.0 | cuDNN 7.6 | CUDA 10.1. I'll try this ASAP.
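"No kernel image is available for execution on the device" means the executable embeds no GPU code that this device can run. The sketch below models, in simplified form, CUDA's documented compatibility rules: a cubin compiled for sm_XY runs only on devices with the same major version X and an equal or higher minor version, while PTX for compute_XY can be JIT-compiled on devices of capability X.Y or higher. The function is hypothetical, and the binary contents used in the example are made up for illustration, not read from the CUDA 11.2 samples.

```python
from typing import Iterable, Tuple

Capability = Tuple[int, int]  # (major, minor)


def can_run(device: Capability, cubins: Iterable[Capability], ptx: Iterable[Capability]) -> bool:
    """Can a binary embedding these sm_XY cubins and compute_XY PTX run on `device`?"""
    major, minor = device
    # A cubin built for sm_XY runs on devices with the same major version X
    # and a minor version >= Y.
    if any(c_major == major and c_minor <= minor for c_major, c_minor in cubins):
        return True
    # PTX for compute_XY can be JIT-compiled for any device with capability >= X.Y.
    return any(p <= device for p in ptx)


gtx_770 = (3, 0)
# Hypothetical binary with sm_35/sm_52 cubins and compute_52 PTX only:
print(can_run(gtx_770, cubins=[(3, 5), (5, 2)], ptx=[(5, 2)]))  # False
# Hypothetical binary that also ships an sm_30 cubin:
print(can_run(gtx_770, cubins=[(3, 0)], ptx=[]))                # True
```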

AlessandroFlati commented 2 years ago

Update: CUDA 10.1 doesn't work with GCC 7.5, or with my graphics card, or with both. The only combination that worked was GCC 8 with CUDA 10.2. I hope this doesn't differ too much from the combination above. To be clear, I'm now testing r2.3 | GCC 7.5 (for TF) / GCC 8.4 (for CUDA and cuDNN) | Bazel 3.1.0 | cuDNN 7.6 | CUDA 10.2, which seems to be the closest to the tested builds.

AlessandroFlati commented 2 years ago

Installation of CUDA and cuDNN went more smoothly than ever: cuDNN 7.6 seems to suit CUDA 10.2 + CC 3.0 even better than the previous cuDNN 8.2, since the half-precision tests now pass as well.

Alas, bazel build //tensorflow/tools/pip_package:build_pip_package with GCC 7.5 fails. It also prints many info messages and warnings while building with Bazel 3.1.0. Most of them don't seem too important, since they come from -Wmaybe-uninitialized, -Wunused-function, -Wsign-compare, -Wcomment, -Wreturn-type etc., and many others are just "it was declared here" notes; but some seem a little more worrying, like "directory doesn't exist" for protobuf and a lot of warnings for the external/eigen_archive/unsupported/Eigen/CXX11/ headers. If needed, I can provide a complete log.

I'll now try to recompile with GCC 8, so the next combination is: r2.3 | GCC 8.4 (for CUDA, cuDNN and TF) | Bazel 3.1.0 | cuDNN 7.6 | CUDA 10.2

AlessandroFlati commented 2 years ago

GCC 8.4, while compiling the CUDA and cuDNN samples perfectly, produces roughly the same errors with TF 2.3; here's the log: tensorflow_r2.3withgcc8.log

Trying GCC 9

r2.3 | GCC 8.4 (for CUDA, cuDNN) / GCC 9 (for TF) | Bazel 3.1.0 | cuDNN 7.6 | CUDA 10.2

produces something predictable:

ERROR: /home/alessandro/.cache/bazel/_bazel_alessandro/66cb6378d0a5667806d8c4794375ceb9/external/nccl_archive/BUILD.bazel:53:1: C++ compilation of rule '@nccl_archive//:device_lib' failed (Exit 1)
In file included from /usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/cuda_runtime.h:83,
                 from <command-line>:
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/crt/host_config.h:138:2: error: #error -- unsupported GNU version! gcc versions later than 8 are not supported!
  138 | #error -- unsupported GNU version! gcc versions later than 8 are not supported!
      |  ^~~~~
Target //tensorflow/tools/pip_package:build_pip_package failed to build

nvcc from CUDA 10.2 refuses GCC 9 as a host compiler, so I can't build TensorFlow with GCC 9 on top of this CUDA installation. My last resort is therefore to raise the release version, which is the reason I tried the latest r2.7 in the first place, without success.
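The failure above comes from the version guard in CUDA 10.2's crt/host_config.h. nvcc hits it because CUDA sources in the TF build (here @nccl_archive//:device_lib) are compiled with the gcc chosen at the configure prompt "which gcc should be used by nvcc as the host compiler". A hypothetical pre-flight check is sketched below. Its only table entry (CUDA 10.2 accepts GCC up to major version 8) is taken from this error message; limits for other CUDA releases are deliberately left out rather than guessed.

```python
from typing import Dict, Optional

# Maximum GCC major version accepted by nvcc, per CUDA release.
# Only the CUDA 10.2 entry is known from the log above
# ("gcc versions later than 8 are not supported!").
MAX_HOST_GCC_MAJOR: Dict[str, int] = {"10.2": 8}


def host_gcc_supported(cuda_version: str, gcc_version: str) -> Optional[bool]:
    """True/False if the CUDA release is in the table, None if unknown."""
    limit = MAX_HOST_GCC_MAJOR.get(cuda_version)
    if limit is None:
        return None
    gcc_major = int(gcc_version.split(".")[0])
    return gcc_major <= limit


for gcc in ("7.5.0", "8.4.0", "9.3.0"):
    print(f"CUDA 10.2 + GCC {gcc}: {host_gcc_supported('10.2', gcc)}")
```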

What's your advice?

AlessandroFlati commented 2 years ago

Just to recap:

| TF version | GCC version for TF | GCC version for CUDA and cuDNN | Bazel version | cuDNN version | CUDA version | CUDA compiles | CUDA/cuDNN samples work | TF compiles |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| x | x | GCC 8.4 | x | x | CUDA 11.2 | Yes | Runtime errors on CUDA samples | x |
| x | x | GCC 8.4 | x | x | CUDA 10.1 | No | x | x |
| r2.3 | GCC 9.3 | GCC 8.4 | Bazel 3.1.0 | cuDNN 7.6 | CUDA 10.2 | Yes | Yes | No |
| r2.3 | GCC 8.4 | GCC 8.4 | Bazel 3.1.0 | cuDNN 7.6 | CUDA 10.2 | Yes | Yes | No |
| r2.3 | GCC 7.5 | GCC 8.4 | Bazel 3.1.0 | cuDNN 7.6 | CUDA 10.2 | Yes | Yes | No |
| r2.4 | GCC 8.4 | GCC 8.4 | Bazel 3.1.0 | cuDNN 7.6 | CUDA 10.2 | Yes | Yes | No |
| r2.5 | GCC 8.4 | GCC 8.4 | Bazel 3.7.2 | cuDNN 8.2 | CUDA 10.2 | Yes | Yes | Bazel fails with `no such package '@local_cuda//'` |
| r2.6 | GCC 8.4 | GCC 8.4 | Bazel 3.7.2 | cuDNN 7.6 | CUDA 10.2 | Yes | Yes | TF compiles, but see below |
| r2.7 | GCC 8.4 | GCC 8.4 | Bazel 3.7.2 | cuDNN 7.6 | CUDA 10.2 | Yes | Half-precision samples of cuDNN fail | TF compiles, but gives a runtime error |

r2.6 Output

After the pip install, the sample code

import tensorflow as tf

gave

RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd

I then ran pip install numpy --upgrade, and the error went away. The following minimal sample code

import tensorflow as tf
print("TensorFlow version:", tf.__version__)
tf.config.list_physical_devices('GPU')
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

produces

TensorFlow version: 2.6.2
2021-11-16 10:45:04.371930: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-16 10:45:04.377871: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-16 10:45:04.378119: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-16 10:45:04.378283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1885] Ignoring visible gpu device (device: 0, name: GeForce GTX 770, pci bus id: 0000:27:00.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
Num GPUs Available:  0

Just to get rid of the non-fatal NUMA info messages, I used the well-known

for a in /sys/bus/pci/devices/*; do echo 0 | sudo tee -a $a/numa_node; done

and indeed I'm now left with just this (incomprehensible) output:

TensorFlow version: 2.6.2
Num GPUs Available:  0
2021-11-16 10:49:15.727029: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1885] Ignoring visible gpu device (device: 0, name: GeForce GTX 770, pci bus id: 0000:27:00.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
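For reference, the NUMA shell loop used above can be sketched in Python. This is a hypothetical helper, not part of any tool mentioned in this thread: it writes the same value into every numa_node file under a configurable root, so the logic can be tried on a scratch directory before pointing it at /sys (which, as with sudo tee, requires root). Values written to sysfs are not persistent, so the loop has to be rerun after every reboot.

```python
import tempfile
from pathlib import Path
from typing import List


def reset_numa_nodes(root: str = "/sys/bus/pci/devices", value: str = "0") -> List[str]:
    """Write `value` into every <device>/numa_node file; return the devices touched."""
    touched = []
    for numa_file in sorted(Path(root).glob("*/numa_node")):
        numa_file.write_text(value)  # same intent as `echo 0 | sudo tee -a $a/numa_node`
        touched.append(numa_file.parent.name)
    return touched


if __name__ == "__main__":
    # Dry run on a scratch directory that mimics /sys/bus/pci/devices.
    with tempfile.TemporaryDirectory() as scratch:
        for dev in ("0000:00:01.0", "0000:27:00.0"):
            (Path(scratch) / dev).mkdir()
            (Path(scratch) / dev / "numa_node").write_text("-1\n")
        print(reset_numa_nodes(scratch))
```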

I can't say for sure, but everything seems to indicate that the "right" CUDA/cuDNN/NVIDIA driver versions for my graphics card are 10.2/7.6/440.33.01. Given this, and the fact that its compute capability is 3.0, I can't understand why r2.3 and r2.4 won't compile and emit all those warnings (knowing your team, I'm sure you wouldn't release something that produces SO many warnings out of the box for a given compiler and settings).

Also, I would like to point out that the r2.6 configure script indicates 3.0 as the minimum compute capability, which is not consistent with the runtime message above.

I'm in your hands now.

mohantym commented 2 years ago

Hi @Saduf2019! Could you please look at this issue!

AlessandroFlati commented 2 years ago

Going to the root of the problem (tensorflow/core/common_runtime/gpu/gpu_device.cc:1885), it seems that 3.0 is never added to cuda_supported_capabilities. This probably comes from the fact that the function

std::vector<se::CudaComputeCapability> GetSupportedCudaComputeCapabilities() {
  std::vector<se::CudaComputeCapability> cuda_caps = {
      ComputeCapabilityFromString("3.5"), ComputeCapabilityFromString("5.2")};
#ifdef TF_EXTRA_CUDA_CAPABILITIES
// TF_EXTRA_CUDA_CAPABILITIES should be defined a sequence separated by commas,
// for example:
//   TF_EXTRA_CUDA_CAPABILITIES=3.0,4.0,5.0
// Use two-level macro expansion for stringification.
#define TF_XSTRING(...) #__VA_ARGS__
#define TF_STRING(s) TF_XSTRING(s)
  string extra_cuda_caps = TF_STRING(TF_EXTRA_CUDA_CAPABILITIES);
#undef TF_STRING
#undef TF_XSTRING
  auto extra_capabilities = str_util::Split(extra_cuda_caps, ',');
  for (const auto& capability : extra_capabilities) {
    cuda_caps.push_back(ComputeCapabilityFromString(capability));
  }
#endif
  return cuda_caps;
}
#endif  // GOOGLE_CUDA

sets the minimum default to 3.5 (whereas configure says it is 3.0), and looks for additional capabilities in TF_EXTRA_CUDA_CAPABILITIES, which is a preprocessor macro checked with #ifdef at compile time (not a CMake or environment variable), rather than in TF_CUDA_COMPUTE_CAPABILITIES as generated by ./configure. Then again, I explicitly added build:opt --copt=-DTF_EXTRA_CUDA_CAPABILITIES=3.0, but apparently it never reaches this part of the code (I wonder why). I'll try to debug this part, but maybe it's simpler to just align the minimum default capability?
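To make the control flow of the quoted C++ easier to follow, here is a small Python model of it. It is a sketch, not TensorFlow code: the extra argument stands in for the stringified TF_EXTRA_CUDA_CAPABILITIES macro (None meaning the macro was not defined), and the filtering rule (a device is ignored when its capability is below the smallest entry of the list) is inferred from the log message "The minimum required Cuda capability is 3.5" rather than copied from gpu_device.cc.

```python
from typing import List, Optional, Tuple

Capability = Tuple[int, int]  # (major, minor)


def capability_from_string(text: str) -> Capability:
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def supported_capabilities(extra: Optional[str] = None) -> List[Capability]:
    """Model of GetSupportedCudaComputeCapabilities(); `extra` plays the role of
    the stringified TF_EXTRA_CUDA_CAPABILITIES macro (None = not defined)."""
    caps = [capability_from_string("3.5"), capability_from_string("5.2")]
    if extra is not None:  # corresponds to `#ifdef TF_EXTRA_CUDA_CAPABILITIES`
        caps.extend(capability_from_string(c) for c in extra.split(","))
    return caps


def is_device_ignored(device_cc: Capability, extra: Optional[str] = None) -> bool:
    # Assumed rule: a device below the smallest supported capability is skipped.
    return device_cc < min(supported_capabilities(extra))


gtx_770 = (3, 0)
print(is_device_ignored(gtx_770))         # True: macro not defined, minimum is 3.5
print(is_device_ignored(gtx_770, "3.0"))  # False: as if built with -DTF_EXTRA_CUDA_CAPABILITIES=3.0
```

In this model, hard-coding "3.0" in the initial list, as done by hand later in this thread, has the same effect as passing extra="3.0".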

AlessandroFlati commented 2 years ago

Manually adding the extra capability to that piece of code

  std::vector<se::CudaComputeCapability> cuda_caps = {
      ComputeCapabilityFromString("3.0"), ComputeCapabilityFromString("3.5"), ComputeCapabilityFromString("5.2")};

caused no problems whatsoever during compilation, and the sample code

import tensorflow as tf

print("TensorFlow version:", tf.__version__)

tf.config.list_physical_devices('GPU')
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

predictions = model(x_train[:1]).numpy()
tf.nn.softmax(predictions).numpy()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_fn(y_train[:1], predictions).numpy()
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test,  y_test, verbose=2)

probability_model = tf.keras.Sequential([
  model,
  tf.keras.layers.Softmax()
])

probability_model(x_test[:5])

ran without issues, with the complete output:

TensorFlow version: 2.6.2
Num GPUs Available:  1
2021-11-16 15:17:02.968210: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-11-16 15:17:03.330760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1183 MB memory:  -> device: 0, name: GeForce GTX 770, pci bus id: 0000:27:00.0, compute capability: 3.0
2021-11-16 15:17:03.887366: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2942 - accuracy: 0.9148
Epoch 2/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1406 - accuracy: 0.9578
Epoch 3/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1056 - accuracy: 0.9687
Epoch 4/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.0852 - accuracy: 0.9733
Epoch 5/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.0728 - accuracy: 0.9782
313/313 - 0s - loss: 0.0709 - accuracy: 0.9790

Process finished with exit code 0

Now, how can I be sure that TF is actually using my GPU, and that I didn't just silence a legitimate check? Is there some kind of check on tensors, like is_on_gpu() or something similar?

AlessandroFlati commented 2 years ago

Apparently yes
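In TensorFlow 2.x, eager tensors expose a .device attribute holding a fully qualified device name such as /job:localhost/replica:0/task:0/device:GPU:0 (the same form that appears in the log above), and tf.debugging.set_log_device_placement(True) logs the device chosen for each op. TensorFlow has no is_on_gpu() function; the hypothetical helper below is a plain-Python sketch that only parses such a device name.

```python
import re
from typing import Optional, Tuple

# Matches the trailing "device:<TYPE>:<index>" part of a TF device name.
_DEVICE_RE = re.compile(r"device:(?P<kind>[A-Z_]+):(?P<index>\d+)$")


def parse_device(name: str) -> Optional[Tuple[str, int]]:
    """Return (device type, index) from a name like '/job:.../device:GPU:0'."""
    m = _DEVICE_RE.search(name)
    if m is None:
        return None
    return m.group("kind"), int(m.group("index"))


def is_on_gpu(device_name: str) -> bool:
    parsed = parse_device(device_name)
    return parsed is not None and parsed[0] == "GPU"


print(is_on_gpu("/job:localhost/replica:0/task:0/device:GPU:0"))  # True
print(is_on_gpu("/job:localhost/replica:0/task:0/device:CPU:0"))  # False
```

With TensorFlow available, one could call is_on_gpu(t.device) on a tensor t created in the training script.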

Thank you all for your... support. I would suggest fixing both the documentation and tensorflow/core/common_runtime/gpu/gpu_device.cc, adding CC 3.0 / CUDA 10.2 / cuDNN 7.6 compatibility to r2.6 (and, who knows, maybe also r2.7).

AlessandroFlati commented 2 years ago

I can confirm that this also works for r2.7.

s-kk commented 2 years ago

I can confirm the issue exists: I manually compiled TensorFlow 2.3.0 against CUDA 10.1 and cuDNN v7, and it produces the same runtime ComputeCapability error (3.0 < 3.5) mentioned above.

s-kk commented 2 years ago

Reproduced in r2.3.4 and fixed using the method proposed by @AlessandroFlati. I hope this fix gets merged in upcoming versions.

alexhua commented 2 years ago

Adding by hand the extra compatibility to that piece of code

  std::vector<se::CudaComputeCapability> cuda_caps = {
      ComputeCapabilityFromString("3.0"), ComputeCapabilityFromString("3.5"), ComputeCapabilityFromString("5.2")};

Still works on r2.7.0 with CUDA 10.1 / cuDNN 8.0.5 on an NVIDIA GT 750M. Big thanks.

sachinprasadhs commented 2 years ago

Hi, could you please try the latest version with the compatible components below and let us know the outcome? Thanks!

GPU

| Version | Python version | Compiler | Build tools | cuDNN | CUDA |
| -- | -- | -- | -- | -- | -- |
| tensorflow-2.10.0 | 3.7-3.10 | GCC 9.3.1 | Bazel 5.1.1 | 8.1 | 11.2 |

google-ml-butler[bot] commented 2 years ago

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] commented 2 years ago

Closing as stale. Please reopen if you'd like to work on this further.
