Closed: AlessandroFlati closed this issue.
Hi @AlessandroFlati! As per the GPU documentation, the minimum compute capability is 3.5. Thanks!
Hi @mohantym, and thanks for the quick reply. As far as I understood, the minimum compute capability of 3.5 applies to the prebuilt binaries, not to building from source. The same link I attached as the "very similar issue" talks about 3.0. Maybe you meant I have to downgrade the TensorFlow version, or downgrade CUDA/NVIDIA drivers, but I wouldn't expect a perfectly working, CUDA/cuDNN-capable GPU to suddenly work with NO version of TensorFlow.
OK @AlessandroFlati! Could you try again with CUDA 11.2, cuDNN 8.1 and GCC 7.3.1, as this document suggests? Thanks!
Considerations: with CUDA 11.2, the CUDA samples now fail at runtime with errors such as

```
CUDA error at particleSystem_cuda.cu:121 code=13(cudaErrorInvalidSymbol) "cudaMemcpyToSymbol(params, hostParams, sizeof(SimParams))"
```

or

```
BlackScholes.cu(160) : getLastCudaError() CUDA error : BlackScholesGPU() execution failed : (209) no kernel image is available for execution on the device.
```

(before, they all ran smoothly).
Given this, I didn't even try compiling TensorFlow; I believe these CUDA/cuDNN versions are simply too new for my GeForce GTX 770. Since CUDA 10.2 / cuDNN 8.2.4.15 were working well, following your document I should perhaps go for TensorFlow 2.3 with GCC 7.3.1 | Bazel 3.1.0 | cuDNN 7.6 | CUDA 10.1.
I'll try this ASAP.

Update: CUDA 10.1 is not compatible with GCC 7.5 or (in a boolean sense) with my graphics card. The only combination that worked was GCC 8 with CUDA 10.2. I hope this doesn't differ too much from the combination above.
Just to be clear, I'm now testing
r2.3 | GCC 7.5 (for TF) / GCC 8.4 (for CUDA and cuDNN) | Bazel 3.1.0 | cuDNN 7.6 | CUDA 10.2
which seems to be the closest to the tested builds.
Installing CUDA and cuDNN went more smoothly than ever: cuDNN 7.6 seems to be an even better fit for CUDA 10.2 + CC 3.0 than the previous cuDNN 8.2, since it now also passes the half-precision tests.
Alas, `bazel build //tensorflow/tools/pip_package:build_pip_package` with GCC 7.5 fails. It emits a lot of info messages and warnings while building with Bazel 3.1.0, though most of them don't seem too important: they mostly come from `-Wmaybe-uninitialized`, `-Wunused-function`, `-Wsign-compare`, `-Wcomment`, `-Wreturn-type`, etc., and many others are just `it was declared here` notes. Some others seem a little more worrying, though, like `directory doesn't exist` for protobuf and a lot of warnings for the `external/eigen_archive/unsupported/Eigen/CXX11/` headers. Obviously, if you need it, I can provide a complete log.
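A log like this is easier to triage if you tally the warnings per `-W` flag and count hard errors separately, because warnings alone normally do not stop the build. The sketch below does that. It assumes GCC's usual `file:line:col: warning: ... [-Wflag]` diagnostic format. `count_warning_flags` and `count_errors` are hypothetical helpers, not part of TensorFlow or Bazel:

```python
import re
import sys
from collections import Counter

# Matches GCC-style warning flags such as "[-Wsign-compare]" at the end of a diagnostic.
WARNING_FLAG = re.compile(r"\[(-W[\w=+-]+)\]")


def count_warning_flags(log_text: str) -> Counter:
    """Count how often each -W flag appears in a compiler/Bazel log."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        for flag in WARNING_FLAG.findall(line):
            counts[flag] += 1
    return counts


def count_errors(log_text: str) -> int:
    """Count lines that GCC reports as hard errors (': error:')."""
    return sum(1 for line in log_text.splitlines() if ": error:" in line)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python triage.py build.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        text = fh.read()
    for flag, n in count_warning_flags(text).most_common():
        print(f"{n:6d}  {flag}")
    print("hard errors:", count_errors(text))
```

The real failure is usually among the few `: error:` lines, not among the thousands of warnings.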
I'll now try to recompile with GCC 8, so the next combination is:
r2.3 | GCC 8.4 (for CUDA, cuDNN and TF) | Bazel 3.1.0 | cuDNN 7.6 | CUDA 10.2
GCC 8.4, while compiling the CUDA and cuDNN samples perfectly, gives about the same errors with TF 2.3; here's the log: tensorflow_r2.3withgcc8.log
Trying GCC 9:
r2.3 | GCC 8.4 (for CUDA, cuDNN) / GCC 9 (for TF) | Bazel 3.1.0 | cuDNN 7.6 | CUDA 10.2
gives something predictable:
```
ERROR: /home/alessandro/.cache/bazel/_bazel_alessandro/66cb6378d0a5667806d8c4794375ceb9/external/nccl_archive/BUILD.bazel:53:1: C++ compilation of rule '@nccl_archive//:device_lib' failed (Exit 1)
In file included from /usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/cuda_runtime.h:83,
                 from <command-line>:
/usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/crt/host_config.h:138:2: error: #error -- unsupported GNU version! gcc versions later than 8 are not supported!
  138 | #error -- unsupported GNU version! gcc versions later than 8 are not supported!
      |  ^~~~~
Target //tensorflow/tools/pip_package:build_pip_package failed to build
```
I can't use GCC 9 to compile something that sits on top of CUDA 10.2, whose headers reject any GCC later than 8.
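The `#error` above comes from CUDA 10.2's own `host_config.h`, not from TensorFlow. That means each candidate host compiler can be checked against the limit before starting a long Bazel run. A minimal sketch follows. The limit of 8 is taken from the error message above, the helper names are hypothetical, and `gcc-8`/`gcc-9` are assumed Ubuntu-style binary names:

```python
import shutil
import subprocess

# Limit taken from the error above: CUDA 10.2's host_config.h rejects
# "gcc versions later than 8".
CUDA_10_2_MAX_GCC_MAJOR = 8


def gcc_major(dumpversion_output: str) -> int:
    """Major version from `gcc -dumpversion` output such as "9" or "8.4.0"."""
    return int(dumpversion_output.strip().split(".")[0])


def host_compiler_ok(dumpversion_output: str,
                     max_major: int = CUDA_10_2_MAX_GCC_MAJOR) -> bool:
    """True if nvcc's GCC version check should accept this compiler."""
    return gcc_major(dumpversion_output) <= max_major


if __name__ == "__main__":
    # Binary names are an assumption; adjust them to your system.
    for compiler in ("gcc", "gcc-8", "gcc-9"):
        path = shutil.which(compiler)
        if path is None:
            continue
        out = subprocess.run([path, "-dumpversion"],
                             capture_output=True, text=True).stdout
        if not out.strip():
            continue
        verdict = "OK" if host_compiler_ok(out) else "rejected by CUDA 10.2"
        print(f"{compiler}: {out.strip()} -> {verdict}")
```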
So my last resort is to raise the release version, which is why I tried the latest r2.7 in the first place, with no success.
What's your advice?
Just to recap:
TF version | GCC version for TF | GCC version for CUDA and cuDNN | Bazel version | cuDNN version | CUDA version | CUDA compiles | CUDA/cuDNN samples work | TF compiles |
---|---|---|---|---|---|---|---|---|
x | x | GCC 8.4 | x | x | CUDA 11.2 | Yes | Runtime errors on CUDA samples | x |
x | x | GCC 8.4 | x | x | CUDA 10.1 | No | x | x |
r2.3 | GCC 9.3 | GCC 8.4 | Bazel 3.1.0 | cuDNN 7.6 | CUDA 10.2 | Yes | Yes | No |
r2.3 | GCC 8.4 | GCC 8.4 | Bazel 3.1.0 | cuDNN 7.6 | CUDA 10.2 | Yes | Yes | No |
r2.3 | GCC 7.5 | GCC 8.4 | Bazel 3.1.0 | cuDNN 7.6 | CUDA 10.2 | Yes | Yes | No |
r2.4 | GCC 8.4 | GCC 8.4 | Bazel 3.1.0 | cuDNN 7.6 | CUDA 10.2 | Yes | Yes | No |
r2.5 | GCC 8.4 | GCC 8.4 | Bazel 3.7.2 | cuDNN 8.2 | CUDA 10.2 | Yes | Yes | Bazel fails with no such package '@local_cuda//' |
r2.6 | GCC 8.4 | GCC 8.4 | Bazel 3.7.2 | cuDNN 7.6 | CUDA 10.2 | Yes | Yes | TF compiles but see below |
r2.7 | GCC 8.4 | GCC 8.4 | Bazel 3.7.2 | cuDNN 7.6 | CUDA 10.2 | Yes | Half precision samples of cuDNN fail | TF compiles, but gives runtime error |
Right after the `pip install`, the sample code `import tensorflow as tf` gave `RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd`. I then ran `pip install numpy --upgrade`, and the error vanished.
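The two hex numbers in that message are NumPy C-API versions: 0xe (14) is what the wheel's extension modules were compiled against, and 0xd (13) is what the installed NumPy provides. In other words, the installed NumPy was older than the one used at build time, which is consistent with upgrading NumPy fixing it. A small sketch that decodes the message follows; `decode_numpy_api_mismatch` is a hypothetical helper:

```python
import re
from typing import Tuple

# Hex C-API versions as they appear in NumPy's import-time RuntimeError.
_API_PATTERN = re.compile(
    r"compiled against API version (0x[0-9a-fA-F]+) "
    r"but this version of numpy is (0x[0-9a-fA-F]+)"
)


def decode_numpy_api_mismatch(message: str) -> Tuple[int, int]:
    """Return (api_used_at_build_time, api_of_installed_numpy) as integers."""
    match = _API_PATTERN.search(message)
    if match is None:
        raise ValueError("not a NumPy C-API mismatch message")
    built, installed = (int(h, 16) for h in match.groups())
    return built, installed


if __name__ == "__main__":
    msg = ("RuntimeError: module compiled against API version 0xe "
           "but this version of numpy is 0xd")
    built, installed = decode_numpy_api_mismatch(msg)
    print(f"built against C-API {built}, installed NumPy provides {installed}")
    if built > installed:
        print("installed NumPy is older than the build-time one: upgrade NumPy")
```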
The following minimal sample code:

```python
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
# List the GPUs TensorFlow is willing to use.
gpus = tf.config.list_physical_devices('GPU')
print("Num GPUs Available: ", len(gpus))
```

produces:
```
TensorFlow version: 2.6.2
2021-11-16 10:45:04.371930: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-16 10:45:04.377871: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-16 10:45:04.378119: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-16 10:45:04.378283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1885] Ignoring visible gpu device (device: 0, name: GeForce GTX 770, pci bus id: 0000:27:00.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
Num GPUs Available: 0
```
Just to get rid of the non-fatal NUMA info messages, I used the well-known

```shell
for a in /sys/bus/pci/devices/*; do echo 0 | sudo tee -a $a/numa_node; done
```
and indeed, I'm now left with just the (incomprehensible)

```
TensorFlow version: 2.6.2
Num GPUs Available: 0
2021-11-16 10:49:15.727029: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1885] Ignoring visible gpu device (device: 0, name: GeForce GTX 770, pci bus id: 0000:27:00.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
```
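The `tee` loop writes `0` into every device's `numa_node` file. An alternative is to inspect the current values first; presumably the GPU at `0000:27:00.0` is one of the devices reporting `-1`, which is what the "negative value (-1)" messages complain about. Below is a read-only sketch with hypothetical helper names. Keep in mind that sysfs is regenerated at boot, so the `tee` loop has to be rerun after each restart:

```python
from pathlib import Path
from typing import Dict, List


def read_numa_nodes(root: str = "/sys/bus/pci/devices") -> Dict[str, int]:
    """Map each PCI device address to the value of its numa_node file."""
    nodes: Dict[str, int] = {}
    for numa_file in sorted(Path(root).glob("*/numa_node")):
        try:
            nodes[numa_file.parent.name] = int(numa_file.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable file or unexpected content: skip this device
    return nodes


def devices_without_numa_node(nodes: Dict[str, int]) -> List[str]:
    """Devices reporting a negative node (-1 means 'no NUMA affinity known')."""
    return [dev for dev, node in nodes.items() if node < 0]


if __name__ == "__main__":
    missing = devices_without_numa_node(read_numa_nodes())
    print(f"{len(missing)} PCI device(s) report numa_node = -1")
    for dev in missing:
        print("  ", dev)
```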
I can't say for sure, but everything seems to indicate that the "right" CUDA/cuDNN/NVIDIA driver versions for my graphics card are 10.2/7.6/440.33.01. Given this, and given that its compute capability is 3.0, I can't understand why r2.3 and r2.4 won't compile and emit all those warnings (I mean, knowing your team, I'm sure you wouldn't release something that produces SO many warnings out of the box for a given compiler/settings).
Also, I'd like to point out that the r2.6 `configure` script indicates 3.0 as the minimum compute capability, which is inconsistent with what the runtime says in the message above.
I'm in your hands now.
Hi @Saduf2019! Could you please look at this issue?
Going to the root of the problem (`tensorflow/core/common_runtime/gpu/gpu_device.cc:1885`), it seems that 3.0 is not added to `cuda_supported_capabilities`.
This probably comes from the following function:

```cpp
std::vector<se::CudaComputeCapability> GetSupportedCudaComputeCapabilities() {
  std::vector<se::CudaComputeCapability> cuda_caps = {
      ComputeCapabilityFromString("3.5"), ComputeCapabilityFromString("5.2")};
#ifdef TF_EXTRA_CUDA_CAPABILITIES
// TF_EXTRA_CUDA_CAPABILITIES should be defined a sequence separated by commas,
// for example:
//   TF_EXTRA_CUDA_CAPABILITIES=3.0,4.0,5.0
// Use two-level macro expansion for stringification.
#define TF_XSTRING(...) #__VA_ARGS__
#define TF_STRING(s) TF_XSTRING(s)
  string extra_cuda_caps = TF_STRING(TF_EXTRA_CUDA_CAPABILITIES);
#undef TF_STRING
#undef TF_XSTRING
  auto extra_capabilities = str_util::Split(extra_cuda_caps, ',');
  for (const auto& capability : extra_capabilities) {
    cuda_caps.push_back(ComputeCapabilityFromString(capability));
  }
#endif
  return cuda_caps;
}
#endif  // GOOGLE_CUDA
```

It sets the default minimum to 3.5 (whereas `configure` says it is 3.0) and looks for additional capabilities in the preprocessor macro `TF_EXTRA_CUDA_CAPABILITIES`, not in `TF_CUDA_COMPUTE_CAPABILITIES` as generated by `./configure`.
Then again, I explicitly added `build:opt --copt=-DTF_EXTRA_CUDA_CAPABILITIES=3.0`, but it probably never reaches this part of the code (I wonder why). I'll try to debug this part, but maybe it's simpler to just align the default minimum capability?
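The gating logic discussed above can be modeled in a few lines of Python. This is a sketch of the logic, not TensorFlow code, and the helper names are hypothetical. It assumes, as the log line "The minimum required Cuda capability is 3.5" suggests, that a device is ignored when its capability is below the smallest entry of the supported list:

```python
from typing import List, Optional, Tuple

Capability = Tuple[int, int]

# Defaults hard-coded in GetSupportedCudaComputeCapabilities() (r2.6 excerpt).
DEFAULT_CAPS: List[str] = ["3.5", "5.2"]


def parse_capability(text: str) -> Capability:
    """Parse "3.5" into (3, 5) so capabilities compare as (major, minor)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def supported_capabilities(extra: Optional[str] = None) -> List[Capability]:
    """Defaults plus a comma-separated, TF_EXTRA_CUDA_CAPABILITIES-style string."""
    caps = [parse_capability(c) for c in DEFAULT_CAPS]
    if extra:
        caps += [parse_capability(c) for c in extra.split(",")]
    return caps


def device_is_used(device_cap: str, extra: Optional[str] = None) -> bool:
    """Assumed rule: devices below the minimum supported capability are ignored."""
    return parse_capability(device_cap) >= min(supported_capabilities(extra))


if __name__ == "__main__":
    print(device_is_used("3.0"))         # GTX 770 with the stock list
    print(device_is_used("3.0", "3.0"))  # as if TF_EXTRA_CUDA_CAPABILITIES=3.0
```

Under that assumption, any way of getting "3.0" into the list lowers the minimum to 3.0 and lets the GTX 770 through. That includes the macro and editing the hard-coded defaults.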
Adding the extra capability by hand to that piece of code

```cpp
std::vector<se::CudaComputeCapability> cuda_caps = {
    ComputeCapabilityFromString("3.0"), ComputeCapabilityFromString("3.5"),
    ComputeCapabilityFromString("5.2")};
```
caused no problem whatsoever during compilation, and the sample code

```python
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
tf.config.list_physical_devices('GPU')
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

predictions = model(x_train[:1]).numpy()
tf.nn.softmax(predictions).numpy()

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_fn(y_train[:1], predictions).numpy()

model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)

probability_model = tf.keras.Sequential([
    model,
    tf.keras.layers.Softmax()
])
probability_model(x_test[:5])
```

ran without any issue, with the following complete output:
```
TensorFlow version: 2.6.2
Num GPUs Available: 1
2021-11-16 15:17:02.968210: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-11-16 15:17:03.330760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1183 MB memory: -> device: 0, name: GeForce GTX 770, pci bus id: 0000:27:00.0, compute capability: 3.0
2021-11-16 15:17:03.887366: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2942 - accuracy: 0.9148
Epoch 2/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1406 - accuracy: 0.9578
Epoch 3/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1056 - accuracy: 0.9687
Epoch 4/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.0852 - accuracy: 0.9733
Epoch 5/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.0728 - accuracy: 0.9782
313/313 - 0s - loss: 0.0709 - accuracy: 0.9790
Process finished with exit code 0
```
Now, how can I be sure that TF is actually using my GPU and that I didn't just silence a legitimate alarm? Is there some kind of check on tensors, like `is_on_gpu()`, or something similar?
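As far as I know TensorFlow has no `is_on_gpu()`, but two real APIs cover the question:

- `tf.debugging.set_log_device_placement(True)` logs the device each op runs on.
- An eager tensor's `.device` attribute holds a device string such as `/job:localhost/replica:0/task:0/device:GPU:0`, the same form as in the "Created device" log line above.

Below is a minimal sketch. `is_gpu_device` is a hypothetical helper that only inspects that string. Watching `nvidia-smi` while `model.fit` runs is another quick sanity check.

```python
def is_gpu_device(device_name: str) -> bool:
    """True for device strings like '/job:localhost/replica:0/task:0/device:GPU:0'."""
    last = device_name.rstrip("/").split("/")[-1]  # e.g. "device:GPU:0"
    return "GPU" in last.upper().split(":")


def main() -> None:
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed; skipping the live check")
        return
    # Log the device every op is placed on.
    tf.debugging.set_log_device_placement(True)
    a = tf.random.uniform((1000, 1000))
    b = tf.matmul(a, a)
    where = "GPU" if is_gpu_device(b.device) else "not GPU"
    print("result tensor lives on:", b.device, f"({where})")


if __name__ == "__main__":
    main()
```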
Thank you all for your... support. I would suggest fixing both the documentation and `tensorflow/core/common_runtime/gpu/gpu_device.cc`, adding CC 3.0/CUDA 10.2/cuDNN 7.6 to the r2.6 (and, who knows, maybe also r2.7) compatibility.
I can confirm this on r2.7 as well.
I can confirm the issue exists: I manually compiled TensorFlow 2.3.0 against CUDA 10.1 and cuDNN v7, and it produces the same runtime compute-capability error (3.0 < 3.5) mentioned above.
Reproduced in r2.3.4 and fixed using the method proposed by @AlessandroFlati. I hope this fix gets merged in upcoming versions.
> Adding by hand the extra compatibility to that piece of code
> `std::vector<se::CudaComputeCapability> cuda_caps = { ComputeCapabilityFromString("3.0"), ComputeCapabilityFromString("3.5"), ComputeCapabilityFromString("5.2")};`

Still works on r2.7.0 with CUDA 10.1/cuDNN 8.0.5 on an NVIDIA GT 750M. Big thanks.
Hi! Could you please try the latest version with the compatible components below and let us know the outcome? Thanks!
This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.
Closing as stale. Please reopen if you'd like to work on this further.
System information
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No, the only lines I'm trying are
Describe the problem
As stated in the documentation, after installing Bazel (3.7.2) I ran `./configure` with the CUDA option enabled. It went like this:

After this, having found a very similar issue, I always add `build:opt --copt=-DTF_EXTRA_CUDA_CAPABILITIES=3.0` to `.tf_configure` (even though it seems to be already present as a `build --action_env` option), plus the two lines that exclude XLA: `build --define=with_xla_support=false` and `build --action_env TF_ENABLE_XLA=0`. My `.tf_configure` looks like this:

Then I simply run `bazel build //tensorflow/tools/pip_package:build_pip_package`, and it completes without any errors or even warnings. I finally ran `./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg`, and the wheel gets built, albeit with warnings. `pip install /tmp/tensorflow_pkg/tensorflow-2.7.0-cp38-cp38-linux_x86_64.whl` also goes without a flaw:

Eventually, I run my Python test:

And the result is:
I'm switching to TensorFlow because PyTorch doesn't seem to handle compute capability 3.0, while TensorFlow can. But can it?
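One detail in the setup above concerns Bazel's rc-file syntax. Options on a `build:opt` line are applied only when `--config=opt` is passed, or when some other rc line enables that config. Plain `build` lines always apply. The `bazel build` invocation quoted above does not pass `--config=opt`, so it may be worth checking whether the `-DTF_EXTRA_CUDA_CAPABILITIES=3.0` define ever reached the compiler. This is an untested hypothesis, not a confirmed cause. The rule can be modeled with the simplified, hypothetical helper below, which is not Bazel's real parser:

```python
from typing import Iterable, List


def applicable_flags(rc_text: str, command: str = "build",
                     configs: Iterable[str] = ()) -> List[str]:
    """Flags an rc file contributes to `bazel <command> [--config=NAME ...]`.

    Simplified model: "build ..." lines always apply to `build`, while
    "build:NAME ..." lines apply only when --config=NAME is passed. It ignores
    imports, `common` lines, command inheritance, nested configs and quoting.
    """
    wanted = {command} | {f"{command}:{name}" for name in configs}
    flags: List[str] = []
    for raw in rc_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        key, _, rest = line.partition(" ")
        if key in wanted:
            flags.extend(rest.split())
    return flags


if __name__ == "__main__":
    # The extra lines described in the issue above.
    rc = "\n".join([
        "build:opt --copt=-DTF_EXTRA_CUDA_CAPABILITIES=3.0",
        "build --define=with_xla_support=false",
        "build --action_env TF_ENABLE_XLA=0",
    ])
    print(applicable_flags(rc))                   # plain `bazel build ...`
    print(applicable_flags(rc, configs=["opt"]))  # `bazel build --config=opt ...`
```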