ufoym / deepo

Setup and customize deep learning environment in seconds.
http://ufoym.com/deepo
MIT License
6.32k stars 750 forks

tensorflow 2.5.0 CUDA compatibility #145

Closed syncdoth closed 2 years ago

syncdoth commented 3 years ago

The latest versions use tensorflow==2.5.0, CUDA==10.2, cudnn7.

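tensorflow==2.5.0 does not appear to be compatible with CUDA==10.2: as the log below shows, it tries to load CUDA 11 and cuDNN 8 libraries (libcusparse.so.11, libcudnn.so.8), which are not found, so GPU registration is skipped.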

full log:

2021-06-22 06:10:01.725884: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-06-22 06:10:01.725969: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-06-22 06:10:01.725984: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-06-22 06:10:01.864938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-22 06:10:01.864983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-06-22 06:10:01.864993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-06-22 06:10:02.428699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:3e:00.0 name: Tesla V100-SXM2-32GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 31.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-06-22 06:10:02.428764: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-06-22 06:10:02.429027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-22 06:10:02.429043: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]
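
For anyone triaging a similar log, the missing libraries can be pulled out mechanically. The following is only a minimal sketch: the helper names `missing_libraries` and `soname_major` are hypothetical, and the sample text is an abbreviated copy of the two dso_loader warnings above.

```python
import re
from typing import List, Optional

# Matches the dso_loader warning printed for each shared library that fails to open.
_MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text: str) -> List[str]:
    """Return the unique library names reported as missing, in order of first appearance."""
    seen: List[str] = []
    for name in _MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def soname_major(lib: str) -> Optional[str]:
    """Return the major version suffix of a soname such as 'libcudnn.so.8', if present."""
    match = re.search(r"\.so\.(\d+)", lib)
    return match.group(1) if match else None


# Abbreviated from the log in this issue (timestamps and the dlerror tail are dropped).
sample = (
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:64] "
    "Could not load dynamic library 'libcusparse.so.11'\n"
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:64] "
    "Could not load dynamic library 'libcudnn.so.8'\n"
)

for lib in missing_libraries(sample):
    print(lib, "-> major version", soname_major(lib))
```

The major version suffixes (11 for libcusparse, 8 for libcudnn) are what point to a CUDA 11 / cuDNN 8 build running on an image that ships CUDA 10.2 and cudnn7.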

GPU test:

import tensorflow as tf
print(tf.__version__)
print(tf.test.is_gpu_available())

>>> 2.5.0
>>> False
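
A complementary check, sketched under some assumptions: `tf.sysconfig.get_build_info()` (available in recent TensorFlow 2.x releases, as far as I know from 2.3 onward) reports the CUDA and cuDNN versions the installed wheel was built against, which can then be compared with what the image provides. The helper names `summarize_build` and `tensorflow_build_summary` are hypothetical, and the keys may be absent on CPU-only builds, so the sketch falls back to "unknown".

```python
from typing import Mapping, Optional


def summarize_build(info: Mapping[str, object]) -> str:
    """Format the CUDA/cuDNN versions found in a build-info mapping."""
    # These keys may be missing on CPU-only builds, hence the fallback.
    cuda = info.get("cuda_version", "unknown")
    cudnn = info.get("cudnn_version", "unknown")
    return f"built for CUDA {cuda}, cuDNN {cudnn}"


def tensorflow_build_summary() -> Optional[str]:
    """Return the build summary for the installed TensorFlow, or None if it is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return summarize_build(tf.sysconfig.get_build_info())


if __name__ == "__main__":
    print(tensorflow_build_summary() or "tensorflow is not installed")
```

Note that `tf.test.is_gpu_available()` used in the test above is deprecated in TensorFlow 2.x; `tf.config.list_physical_devices('GPU')` is the suggested replacement and returns an empty list when no GPU is registered.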

A simple workaround was to downgrade to tensorflow-gpu==2.3.0, which still worked for my project.

vanakema commented 2 years ago

I just wanted to raise that I had the same issue, and this solved it for me. Let's get the CUDA version updated!

ufoym commented 2 years ago

Sorry for any inconvenience. CUDA is now upgraded to 11.1 by default.

vanakema commented 2 years ago

Ah, I think the issue I was having, then, is that the “all-jupyter” tag isn’t pointing to the latest cu110 version. Let me check Docker Hub.

vanakema commented 2 years ago

Yeah, the “all-jupyter” tag is still pointing to the same digest as “all-jupyter-cu101”.

ufoym commented 2 years ago

@vanakema Sorry for that mistake! We will fix it ASAP.

ufoym commented 2 years ago

Fixed. Feel free to reopen this issue if the problem still exists.