tbepler / topaz

Pipeline for particle picking in cryo-electron microscopy images using convolutional neural networks trained from positive and unlabeled examples. Also featuring micrograph and tomogram denoising with DNNs.
GNU General Public License v3.0

CudaWarning: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx #154

Closed pconesa closed 1 year ago

pconesa commented 1 year ago

Hi, on some machines our installation (scipion-topaz) works fine, but on our test server topaz is not finding the GPUs.

Topaz stderr output is:

CudaWarning: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Falling back to CPU.
# using device=0 with cuda=False
# Loading model: unet
# 1 of 10 completed.
# 2 of 10 completed.
# 3 of 10 completed.
# 4 of 10 completed.
# 5 of 10 completed.
# 6 of 10 completed.
# 7 of 10 completed.
# 8 of 10 completed.
# 9 of 10 completed.
# 10 of 10 completed.
CudaWarning: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Falling back to CPU.
CudaWarning: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Falling back to CPU.

nvidia-smi output is:

buildbot@scipionbox:~$ nvidia-smi 
Fri Sep  9 16:47:05 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 27%   38C    P8     7W / 151W |      6MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:81:00.0 Off |                  N/A |
| 34%   35C    P8     6W / 151W |      6MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Quadro M4000        Off  | 00000000:82:00.0 Off |                  N/A |
| 46%   37C    P8    11W / 120W |     24MiB /  8125MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3783      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      3783      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A      3783      G   /usr/lib/xorg/Xorg                 21MiB |
+-----------------------------------------------------------------------------+

Environment info is:

(topaz-0.2.5) buildbot@scipionbox:~$ conda list
# packages in environment at /home/buildbot/anaconda3/envs/topaz-0.2.5:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
blas                      1.0                         mkl  
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2022.07.19           h06a4308_0  
certifi                   2021.5.30        py36h06a4308_0  
cudatoolkit               11.3.1               h2bc3f7f_2  
dataclasses               0.8                pyh4f3eec9_6  
ffmpeg                    4.3                  hf484d3e_0    pytorch
freetype                  2.11.0               h70c0345_0  
future                    0.18.2                   py36_1  
gmp                       6.2.1                h295c915_3  
gnutls                    3.6.15               he1e5248_0  
intel-openmp              2022.1.0          h9e868ea_3769  
joblib                    1.0.1              pyhd3eb1b0_0  
jpeg                      9e                   h7f8727e_0  
lame                      3.100                h7b6447c_0  
lcms2                     2.12                 h3be6417_0  
ld_impl_linux-64          2.38                 h1181459_1  
lerc                      3.0                  h295c915_0  
libdeflate                1.8                  h7f8727e_5  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 11.2.0               h1234567_1  
libgfortran-ng            7.5.0               ha8ba4b0_17  
libgfortran4              7.5.0               ha8ba4b0_17  
libgomp                   11.2.0               h1234567_1  
libiconv                  1.16                 h7f8727e_2  
libidn2                   2.3.2                h7f8727e_0  
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              11.2.0               h1234567_1  
libtasn1                  4.16.0               h27cfd23_0  
libtiff                   4.4.0                hecacb30_0  
libunistring              0.9.10               h27cfd23_0  
libuv                     1.40.0               h7b6447c_0  
libwebp-base              1.2.2                h7f8727e_0  
lz4-c                     1.9.3                h295c915_1  
mkl                       2020.2                      256  
mkl-service               2.3.0            py36he8ac12f_0  
mkl_fft                   1.3.0            py36h54f3939_0  
mkl_random                1.1.1            py36h0573a6f_0  
ncurses                   6.3                  h5eee18b_3  
nettle                    3.7.3                hbbd107a_1  
numpy                     1.19.2           py36h54aff64_0  
numpy-base                1.19.2           py36hfa32c7d_0  
olefile                   0.46                     py36_0  
openh264                  2.1.1                h4ff587b_0  
openjpeg                  2.4.0                h3ad879b_0  
openssl                   1.1.1q               h7f8727e_0  
pandas                    1.1.5            py36ha9443f7_0  
pillow                    8.3.1            py36h2c7a002_0  
pip                       21.2.2           py36h06a4308_0  
python                    3.6.13               h12debd9_1  
python-dateutil           2.8.2              pyhd3eb1b0_0  
pytorch                   1.10.2          py3.6_cuda11.3_cudnn8.2.0_0    pytorch
pytorch-mutex             1.0                        cuda    pytorch
pytz                      2021.3             pyhd3eb1b0_0  
readline                  8.1.2                h7f8727e_1  
scikit-learn              0.24.2           py36ha9443f7_0  
scipy                     1.5.2            py36h0b6359f_0  
setuptools                58.0.4           py36h06a4308_0  
six                       1.16.0             pyhd3eb1b0_1  
sqlite                    3.39.2               h5082296_0  
threadpoolctl             2.2.0              pyh0d69192_0  
tk                        8.6.12               h1ccaba5_0  
topaz                     0.2.5                      py_0    tbepler
torchvision               0.11.3               py36_cu113    pytorch
typing_extensions         4.1.1              pyh06a4308_0  
wheel                     0.37.1             pyhd3eb1b0_0  
xz                        5.2.5                h7f8727e_1  
zlib                      1.2.12               h5eee18b_3  
zstd                      1.5.2                ha4553b6_0 

I can see that the pytorch build is for CUDA 11.3 (py3.6_cuda11.3_cudnn8.2.0_0), but we have CUDA 11.4. Is this a problem?

This is the one-line command we use to install topaz:

. /home/buildbot/anaconda3/etc/profile.d/conda.sh && conda create -y -n topaz-0.2.5 python=3.6 && conda activate topaz-0.2.5 && conda install -y topaz=0.2.5 cudatoolkit -c tbepler -c pytorch

Should we pin specific versions of cudatoolkit or pytorch?
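On the CUDA 11.3 versus 11.4 question above: the "CUDA Version" field in the nvidia-smi banner is the highest CUDA runtime version the installed driver supports, not a toolkit installed on the machine, and the conda pytorch package ships its own cudatoolkit. A build for 11.3 running under a driver that supports up to 11.4 is therefore normally fine. Below is a minimal sketch of that comparison, using the two strings from this report. The helper names (`parse_nvidia_smi_header`, `parse_build_cuda`, `driver_supports`) are hypothetical, and the plain "less than or equal" test is a rough heuristic, not NVIDIA's full compatibility rule.

```python
import re


def parse_nvidia_smi_header(line):
    """Return (driver_version, max_cuda_version) from the nvidia-smi banner line."""
    m = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", line)
    if m is None:
        raise ValueError("not an nvidia-smi banner line")
    return m.group(1), m.group(2)


def parse_build_cuda(build):
    """Return the CUDA version embedded in a conda build string, or None if absent."""
    m = re.search(r"cuda(\d+\.\d+)", build)
    return m.group(1) if m else None


def as_tuple(version):
    """Turn '11.4' into (11, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_supports(build, banner):
    """Heuristic: the build's CUDA version must not exceed the driver's maximum."""
    built_for = parse_build_cuda(build)
    if built_for is None:
        return False  # no CUDA tag in the build string, e.g. a CPU-only build
    _, max_cuda = parse_nvidia_smi_header(banner)
    return as_tuple(built_for) <= as_tuple(max_cuda)


# Strings copied from the nvidia-smi and conda list output in this report.
banner = "| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |"
build = "py3.6_cuda11.3_cudnn8.2.0_0"

print(driver_supports(build, banner))  # prints True
```

If this check passes and the warning still appears, the version pairing is probably not the cause, and the next place to look is whether the process running topaz can see the driver at all.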

pconesa commented 1 year ago

Got more info: it seems that although nvidia-smi shows CUDA 11.4, there is no CUDA 11.4 installed, or at least not in the regular /usr/local/cuda***

tbepler commented 1 year ago

@pconesa Did you figure out a solution to this? It sounds like an issue with your CUDA and/or pytorch installation rather than topaz itself.
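One way to separate a PyTorch/CUDA problem from a topaz problem is to ask PyTorch directly, from inside the same conda environment and as the same user that runs topaz. The sketch below is only an illustration: `cuda_report` is a hypothetical helper, not part of topaz. It also records `CUDA_VISIBLE_DEVICES`, on the assumption (not confirmed anywhere in this thread) that a build-bot account may run with a different environment than an interactive shell.

```python
import importlib.util
import os


def cuda_report():
    """Collect what PyTorch reports about CUDA in the current environment."""
    report = {
        # None means the variable is unset; an empty string hides all GPUs.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__
    # CUDA version the binary was built against; None for CPU-only builds.
    report["built_for_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


report = cuda_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `cuda_available` is False here, the problem sits below topaz, in the driver, the environment, or the PyTorch build.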

pconesa commented 1 year ago

I actually do not know what we did, but it is fixed now. Thanks!