opensciencegrid / osgvo-tensorflow-gpu

OSGVO's TensorFlow image, GPU flavor

Unable to see GPUs from singularity container #8

Open: sam-may opened this issue 5 years ago

sam-may commented 5 years ago

Hi,

I am trying to use the GPUs at the UCSD T2 center (on uaf-1) from inside this Singularity container [1].

However, nvidia-smi cannot see the GPUs; it fails with "Failed to initialize NVML: Driver/library version mismatch". TensorFlow cannot see them either: running [2] inside a Python script gives the output in [3].

Let me know if any other information would be helpful.

Thanks, Sam

[1] Image: /cvmfs/singularity.opensciencegrid.org/opensciencegrid/tensorflow-gpu:latest, opened with:

    singularity shell --bind /usr/lib64/nvidia:/host-libs /cvmfs/singularity.opensciencegrid.org/opensciencegrid/tensorflow-gpu:latest
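The bind in [1] mounts the host's /usr/lib64/nvidia at /host-libs, but mounting a directory does not by itself make the dynamic linker load libraries from it. One thing that may be worth checking inside the container (an assumption about how this image is set up, not a confirmed cause) is whether /host-libs appears on LD_LIBRARY_PATH at all, and how early. The helper below is a minimal sketch; `search_position` is a hypothetical name, not part of any library.

```python
import os


def search_position(directory, ld_library_path=None):
    """Return the 0-based index of `directory` in an LD_LIBRARY_PATH-style
    string, or None if it is not listed.

    Defaults to the current process's LD_LIBRARY_PATH. Paths are normalized
    so that "/host-libs/" and "/host-libs" compare equal.
    """
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    target = os.path.normpath(directory)
    for index, entry in enumerate(ld_library_path.split(":")):
        # Skip empty entries, e.g. from "a::b" or an unset variable.
        if entry and os.path.normpath(entry) == target:
            return index
    return None


if __name__ == "__main__":
    # "/host-libs" is the bind target used in [1]; run this inside the container.
    pos = search_position("/host-libs")
    if pos is None:
        print("/host-libs is not on LD_LIBRARY_PATH")
    else:
        print(f"/host-libs is entry {pos} of LD_LIBRARY_PATH")
```

If /host-libs is missing or listed after a directory holding the image's own libcuda, the older library bundled in the image could be the one that gets loaded.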

[2]

    # List every device TensorFlow can see (CPU and any GPUs)
    from tensorflow.python.client import device_lib
    print(device_lib.list_local_devices())

[3]

    2019-01-11 17:12:08.978401: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2019-01-11 17:12:08.993142: E tensorflow/stream_executor/cuda/cuda_driver.cc:397] failed call to cuInit: CUDA_ERROR_UNKNOWN
    2019-01-11 17:12:08.993654: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] retrieving CUDA diagnostic information for host: uaf-1.t2.ucsd.edu
    2019-01-11 17:12:08.993836: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:170] hostname: uaf-1.t2.ucsd.edu
    2019-01-11 17:12:08.994186: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:194] libcuda reported version is: 410.48.0
    2019-01-11 17:12:08.995190: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:198] kernel reported version is: 410.79.0
    2019-01-11 17:12:08.995362: E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:308] kernel version 410.79.0 does not match DSO version 410.48.0 -- cannot find working devices in this configuration
    [name: "/device:CPU:0"
    device_type: "CPU"
    memory_limit: 268435456
    locality {
    }
    incarnation: 1535348500336384728
    ]
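The last diagnostic lines in [3] carry the key detail: the libcuda seen inside the container reports 410.48.0, while the host kernel module reports 410.79.0, which is consistent with the driver/library mismatch nvidia-smi complains about. Below is a minimal Python sketch for pulling both versions out of such a log and comparing them. The regular expressions assume the exact wording of the cuda_diagnostics messages quoted above, and the function names are hypothetical.

```python
import re

# Patterns for the two cuda_diagnostics lines quoted in [3].
LIBCUDA_RE = re.compile(r"libcuda reported version is: (\d+(?:\.\d+)*)")
KERNEL_RE = re.compile(r"kernel reported version is: (\d+(?:\.\d+)*)")


def driver_versions(log_text):
    """Return (libcuda_version, kernel_version) parsed from a TensorFlow log.

    Either element is None if the corresponding line is absent.
    """
    lib = LIBCUDA_RE.search(log_text)
    kern = KERNEL_RE.search(log_text)
    return (lib.group(1) if lib else None, kern.group(1) if kern else None)


def versions_match(log_text):
    """True only if both versions were found and are identical."""
    lib, kern = driver_versions(log_text)
    return lib is not None and lib == kern


if __name__ == "__main__":
    log = (
        "... libcuda reported version is: 410.48.0\n"
        "... kernel reported version is: 410.79.0\n"
    )
    print(driver_versions(log))  # ('410.48.0', '410.79.0')
    print(versions_match(log))   # False
```

Applied to the output in [3], this reports a mismatch, matching the final error line TensorFlow prints.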