Closed mrocklin closed 5 years ago
@mrocklin I assume this machine has an NVIDIA GPU in it? Could you run nvidia-smi and dump the output here?
(dask-gdf) mrocklin@demouser-DGX-Station:~/dask_gdf$ nvidia-smi
Tue Sep 25 15:10:08 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.145                Driver Version: 384.145                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-DGXS...  On   | 00000000:07:00.0  On |                    0 |
| N/A   38C    P0    37W / 300W |     29MiB / 16149MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-DGXS...  On   | 00000000:08:00.0 Off |                    0 |
| N/A   38C    P0    36W / 300W |     10MiB / 16149MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-DGXS...  On   | 00000000:0E:00.0 Off |                    0 |
| N/A   38C    P0    51W / 300W |  15322MiB / 16149MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-DGXS...  On   | 00000000:0F:00.0 Off |                    0 |
| N/A   38C    P0    36W / 300W |     10MiB / 16149MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1501      G   /usr/lib/xorg/Xorg                            18MiB |
|    2     13342      C   .../sseibert/miniconda3/envs/tf/bin/python 15312MiB |
+-----------------------------------------------------------------------------+
@mrocklin and what version of cudatoolkit got pulled? It looks like the driver version is too old for the CUDA version that was grabbed via conda.
I've tried both 9.2-0 and 9.1-h85f986d_0 numba.
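For anyone else who needs to answer the "which cudatoolkit got pulled" question, here is a minimal sketch that reads the version, build string, and channel from the JSON output of `conda list`. The helper names `installed_cudatoolkit` and `query_conda` are hypothetical. The field names `name`, `version`, `build_string`, and `channel` are assumed to match what your conda version emits, so check them against the actual output.

```python
import json
import subprocess


def installed_cudatoolkit(conda_list_json):
    """Return (version, build, channel) for cudatoolkit, or None if absent.

    `conda_list_json` is the text printed by `conda list --json`.
    The field names used here are an assumption about that output.
    """
    for pkg in json.loads(conda_list_json):
        if pkg.get("name") == "cudatoolkit":
            return pkg.get("version"), pkg.get("build_string"), pkg.get("channel")
    return None


def query_conda():
    """Run conda in the active environment and return its JSON listing."""
    result = subprocess.run(
        ["conda", "list", "cudatoolkit", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `installed_cudatoolkit(query_conda())` inside the activated environment should return a tuple you can paste straight into an issue like this one.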
I would update your NVIDIA driver to 396. CUDA 9.1 requires driver 390.12 or newer. CUDA 9.2 requires driver 396.44 or newer.
Is that system-wide, or can it be handled in user space? (I apologize for not having experience here.) Is this something that can be handled by conda, or is this deeper?
Unfortunately this is system-wide, as it needs to load kernel modules. Otherwise, I believe CUDA 9.0 should work with that driver, and that's userspace.
Yeah, downgrading cudatoolkit to 9.0 works for me.
We are coordinating with conda to implement detection of the CUDA driver version, so that conda knows which range of cudatoolkit versions to install.
Closing as resolved; discussions related to this are ongoing in other issues.
I'm doing the following steps but having difficulty running tests. I suspect that my environment is slightly misconfigured.
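Until something like that exists, the check can be done by hand. Below is a minimal sketch, not conda's implementation, that maps a driver version to the cudatoolkit releases it should support. The 9.1 and 9.2 minimums are the ones quoted earlier in this thread. The 9.0 minimum (384.81) is my assumption and is not stated here. The names `MIN_DRIVER`, `parse_version`, `compatible_toolkits`, and `detect_driver_version` are hypothetical.

```python
import subprocess

# Minimum driver version per cudatoolkit release.
# 9.1 and 9.2 are the numbers quoted in this thread; 9.0 is an assumption.
MIN_DRIVER = {
    "9.0": (384, 81),   # assumption, not stated in the thread
    "9.1": (390, 12),
    "9.2": (396, 44),
}


def parse_version(text):
    """Turn '384.145' into (384, 145) so versions compare numerically.

    Comparing as floats would be wrong: 384.145 < 384.81 as a float,
    but driver 384.145 is newer than 384.81.
    """
    return tuple(int(part) for part in text.strip().split("."))


def compatible_toolkits(driver_version):
    """Return the sorted cudatoolkit releases the given driver should support."""
    driver = parse_version(driver_version)
    return sorted(tk for tk, minimum in MIN_DRIVER.items() if driver >= minimum)


def detect_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()[0].strip()
```

For the 384.145 driver shown above, `compatible_toolkits("384.145")` leaves only 9.0, which matches the downgrade that worked here.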
Install
Install dependencies into a new conda environment
Activate conda environment:
Clone dask_gdf repo:
Install from source:
Test output