Closed: yarikoptic closed this issue 4 years ago
export CUDA_VISIBLE_DEVICES=0,1
export CUDA_VISIBLE_DEVICES=1
cool! my google-fu failed to turn up that answer!
I will need to make sure to pass them into the sanitized singularity environment. kwyk
in turn could become smarter and choose the GPU with the most available RAM, or the fewest running processes, or something like that (when multiple are available).
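For illustration only (not something kwyk does today): a minimal shell sketch of such a heuristic, assuming nvidia-smi is available on the host and forcing PCI bus ordering so the chosen index matches nvidia-smi's numbering; the best_gpu variable name is made up.

# hypothetical helper: pick the GPU with the most free memory
best_gpu=$(nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits \
  | sort -t, -k2 -nr | head -n1 | cut -d, -f1)
# force PCI bus ordering so CUDA indices agree with nvidia-smi's
export CUDA_DEVICE_ORDER=PCI_BUS_ID
export CUDA_VISIBLE_DEVICES=$best_gpu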
Unfortunately this seems to have no effect on kwyk -- it still selects GPU 0 whenever I export CUDA_VISIBLE_DEVICES=1. Here is a full protocol of me entering the singularity environment, showing that the env variable is set, trying to run kwyk, exiting the singularity environment, and running nvidia-smi (since there is no nvidia-smi in the container):
beast:~/datalad/labs/haxby/raiders$ SINGULARITYENV_CUDA_VISIBLE_DEVICES=1 singularity exec -e -B /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.74 -B /usr/lib/x86_64-linux-gnu/libcuda.so.1 ~/containers/images/neuronets/neuronets-kwyk--version-0.4-gpu.sing bash
beast:~/datalad/labs/haxby/raiders$ echo $CUDA_VISIBLE_DEVICES
1
beast:~/datalad/labs/haxby/raiders$ kwyk sub-rid000005/anat/sub-rid000005_run-01_T1w.nii.gz out
Bayesian dropout functions have been loaded.
Your version: v0.4 Latest version: 0.4
++ Conforming volume to 1mm^3 voxels and size 256x256x256.
/opt/kwyk/freesurfer/bin/mri_convert: line 2: /opt/kwyk/freesurfer/sources.sh: No such file or directory
mri_convert.bin --conform sub-rid000005/anat/sub-rid000005_run-01_T1w.nii.gz /tmp/tmp5jlabqhd.nii.gz
$Id: mri_convert.c,v 1.226 2016/02/26 16:15:24 mreuter Exp $
reading from sub-rid000005/anat/sub-rid000005_run-01_T1w.nii.gz...
TR=10.00, TE=0.00, TI=0.00, flip angle=0.00
i_ras = (0, -1, 0)
j_ras = (0, 0, 1)
k_ras = (1, 0, 0)
changing data type from float to uchar (noscale = 0)...
MRIchangeType: Building histogram
Reslicing using trilinear interpolation
writing to /tmp/tmp5jlabqhd.nii.gz...
++ Running forward pass of model.
2019-12-04 16:24:50.009365: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-12-04 16:24:50.144303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GT 1030 major: 6 minor: 1 memoryClockRate(GHz): 1.468
pciBusID: 0000:af:00.0
totalMemory: 1.95GiB freeMemory: 1.56GiB
2019-12-04 16:24:50.144345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-12-04 16:24:50.780547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-04 16:24:50.780579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-12-04 16:24:50.780588: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-12-04 16:24:50.780760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1374 MB memory) -> physical GPU (device: 0, name: GeForce GT 1030, pci bus id: 0000:af:00.0, compute capability: 6.1)
...
beast:~/datalad/labs/haxby/raiders$ exit
beast:~/datalad/labs/haxby/raiders$ nvidia-smi
Wed Dec 4 11:25:12 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.74 Driver Version: 418.74 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GT 1030 On | 00000000:AF:00.0 On | N/A |
| 45% 46C P0 N/A / 30W | 368MiB / 1998MiB | 33% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... On | 00000000:D8:00.0 Off | N/A |
| 30% 38C P8 1W / 250W | 1MiB / 10989MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
...
ok, the actual reason/mystery is that inside the container (or at least as seen by kwyk) the 'GeForce RTX 2080 Ti' is device 0, not 1, so this worked:
beast:~/datalad/labs/haxby/raiders$ SINGULARITYENV_CUDA_VISIBLE_DEVICES=0 singularity exec -e -B /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.74 -B /usr/lib/x86_64-linux-gnu/libcuda.so.1 ~/containers/images/neuronets/neuronets-kwyk--version-0.4-gpu.sing kwyk sub-rid000005/anat/sub-rid000005_run-01_T1w.nii.gz out
Bayesian dropout functions have been loaded.
Your version: v0.4 Latest version: 0.4
...
totalMemory: 10.73GiB freeMemory: 10.57GiB
2019-12-04 16:33:29.298895: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-12-04 16:33:29.953479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-04 16:33:29.953512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-12-04 16:33:29.953521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-12-04 16:33:29.953689: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10283 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:d8:00.0, compute capability: 7.5)
Normalizer being used <function zscore at 0x7f1359502ea0>
-5.8382284e-08
1.0000015
64/64 [==============================] - 30s 473ms/step
...
But it would be interesting to know why there is such a remapping between the IDs.
hi @yarikoptic - i was also curious why CUDA_VISIBLE_DEVICES had a different order. by default, cuda uses a simple heuristic to order the devices, so the ordering can differ between environments. you can set CUDA_DEVICE_ORDER=PCI_BUS_ID to get a deterministic order of gpus. i've copied the relevant info from https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars below:
| Variable | Values | Description |
| --- | --- | --- |
| CUDA_DEVICE_ORDER | FASTEST_FIRST, PCI_BUS_ID (default is FASTEST_FIRST) | FASTEST_FIRST causes CUDA to guess which device is fastest using a simple heuristic, and make that device 0, leaving the order of the rest of the devices unspecified. PCI_BUS_ID orders devices by PCI bus ID in ascending order. |
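For example, a minimal sketch (not an option built into kwyk; it reuses the image path, bind mounts, and input file from the session above) combining both variables, so that device 1 is the GeForce RTX 2080 Ti exactly as listed by nvidia-smi:

export SINGULARITYENV_CUDA_DEVICE_ORDER=PCI_BUS_ID
export SINGULARITYENV_CUDA_VISIBLE_DEVICES=1
singularity exec -e -B /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.74 -B /usr/lib/x86_64-linux-gnu/libcuda.so.1 ~/containers/images/neuronets/neuronets-kwyk--version-0.4-gpu.sing kwyk sub-rid000005/anat/sub-rid000005_run-01_T1w.nii.gz out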
please reopen if this is still an issue
Thanks. I think it might be valuable to add that info to the README to resolve this issue "fully".
Working on a server with two GPUs where the first one (smaller) is used for regular graphics. I need to select card 1 (the second one); how can I do that? I don't see any option for it in 0.4 (using the image).