neuronets / kwyk

Knowing what you know - Bayesian brain parcellation
https://doi.org/10.3389/fninf.2019.00067
Apache License 2.0

how to select GPU card to use? #17

Closed: yarikoptic closed this issue 4 years ago

yarikoptic commented 5 years ago

I'm working on a server with two GPUs, where the first (smaller) one is used for regular graphics. I need to select card 1 (the second one); how do I do that? I don't see any such option in 0.4 (using the image):

Options:
  -m, --model [bvwn_multi_prior|bwn|bwn_multi]
                                  Model to use for prediction.  [required]
  -n, --n-samples INTEGER         Number of samples to predict.
  -b, --batch-size INTEGER        Batch size during prediction.
  --save-variance                 Save volume with variance across `n-samples`
                                  predictions.
  --save-entropy                  Save volume of entropy values.
  --version                       Show the version and exit.
  --help                          Show this message and exit.
satra commented 5 years ago
export CUDA_VISIBLE_DEVICES=0,1
export CUDA_VISIBLE_DEVICES=1
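(The first makes both devices visible; the second exposes only device 1, which the application then enumerates as device 0.)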
yarikoptic commented 5 years ago

cool! my google-fu failed to find that answer! I will need to make sure to pass these into the sanitized singularity environment. kwyk, in turn, could become smarter and choose the GPU with the most available RAM, or the fewest running processes, or something like that (when multiple are available); see the sketch below.
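For example, something along these lines (a rough shell sketch, not actual kwyk code; assumes nvidia-smi is on the PATH):

# hypothetical helper: pick the GPU with the most free memory
best=$(nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits \
  | sort -t, -k2 -rn | head -n1 | cut -d, -f1)
export CUDA_VISIBLE_DEVICES=$best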

yarikoptic commented 4 years ago

Unfortunately this seems to have no effect on kwyk -- it still selects GPU 0 even when I export CUDA_VISIBLE_DEVICES=1. Here is a full protocol of me entering the singularity env (the SINGULARITYENV_ prefix tells singularity to set the variable inside the container, since -e cleans the environment), showing that the env variable is set, trying to run kwyk, exiting the singularity env, and running nvidia-smi (since there is no nvidia-smi in the container):

beast:~/datalad/labs/haxby/raiders$ SINGULARITYENV_CUDA_VISIBLE_DEVICES=1 singularity exec -e -B /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.74 -B /usr/lib/x86_64-linux-gnu/libcuda.so.1 ~/containers/images/neuronets/neuronets-kwyk--version-0.4-gpu.sing bash

beast:~/datalad/labs/haxby/raiders$ echo $CUDA_VISIBLE_DEVICES
1

beast:~/datalad/labs/haxby/raiders$ kwyk sub-rid000005/anat/sub-rid000005_run-01_T1w.nii.gz  out
Bayesian dropout functions have been loaded.
Your version: v0.4 Latest version: 0.4
++ Conforming volume to 1mm^3 voxels and size 256x256x256.
/opt/kwyk/freesurfer/bin/mri_convert: line 2: /opt/kwyk/freesurfer/sources.sh: No such file or directory
mri_convert.bin --conform sub-rid000005/anat/sub-rid000005_run-01_T1w.nii.gz /tmp/tmp5jlabqhd.nii.gz 
$Id: mri_convert.c,v 1.226 2016/02/26 16:15:24 mreuter Exp $
reading from sub-rid000005/anat/sub-rid000005_run-01_T1w.nii.gz...
TR=10.00, TE=0.00, TI=0.00, flip angle=0.00
i_ras = (0, -1, 0)
j_ras = (0, 0, 1)
k_ras = (1, 0, 0)
changing data type from float to uchar (noscale = 0)...
MRIchangeType: Building histogram 
Reslicing using trilinear interpolation 
writing to /tmp/tmp5jlabqhd.nii.gz...
++ Running forward pass of model.
2019-12-04 16:24:50.009365: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-12-04 16:24:50.144303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GT 1030 major: 6 minor: 1 memoryClockRate(GHz): 1.468
pciBusID: 0000:af:00.0
totalMemory: 1.95GiB freeMemory: 1.56GiB
2019-12-04 16:24:50.144345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-12-04 16:24:50.780547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-04 16:24:50.780579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-12-04 16:24:50.780588: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-12-04 16:24:50.780760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1374 MB memory) -> physical GPU (device: 0, name: GeForce GT 1030, pci bus id: 0000:af:00.0, compute capability: 6.1)
...
beast:~/datalad/labs/haxby/raiders$ exit
beast:~/datalad/labs/haxby/raiders$ nvidia-smi
Wed Dec  4 11:25:12 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.74       Driver Version: 418.74       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 1030     On   | 00000000:AF:00.0  On |                  N/A |
| 45%   46C    P0    N/A /  30W |    368MiB /  1998MiB |     33%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  On   | 00000000:D8:00.0 Off |                  N/A |
| 30%   38C    P8     1W / 250W |      1MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
...
yarikoptic commented 4 years ago

ok, the actual reason/mystery is that inside the container (or at least as seen by kwyk), the 'GeForce RTX 2080 Ti' is device 0, not 1, so this worked:

beast:~/datalad/labs/haxby/raiders$ SINGULARITYENV_CUDA_VISIBLE_DEVICES=0 singularity exec -e -B /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.74 -B /usr/lib/x86_64-linux-gnu/libcuda.so.1 ~/containers/images/neuronets/neuronets-kwyk--version-0.4-gpu.sing kwyk sub-rid000005/anat/sub-rid000005_run-01_T1w.nii.gz  out
Bayesian dropout functions have been loaded.
Your version: v0.4 Latest version: 0.4
...
totalMemory: 10.73GiB freeMemory: 10.57GiB
2019-12-04 16:33:29.298895: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-12-04 16:33:29.953479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-04 16:33:29.953512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-12-04 16:33:29.953521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-12-04 16:33:29.953689: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10283 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:d8:00.0, compute capability: 7.5)
Normalizer being used <function zscore at 0x7f1359502ea0>
-5.8382284e-08
1.0000015
64/64 [==============================] - 30s 473ms/step
...

But it would be interesting to know why the IDs get remapped like that.

kaczmarj commented 4 years ago

hi @yarikoptic - i was curious why the devices showed up in a different order. by default, cuda uses a simple heuristic to order the devices, so the order can differ between environments.

you can set CUDA_DEVICE_ORDER=PCI_BUS_ID to get a deterministic ordering of gpus. i've copied the relevant info from https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars below:

CUDA_DEVICE_ORDER: FASTEST_FIRST or PCI_BUS_ID (default is FASTEST_FIRST). FASTEST_FIRST causes CUDA to guess which device is fastest using a simple heuristic and make that device 0, leaving the order of the rest of the devices unspecified. PCI_BUS_ID orders devices by ascending PCI bus ID.
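for example, to make the indices match the nvidia-smi listing above and pick the RTX 2080 Ti (a sketch for the host side, before entering the container):

export CUDA_DEVICE_ORDER=PCI_BUS_ID
export CUDA_VISIBLE_DEVICES=1
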
kaczmarj commented 4 years ago

please reopen if this is still an issue

yarikoptic commented 4 years ago

Thanks. I think it would be valuable to add that info to the README to resolve this issue "fully".
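For example, the note could suggest something like this (hypothetical wording; the SINGULARITYENV_ prefix makes singularity set the variables inside the container):

export SINGULARITYENV_CUDA_DEVICE_ORDER=PCI_BUS_ID
export SINGULARITYENV_CUDA_VISIBLE_DEVICES=1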