Closed: Leong1230 closed this issue 3 years ago
Normally you'd be able to do this via the environment variable CUDA_VISIBLE_DEVICES, but for some reason this package seems to ignore that setting. I've tested the 1.1 release and it completely ignores it.
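For anyone who wants to verify this on their own box: a minimal sketch (my own check, not gpu_burn's code) of how the CUDA runtime is supposed to behave. The runtime only enumerates the devices listed in CUDA_VISIBLE_DEVICES, so running this with CUDA_VISIBLE_DEVICES=0 should report a count of 1. A tool that still touches every card is presumably picking devices some other way (e.g. via NVML or a direct PCI scan), which is not filtered by that variable.

```cuda
// check_visible.cu -- build with: nvcc check_visible.cu -o check_visible
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // Only the GPUs exposed by CUDA_VISIBLE_DEVICES show up here.
    printf("CUDA runtime sees %d device(s)\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("  device %d: %s\n", i, prop.name);
    }
    return 0;
}
```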
edit: I've resorted to simply turning off the GPUs I don't want to test using echo 1 > /sys/bus/pci/devices/0000:<nvidia device address>/remove. However, this removes the device completely and a reboot was required afterwards to get it back.
Follow-up: I'm discovering that this approach may unfortunately be invalid. It causes gpu_burn to do something weird, and I wind up with nothing but errors and zero activity on any of the GPUs:
100.0% proc'd: 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) errors: 834150440 (WARNING!)- 834150440 (WARNING!)- 834150440 (WARNING!)- 834150440 (WARNING!)- 834150440 (WARNING!)- 834150440 (WARNING!) temps: 37 C - 38 C - 37 C - 37 C - 37 C - 36 C
Killing processes.. done
Tested 6 GPUs:
GPU 0: FAULTY
GPU 1: FAULTY
GPU 2: FAULTY
GPU 3: FAULTY
GPU 4: FAULTY
GPU 5: FAULTY
This is with 2 of the 8 cards in this machine disabled.
All right. Thanks for your reply!
Hello!
How can I specify one of the GPUs for gpu-burn to run on, instead of running on all the GPUs in our cluster with "./gpu_burn"? Could you please consider adding an option for this?
Thanks!
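For reference, a rough sketch of what such a "run on GPU N only" option could do internally, assuming the tool drives GPUs through the CUDA runtime. The command-line handling and device argument here are hypothetical illustrations, not gpu_burn's actual interface.

```cuda
// select_device.cu -- hypothetical sketch, NOT gpu_burn's actual code.
// Usage (hypothetical): ./select_device <gpu index>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    int device = 0;
    if (argc > 1) device = atoi(argv[1]);   // desired GPU index from the command line

    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || device < 0 || device >= count) {
        fprintf(stderr, "requested device %d not available (count=%d)\n", device, count);
        return 1;
    }

    // Bind all subsequent CUDA work in this process to the chosen GPU.
    cudaSetDevice(device);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);
    printf("running on device %d: %s\n", device, prop.name);
    // ... the stress-test kernels would be launched here ...
    return 0;
}
```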