wilicc / gpu-burn

Multi-GPU CUDA stress test
BSD 2-Clause "Simplified" License

How to Specify GPU? #39

Closed Leong1230 closed 3 years ago

Leong1230 commented 3 years ago

Hello!

How can I run gpu-burn on one specific GPU, instead of on all the GPUs in our cluster as "./gpu_burn" does? Could you please consider adding an option for this?

Thanks!

Wildcarde commented 3 years ago

Normally you'd be able to do this via the CUDA_VISIBLE_DEVICES environment variable, but for some reason this package seems to ignore that setting. I've tested the 1.1 release and it completely ignores it.
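
For reference, here is a minimal sketch of the usual CUDA_VISIBLE_DEVICES pattern. The helper name run_on_gpu and the GPU index 0 are illustrative assumptions, not part of gpu-burn, and the 60 in the commented-out line is assumed to be a run time in seconds. Whether a given gpu-burn build honors the variable is exactly what this thread is about, so the demo only shows that the variable reaches the child process.

```shell
#!/bin/sh
# Hypothetical helper: run a command with only one GPU index exposed to CUDA.
# Usage: run_on_gpu <gpu-index> <command> [args...]
run_on_gpu() {
    gpu_index="$1"
    shift
    # env sets the variable for the child process only, not for this shell.
    env CUDA_VISIBLE_DEVICES="$gpu_index" "$@"
}

# Demo: the child process sees the restricted value.
run_on_gpu 0 sh -c 'echo "child sees CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"'

# Intended use (not executed here; assumes ./gpu_burn accepts a duration):
# run_on_gpu 0 ./gpu_burn 60
```

If the burn still shows up on every card in nvidia-smi while run this way, the build is not respecting the variable, which matches the behavior reported for the 1.1 release.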

Edit: I've resorted to simply turning off the GPUs I don't want to test using echo 1 > /sys/bus/pci/devices/0000:<nvidia device address>/remove. This, however, turned the device off completely, requiring a reboot afterwards.
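
A hedged dry-run sketch of that sysfs approach follows. It only prints the command rather than executing it, since the real write needs root and takes the device offline. The helper name pci_remove_path and the address 0000:3b:00.0 are hypothetical placeholders; on a real machine, something like lspci -D | grep -i nvidia should list the actual addresses.

```shell
#!/bin/sh
# Hypothetical helper: build the sysfs "remove" path for a PCI address
# given in domain:bus:device.function form.
pci_remove_path() {
    printf '/sys/bus/pci/devices/%s/remove\n' "$1"
}

# Placeholder address for illustration only; substitute a real one.
addr="0000:3b:00.0"

# Dry run: show what would be executed instead of executing it.
echo "would run (as root): echo 1 > $(pci_remove_path "$addr")"
```

Keeping this as a dry run is deliberate: as noted above, the removed device stayed off until a reboot.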

Wildcarde commented 3 years ago

Follow-up: I'm finding that this approach may unfortunately be invalid. It causes gpu_burn to do something weird, and I wind up with nothing but errors and zero activity on any of the GPUs:

100.0%  proc'd: 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s)   errors: 834150440  (WARNING!)- 834150440  (WARNING!)- 834150440  (WARNING!)- 834150440  (WARNING!)- 834150440  (WARNING!)- 834150440  (WARNING!)  temps: 37 C - 38 C - 37 C - 37 C - 37 C - 36 C 
Killing processes.. done

Tested 6 GPUs:
        GPU 0: FAULTY
        GPU 1: FAULTY
        GPU 2: FAULTY
        GPU 3: FAULTY
        GPU 4: FAULTY
        GPU 5: FAULTY

This is with 2 of the 8 cards in this machine disabled.
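
When a run like this is part of a script, the summary can be checked mechanically. The sketch below counts the FAULTY lines in text shaped like the summary above. The function name count_faulty is an assumption for illustration, and the sample input is copied from the output in this comment rather than produced by a fresh run.

```shell
#!/bin/sh
# Hypothetical helper: count "GPU N: FAULTY" lines read from standard input.
count_faulty() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 when there are no matches.
    grep -c 'GPU [0-9][0-9]*: FAULTY' || true
}

# Sample summary copied from the run shown above.
count_faulty <<'EOF'
Tested 6 GPUs:
        GPU 0: FAULTY
        GPU 1: FAULTY
        GPU 2: FAULTY
        GPU 3: FAULTY
        GPU 4: FAULTY
        GPU 5: FAULTY
EOF
# prints 6
```

A count equal to the number of tested GPUs, combined with 0 Gflop/s on every card, points to a broken test setup rather than six genuinely bad cards.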

Leong1230 commented 3 years ago

All right. Thanks for your reply!