wilicc / gpu-burn

Multi-GPU CUDA stress test
BSD 2-Clause "Simplified" License

gpu-burn should exit early on init errors #14

Open grische opened 4 years ago

grische commented 4 years ago

If there is an init error on a GPU, gpu-burn does not exit but instead continues and prints "OK" at the end.

$ ./gpu_burn 3
GPU 0: GeForce GTX 1080 Ti (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
GPU 1: GeForce GTX 1080 Ti (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
GPU 2: GeForce GTX 1080 Ti (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
GPU 3: GeForce GTX 1080 Ti (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
GPU 4: GeForce GTX 1080 Ti (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
GPU 5: GeForce GTX 1080 Ti (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
GPU 6: GeForce GTX 1080 Ti (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
GPU 7: GeForce GTX 1080 Ti (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
Couldn't init a GPU test: Error in "init": CUBLAS_STATUS_NOT_INITIALIZED
100.0%  proc'd: 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s)   errors: 0 - 0 - 0 - 0 - 0 - 0 - 0 - 0   temps: 30 C - 31 C - 32 C - 29 C - 34 C - 31 C - 32 C - 30 C
        Summary at:   Wed Dec  4 18:44:29 UTC 2019

100.0%  proc'd: 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s)   errors: 0 - 0 - 0 - 0 - 0 - 0 - 0 - 0   temps: 30 C - 31 C - 32 C - 29 C - 34 C - 31 C - 32 C - 30 C
Killing processes.. done

Tested 10 GPUs:
        GPU 0: OK
        GPU 1: OK
        GPU 2: OK
        GPU 3: OK
        GPU 4: OK
        GPU 5: OK
        GPU 6: OK
        GPU 7: OK

It would be better to stop at the point where the error is thrown rather than continue until the timer expires.
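To illustrate the requested behavior, here is a minimal fail-fast sketch in plain C++. It is not gpu-burn's actual code: `runBurn`, `InitFn`, and `BurnFn` are hypothetical names, and the per-GPU setup is passed in as a callback so the example runs without CUDA. The idea is simply that the first init error aborts the run with a non-zero status, and the timed burn phase never starts.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <iostream>
#include <stdexcept>

// Hypothetical callbacks standing in for gpu-burn's real setup and burn
// phases. InitFn is expected to throw if the device cannot be initialized.
using InitFn = std::function<void(int /*device index*/)>;
using BurnFn = std::function<void()>;

// Initializes every device first; returns EXIT_FAILURE at the first init
// error without ever starting the burn phase. Returns EXIT_SUCCESS only if
// all devices initialized and the burn phase ran.
int runBurn(int deviceCount, const InitFn &init, const BurnFn &burn) {
    assert(deviceCount >= 0);

    for (int dev = 0; dev < deviceCount; ++dev) {
        try {
            init(dev);
        } catch (const std::exception &e) {
            std::cerr << "Couldn't init GPU " << dev << ": " << e.what()
                      << "\n";
            return EXIT_FAILURE;  // stop here instead of waiting for the timer
        }
    }

    burn();
    return EXIT_SUCCESS;
}
```

A caller would return the result of `runBurn` from `main`, so that scripts wrapping the tool can detect the failure from the exit status instead of parsing the "OK" summary.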