tromp / cuckoo

a memory-bound graph-theoretic proof-of-work system

Handle CUDA errors more gracefully #57

Closed yeastplume closed 5 years ago

yeastplume commented 5 years ago

Previously, the gpuAssert macro would just print and exit on failure, leaving the calling thread no way to handle the error gracefully. This change makes the macro store its error code and return from the current function or constructor (depending on the variant), so the solver shuts down cleanly and places the error reason in the stats struct returned to the caller.

From the command line, the output (for an out-of-memory error in mean.cu, the most common case) looks like this:

GeForce GTX 1080 with 8118MB @ 256 bits x 5005MHz
Looking for 42-cycle on cuckatoo29("",0) with 50% edges, 64*64 buckets, 176 trims, and 64 thread blocks.
Using 6976MB of global memory.
Error initialising trimmer. Aborting.
Reason: Device 0 GPUassert: out of memory mean.cu 389

With this in place, a CUDA error no longer leaves a mess in the TUI (previously caused by the exit() call). Instead, the grin-miner TUI shows 'errored' next to the device and writes the reason to the logs:

Oct 26 11:49:52.042 DEBG Mining: Plugin 0 - Device 0 (CPU) at Cuck(at)oo29 - Status: OK : Last Graph time: 6.548810883s; Graphs per second: 0.153 - Total Attempts: 2
Oct 26 11:49:52.042 DEBG Mining: Plugin 1 - Device 0 (GeForce GTX 1080) Has ERRORED! Reason: Device 0 GPUassert: out of memory /home/projects/rust/grin-miner/cuckoo-miner/src/cuckoo_sys/plugins/cuckoo/src/cuckatoo/mean.cu 389
Oct 26 11:49:52.042 INFO Mining: Cuck(at)oo at 0.1526994774877209 gps (graphs per second)
Oct 26 11:49:55.017 DEBG Mining: Plugin 0 - Device 0 (CPU) at Cuck(at)oo29 - Status: OK : Last Graph time: 6.548810883s; Graphs per second: 0.153 - Total Attempts: 2
Oct 26 11:49:55.017 DEBG Mining: Plugin 1 - Device 0 (GeForce GTX 1080) Has ERRORED! Reason: Device 0 GPUassert: out of memory /home/projects/rust/grin-miner/cuckoo-miner/src/cuckoo_sys/plugins/cuckoo/src/cuckatoo/mean.cu 389
Oct 26 11:49:55.017 INFO Mining: Cuck(at)oo at 0.1526994774877209 gps (graphs per second)