solardiz opened this issue 4 years ago (status: Open)
switch to a more suitable default mask for benchmarks.
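OTOH, that different mask makes NT-opencl usually (but not always) auto-tune to an unreasonably high GWS, which hurts speed a lot. Compare the three runs below: the default benchmark, the same benchmark with the mask (auto-tuned GWS), and the mask with GWS forced to 40960: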
$ john -te -form=nt-opencl
Device 1: Tesla V100-SXM2-16GB
Benchmarking: NT-opencl [MD4 OpenCL/mask accel]... LWS=128 GWS=40960 (320 blocks) x24700 DONE
Raw: 34558M c/s real, 34558M c/s virtual
$ john -te -form=nt-opencl -mask='?a?a?a?a?a?a?a'
Device 1: Tesla V100-SXM2-16GB
Benchmarking: NT-opencl (length 7) [MD4 OpenCL/mask accel]... LWS=128 GWS=655360 (5120 blocks) x9025 DONE
Raw: 9434M c/s real, 9389M c/s virtual
$ GWS=40960 john -te -form=nt-opencl -mask='?a?a?a?a?a?a?a'
Device 1: Tesla V100-SXM2-16GB
Benchmarking: NT-opencl (length 7) [MD4 OpenCL/mask accel]... LWS=128 GWS=40960 (320 blocks) x9025 DONE
Raw: 33839M c/s real, 33839M c/s virtual
Maybe there's an auto-tuning shortcoming for us to fix there.
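To see where the speed falls off between the two GWS values observed above, one could force a few intermediate values through the same GWS environment variable and compare the reported speeds. This is only a rough sketch: the helper names raw_speed and sweep_gws are made up for illustration, the intermediate GWS values are arbitrary doublings between 40960 and 655360, and the parsing assumes the "Raw:" line keeps the format shown in the transcripts.

```shell
#!/bin/sh
# Hypothetical helpers, not part of john.

# Print the "real" c/s figure (e.g. 34558M) from benchmark output on stdin.
# Assumes a line of the form: "Raw: 34558M c/s real, 34558M c/s virtual".
raw_speed() {
    awk '$1 == "Raw:" { print $2 }'
}

# Re-run the same benchmark with several forced GWS values and print one
# line per value. The GWS list is illustrative; only 40960 and 655360
# appear in the runs above.
sweep_gws() {
    mask=${1:-'?a?a?a?a?a?a?a'}
    for gws in 40960 81920 163840 327680 655360; do
        speed=$(GWS=$gws john -te -form=nt-opencl -mask="$mask" 2>&1 | raw_speed)
        printf 'GWS=%s\t%s c/s\n' "$gws" "${speed:-n/a}"
    done
}
```

Calling sweep_gws with no arguments would use the length 7 mask from above; passing a different mask as the first argument lets the same sweep be repeated for other lengths.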
Default benchmark:
Different mask:
Also longer benchmark (didn't make a difference):
Actual cracking:
So even when comparing against 254 loaded hashes, we got much better speed than what the benchmark got with the same mask after running for the same time. (Somehow the speed was poor early on, and it kept growing. In fact, the average speed would be even higher for a longer run.)
Checking nvidia-smi, I see that GPU utilization is somewhat low during actual cracking (around 75%) and even lower during benchmark (after the auto-tuning is complete, it nevertheless fluctuates between 0% and 80%, with average perhaps around 40%).
The lower GPU utilization during benchmark explains the speed difference, but I am puzzled why the utilization is lower. We could also look into and improve GPU utilization during actual cracking, and switch to a more suitable default mask for benchmarks.
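To turn the eyeballed utilization figures into something comparable between benchmark and cracking runs, the samples could be logged and averaged. A minimal sketch, assuming a single GPU at nvidia-smi index 0 (which may not match john's "Device 1" numbering) and one sample per second; avg_util and sample_util are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helpers for averaging GPU utilization samples.

# Mean of integer samples (one per line) on stdin, printed with one decimal.
# Non-numeric lines are ignored; prints "n/a" if there were no samples.
avg_util() {
    awk '$1 ~ /^[0-9]+$/ { sum += $1; n++ }
         END { if (n) printf "%.1f\n", sum / n; else print "n/a" }'
}

# Sample utilization of GPU index 0 once per second for N samples
# (default 60) and print the average. Run this in a second terminal
# while the benchmark or cracking session is in progress.
sample_util() {
    nvidia-smi -i 0 --query-gpu=utilization.gpu \
        --format=csv,noheader,nounits -l 1 |
        head -n "${1:-60}" | avg_util
}
```

Since the benchmark utilization fluctuates between 0% and 80%, a one-second interval may miss short idle gaps; nvidia-smi also accepts -lms for millisecond intervals if finer sampling is wanted.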