openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

No OpenCL device utilization report for some formats that also use CPU/OpenMP #4318

Closed solardiz closed 4 years ago

solardiz commented 4 years ago

```
[solar@super src]$ ../run/john -te -dev=5 -form=electrum-modern-opencl
Will run 32 OpenMP threads
Device 5: GeForce GTX TITAN X
Benchmarking: electrum-modern-opencl, Electrum Wallet 2.8+ [PBKDF2-SHA512 OpenCL]... (32xOMP) LWS=32 GWS=98304 (3072 blocks) DONE
Raw:    49398 c/s real, 2692 c/s virtual

[solar@super src]$ ../run/john -te -dev=5 -form=tezos-opencl
Will run 32 OpenMP threads
Device 5: GeForce GTX TITAN X
Benchmarking: tezos-opencl, Tezos Key [PBKDF2-SHA512 OpenCL]... (32xOMP) LWS=32 GWS=98304 (3072 blocks) DONE
Speed for cost 1 (iteration count) of 2048
Raw:    73635 c/s real, 21117 c/s virtual
```

For other formats we also report the OpenCL device utilization after "c/s virtual", e.g.:

```
[solar@super src]$ ../run/john -te -dev=5 -form=gpg-opencl
Will run 32 OpenMP threads
Device 5: GeForce GTX TITAN X
Benchmarking: gpg-opencl, OpenPGP / GnuPG Secret Key [SHA1/SHA2 OpenCL]... (32xOMP) LWS=32 GWS=24576 (768 blocks) DONE
Speed for cost 1 (s2k-count) of 65536, cost 2 (hash algorithm [2:SHA1 8:SHA256 10:SHA512]) of 2, cost 3 (cipher algorithm [1:IDEA 2:3DES 3:CAST5 4:Blowfish 7:AES128 8:AES192 9:AES256 10:Twofish 11:Camellia128 12:Camellia192 13:Camellia256]) of 3
Warning: "Many salts" test limited: 180/256
Many salts:     2200K c/s real, 540792 c/s virtual, Dev#5 util: 73%
Only one salt:  2125K c/s real, 521674 c/s virtual, Dev#5 util: 69%
```

Note that this one also appears to use CPU/OpenMP (which is probably why its OpenCL device utilization is lowish).

Why the discrepancy?

magnumripper commented 4 years ago

The utilization figure drops because the GPU idles during CPU post-processing, and for various reasons we don't report 0% at all.
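
A minimal sketch of the reporting rule described here, written in Python for illustration only. It is not John's actual C code: the function name `format_speed_line` and its parameters are hypothetical, and the 0% reading for electrum-modern-opencl is an assumption. It shows how a benchmark line would come out with no "util" suffix when the reading is zero or missing.

```python
def format_speed_line(label, real_cps, virtual_cps, dev_id=None, util_percent=None):
    """Build a benchmark line; append device utilization only for a positive reading."""
    line = f"{label}:\t{real_cps} c/s real, {virtual_cps} c/s virtual"
    # Hypothetical rule mirroring the comment above: a missing or 0% reading is not shown.
    if dev_id is not None and util_percent is not None and util_percent > 0:
        line += f", Dev#{dev_id} util: {util_percent}%"
    return line


# Assumed 0% reading: the suffix is dropped, as in the electrum-modern-opencl output.
print(format_speed_line("Raw", 49398, 2692, dev_id=5, util_percent=0))
# Positive reading: the suffix is kept, as in the gpg-opencl output.
print(format_speed_line("Only one salt", "2125K", 521674, dev_id=5, util_percent=69))
```

With a rule like this, a missing suffix cannot be told apart from a genuine 0% reading, which is the ambiguity raised in the issue.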

solardiz commented 4 years ago

Are you saying it'd have been 0% (and thus suppressed for that reason) for electrum-modern-opencl and tezos-opencl? Would that have been an instantaneous utilization measured at a bad time and not reflecting the average utilization during the benchmark?

magnumripper commented 4 years ago

> Are you saying it'd have been 0% (and thus suppressed for that reason)

It was surely reported as 0% by the driver/runtime, but who knows what that really means; see below.

> Would that have been an instantaneous utilization measured at a bad time and not reflecting the average utilization during the benchmark?

I believe it's implementation-dependent: AMD and NVIDIA may report differently. I think I read long ago that one of them reports something like "utilization over the last second", but I never found a detailed description for either of them (and it could have changed since then, for that matter).
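
To make the "instantaneous vs. average" distinction concrete, here is a small Python sketch. Everything in it is an assumption for illustration: `mean_utilization` and `read_percent` are hypothetical names, `read_percent` stands in for whatever vendor query is available, and the readings are synthetic, not measured. It averages several readings taken across a run, so one reading taken while the GPU idles during CPU post-processing does not decide the reported figure.

```python
import time


def mean_utilization(read_percent, samples=10, interval_s=0.0):
    """Average several instantaneous readings taken interval_s seconds apart."""
    readings = []
    for _ in range(samples):
        readings.append(read_percent())
        if interval_s > 0:
            time.sleep(interval_s)
    return sum(readings) / len(readings)


# Synthetic readings (not measured): GPU busy for six samples, then idle
# while the CPU post-processes the results.
synthetic = iter([95, 96, 94, 95, 97, 96, 0, 0, 0, 0])
average = mean_utilization(lambda: next(synthetic), samples=10)

# A single reading taken at the very end would have been 0.
print(f"average over run: {average:.0f}%")
```

Whether such an average means anything still depends on what each reading represents, which is exactly the undocumented part.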

For trying to get some picture of what is happening, I've been trying things like

See also #3216 and #3242

solardiz commented 4 years ago

Given magnum's explanation, I think there's nothing to do on this issue, and we need to work on #3242 instead. I'll close this one.