magnumripper opened this issue 9 years ago
I'm thinking that after that extended autotune (as above), we'd set the initial GWS to 2048 but keep the GWS of 65536 as a card up our sleeve. After a couple of minutes of running (no keypresses etc.), it could gear up and bump GWS to 65536.
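A minimal sketch of that gear-up idea, assuming a hypothetical `struct gws_policy` and `next_gws()` (not existing John the Ripper code) and an arbitrary 120-second idle threshold standing in for "a couple of minutes". The reserve GWS is only used once nobody has touched the keyboard for that long:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state; not an existing John the Ripper structure. */
struct gws_policy {
    size_t current_gws;   /* GWS actually in use (e.g. 2048) */
    size_t reserve_gws;   /* larger GWS found by extended autotune (e.g. 65536) */
    double idle_needed;   /* seconds without keypresses before gearing up (assumed, e.g. 120) */
};

/* Returns the GWS to use for the next crypt_all() call. */
size_t next_gws(struct gws_policy *p, double seconds_since_last_key)
{
    assert(p->reserve_gws >= p->current_gws);
    if (p->current_gws < p->reserve_gws &&
        seconds_since_last_key >= p->idle_needed)
        p->current_gws = p->reserve_gws;   /* gear up once; this sketch never gears down */
    return p->current_gws;
}
```

Whether a later keypress should drop back to the smaller GWS is left open here; this sketch only ever gears up.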
With OpenCL and our current format interface, very slow formats and/or weak devices may lead to situations where we "can't" run at the optimal work size because the total duration of each crypt_all() call would be too long (even tens of minutes).
Here's an example: Office2013 running on an nvidia GT650M:
So we have a limit of 10 seconds of total crypt_all() duration. But this format, on this device, already takes 16 seconds at 1024, so we allow it. Then we see that 2048 takes about as long even though it does twice the number of hashes, so obviously we allow that too. But for 4096 we get no speedup, so we give up and settle for 2048. The actual single kernel duration, though, is only 16 ms (it's called a thousand times, so 16 ms × ~1000 ≈ 16 s per crypt_all()).
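The selection rule described above can be modelled roughly as follows. This is a simplified sketch under assumptions, not John the Ripper's actual autotune code: `pick_gws()` is a hypothetical helper that takes increasing candidate work sizes with their measured total crypt_all() durations, always accepts the smallest size, lets a larger size exceed the time limit only if it is faster, and gives up at the first over-limit size without a speedup. Passing a huge limit corresponds to ditching the "max 10 seconds" rule:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of the autotune rule (assumption, not JtR code).
 * gws[]  : increasing candidate global work sizes
 * secs[] : measured total crypt_all() duration for each candidate
 * limit  : time budget in seconds for one crypt_all() call
 * Returns the chosen global work size.
 */
size_t pick_gws(const size_t *gws, const double *secs, size_t n, double limit)
{
    assert(n > 0);
    size_t best = 0;                          /* smallest size is always allowed */
    double best_speed = gws[0] / secs[0];     /* hashes per second */

    for (size_t i = 1; i < n; i++) {
        double speed = gws[i] / secs[i];
        if (secs[i] > limit && speed <= best_speed)
            break;                            /* over budget and no speedup: give up */
        if (speed > best_speed) {             /* faster: accept even if over budget */
            best = i;
            best_speed = speed;
        }
    }
    return gws[best];
}
```

With made-up timings such as {16, 16.5, 33.5, 60} seconds for {1024, 2048, 4096, 8192}, a 10-second limit settles on 2048, while an effectively unlimited budget keeps scanning and picks 8192.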
If we ditch that "max 10 seconds" rule, we get this:
We see here that if we could allow a duration of 789 seconds (!), we'd get a performance boost of 33% at a work size of 65536.
The problem is that we'd get extremely bad "response time" (789 s, about 13 minutes!) for things like pressing 'q' to quit. However, this kernel is obviously a split one: the longest single kernel duration is still below 400 ms. So maybe we can work something out.
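One possible direction, sketched under assumptions: since the kernel is split, the host loop could check for a keypress or abort request between kernel invocations, so that the worst-case response time is one kernel duration (under 400 ms here) rather than the whole crypt_all() duration. `run_split_kernel()`, the callback types and the demo callbacks are all hypothetical stand-ins for enqueueing and waiting on an OpenCL kernel and for the keypress check; none of them is an existing John the Ripper API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callbacks: one split-kernel invocation, and an abort/keypress check. */
typedef void (*kernel_fn)(void *ctx, size_t iteration);
typedef bool (*abort_fn)(void *ctx);

/*
 * Runs a split kernel n_calls times, polling for an abort request before
 * each invocation. Returns the number of invocations actually completed.
 */
size_t run_split_kernel(size_t n_calls, kernel_fn kernel, abort_fn should_abort, void *ctx)
{
    assert(kernel != NULL && should_abort != NULL);
    size_t done = 0;
    for (; done < n_calls; done++) {
        if (should_abort(ctx))
            break;              /* e.g. user pressed 'q': stop after at most one kernel */
        kernel(ctx, done);
    }
    return done;
}

/* Demo callbacks for illustration: count calls, request abort after abort_after calls. */
struct demo { size_t calls; size_t abort_after; };

void demo_kernel(void *ctx, size_t iteration)
{
    (void)iteration;
    ((struct demo *)ctx)->calls++;
}

bool demo_abort(void *ctx)
{
    struct demo *d = ctx;
    return d->calls >= d->abort_after;
}
```

Stopping early leaves the current batch half-done, so a real implementation would still need to discard or resume that partial work; this sketch only shows how the response time gets bounded.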