openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

OpenCL auto-tune: Try getting a sane figure for min_keys_per_crypt #4097

Open magnumripper opened 5 years ago

magnumripper commented 5 years ago

See also #3779

While auto-tuning for the best GWS (global work size), we should note the first size at which throughput starts "flying" and set min_kpc (min_keys_per_crypt) to that.

When is that, though? In some cases it seems we can pick the last GWS whose wall clock time per crypt_all() did not almost double compared to the previous size (it can even be faster!). Or we could just pick the next best size (which is sometimes half of max_kpc and sometimes way lower).
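
The "did not almost double" idea could be sketched roughly as below. This is only an illustration, not John's auto-tune code: the function name `pick_min_kpc`, the `(gws, seconds)` sample format, and the 1.5x cutoff standing in for "almost double" are all assumptions.

```python
def pick_min_kpc(samples, ratio_cutoff=1.5):
    """Return the last GWS before crypt_all() time starts scaling with size.

    samples: list of (gws, time_per_crypt_all), ordered by increasing gws.
    ratio_cutoff: hypothetical stand-in for "almost double".
    """
    if not samples:
        raise ValueError("need at least one sample")
    best_gws, prev_time = samples[0]
    for gws, duration in samples[1:]:
        if duration >= prev_time * ratio_cutoff:
            break  # wall clock time jumped, so the previous size is the knee
        best_gws, prev_time = gws, duration
    return best_gws


# Times in seconds, taken from the Radeon Pro 560 run in the next comment.
radeon_pro_560 = [
    (1024, 0.076903),
    (2048, 0.077251),
    (4096, 0.074874),
    (8192, 0.133783),
    (16384, 0.260930),
]
print(pick_min_kpc(radeon_pro_560))  # -> 4096
```

A fixed cutoff is the simplest choice; whether 1.5x is a sane value across devices and formats is exactly the open question here.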

magnumripper commented 5 years ago

Examples

AMD Radeon Pro 560 Compute Engine (laptop). 4096 is the last size that didn't almost double the wall clock time (it actually has the shortest time of them all).

xfer: 16.640us, init: 87.200us, loop: 78x981.600us, pass2: 57.600us, final: 161.920us, xfer: 6.240us
gws:      1024    13315c/s   218126330 rounds/s  76.903ms per crypt_all()!
xfer: 23.360us, init: 100.320us, loop: 78x985.600us, pass2: 69.280us, final: 163.360us, xfer: 8.640us
gws:      2048    26510c/s   434286820 rounds/s  77.251ms per crypt_all()+
xfer: 43.040us, init: 265.920us, loop: 78x950.880us, pass2: 136.160us, final: 236.800us, xfer: 14.720us
gws:      4096    54704c/s   896160928 rounds/s  74.874ms per crypt_all()!
xfer: 96.160us, init: 706.400us, loop: 78x1.694ms, pass2: 499.200us, final: 261.440us, xfer: 28.320us
gws:      8192    61233c/s  1003119006 rounds/s 133.783ms per crypt_all()+
xfer: 184.640us, init: 1.373ms, loop: 78x3.301ms, pass2: 1.430ms, final: 318.560us, xfer: 42.400us
gws:     16384    62790c/s  1028625780 rounds/s 260.930ms per crypt_all()+
xfer: 313.120us, init: 2.975ms, loop: 78x6.447ms, pass2: 2.817ms, final: 623.360us, xfer: 78.720us
gws:     32768    64276c/s  1052969432 rounds/s 509.800ms per crypt_all()+
xfer: 755.520us, init: 6.119ms, loop: 78x13.024ms, pass2: 2.023ms, final: 739.840us, xfer: 150.880us
gws:     65536    63887c/s  1046596834 rounds/s    1.025s per crypt_all()
xfer: 1.357ms, init: 9.778ms, loop: 78x25.143ms, pass2: 3.848ms, final: 1.399ms, xfer: 374.720us
gws:    131072    66259c/s  1085454938 rounds/s    1.978s per crypt_all()+
xfer: 2.516ms, init: 15.451ms, loop: 78x51.778ms, pass2: 12.131ms, final: 3.700ms, xfer: 635.360us
gws:    262144    64350c/s  1054181700 rounds/s    4.073s per crypt_all()
xfer: 16.461ms, init: 24.128ms, loop: 78x100.473ms, pass2: 15.337ms, final: 7.171ms, xfer: 1.273ms
gws:    524288    66346c/s  1086880172 rounds/s    7.902s per crypt_all()
xfer: 17.980ms, init: 35.570ms, loop: 78x200.785ms (exceeds 200ms)
xfer: 696.160us, init: 6.317ms, loop: 78x22.324ms, pass2: 5.657ms, final: 1.082ms, xfer: 177.920us
gws:     65536    37333c/s   611589206 rounds/s    1.755s per crypt_all()-

gfx900 [Radeon RX Vega]. 16384 is the last size that didn't almost double the wall clock time.

xfer: 22.816us, init: 45.926us, loop: 78x389.778us, pass2: 27.260us, final: 92.740us, xfer: 7.556us
gws:      4096   133844c/s  2192632408 rounds/s  30.602ms per crypt_all()!
xfer: 51.260us, init: 48.296us, loop: 78x394.518us, pass2: 27.852us, final: 91.702us, xfer: 12.296us
gws:      8192   264193c/s  4328009726 rounds/s  31.007ms per crypt_all()+
xfer: 91.704us, init: 55.110us, loop: 78x402.222us, pass2: 35.852us, final: 94.074us, xfer: 23.110us
gws:     16384   517220c/s  8473098040 rounds/s  31.676ms per crypt_all()+
xfer: 162.518us, init: 87.556us, loop: 78x779.260us, pass2: 55.110us, final: 128.148us, xfer: 43.852us
gws:     32768   534840c/s  8761748880 rounds/s  61.266ms per crypt_all()+
xfer: 321.630us, init: 167.258us, loop: 78x1.537ms, pass2: 142.074us, final: 190.964us, xfer: 85.036us
gws:     65536   542417c/s  8885875294 rounds/s 120.821ms per crypt_all()+
xfer: 673.926us, init: 317.186us, loop: 78x3.041ms, pass2: 271.556us, final: 310.964us, xfer: 167.260us
gws:    131072   548432c/s  8984413024 rounds/s 238.993ms per crypt_all()+
xfer: 1.567ms, init: 592.444us, loop: 78x6.053ms, pass2: 521.482us, final: 550.814us, xfer: 344.148us
gws:    262144   550921c/s  9025187822 rounds/s 475.828ms per crypt_all()
xfer: 2.780ms, init: 1.145ms, loop: 78x12.075ms, pass2: 1.023ms, final: 1.032ms, xfer: 714.518us
gws:    524288   552655c/s  9053594210 rounds/s 948.670ms per crypt_all()
xfer: 5.368ms, init: 2.179ms, loop: 78x24.118ms, pass2: 2.024ms, final: 1.982ms, xfer: 1.362ms
gws:   1048576   553523c/s  9067813786 rounds/s    1.894s per crypt_all()
xfer: 10.363ms, init: 4.236ms, loop: 78x48.207ms, pass2: 4.027ms, final: 4.083ms, xfer: 2.717ms
gws:   2097152   553916c/s  9074251912 rounds/s    3.786s per crypt_all()
xfer: 20.605ms, init: 8.436ms, loop: 78x96.915ms, pass2: 8.062ms, final: 8.081ms, xfer: 5.268ms
gws:   4194304   551098c/s  9028087436 rounds/s    7.610s per crypt_all()
xfer: 41.941ms, init: 16.929ms, loop: 78x206.835ms (exceeds 200ms)
xfer: 323.556us, init: 173.480us, loop: 78x1.535ms, pass2: 137.926us, final: 189.334us, xfer: 84.444us
gws:     65536   542876c/s  8893394632 rounds/s 120.719ms per crypt_all()-

GeForce GTX 1080. 10240 is the last size that didn't almost double the wall clock time.

xfer: 52.864us, init: 39.776us, loop: 78x334.784us, pass2: 29.632us, final: 57.888us, xfer: 11.840us
gws:      2560    97307c/s  1594083274 rounds/s  26.308ms per crypt_all()!
xfer: 104.800us, init: 28.608us, loop: 78x331.360us, pass2: 24.096us, final: 34.944us, xfer: 24.064us
gws:      5120   196426c/s  3217850732 rounds/s  26.065ms per crypt_all()!
xfer: 207.744us, init: 53.632us, loop: 78x344.096us, pass2: 37.312us, final: 60.096us, xfer: 48.480us
gws:     10240   375779c/s  6156011578 rounds/s  27.250ms per crypt_all()+
xfer: 412.768us, init: 124.512us, loop: 78x661.664us, pass2: 107.392us, final: 61.536us, xfer: 97.312us
gws:     20480   390693c/s  6400332726 rounds/s  52.419ms per crypt_all()+
xfer: 823.744us, init: 222.816us, loop: 78x1.328ms, pass2: 238.912us, final: 129.344us, xfer: 195.264us
gws:     40960   389301c/s  6377528982 rounds/s 105.214ms per crypt_all()
xfer: 1.652ms, init: 368.992us, loop: 78x2.514ms, pass2: 404.672us, final: 239.296us, xfer: 389.888us
gws:     81920   411255c/s  6737179410 rounds/s 199.194ms per crypt_all()+
xfer: 3.294ms, init: 632.256us, loop: 78x4.877ms, pass2: 726.912us, final: 467.296us, xfer: 780.320us
gws:    163840   424048c/s  6946754336 rounds/s 386.370ms per crypt_all()+
xfer: 6.585ms, init: 1.173ms, loop: 78x9.697ms, pass2: 1.361ms, final: 913.696us, xfer: 1.561ms
gws:    327680   426610c/s  6988725020 rounds/s 768.101ms per crypt_all()
xfer: 13.168ms, init: 2.175ms, loop: 78x19.343ms, pass2: 2.505ms, final: 1.872ms, xfer: 3.122ms
gws:    655360   427821c/s  7008563622 rounds/s    1.531s per crypt_all()
xfer: 26.339ms, init: 4.310ms, loop: 78x39.027ms, pass2: 4.942ms, final: 3.733ms, xfer: 6.261ms
gws:   1310720   424162c/s  6948621884 rounds/s    3.090s per crypt_all()
xfer: 52.677ms, init: 8.506ms, loop: 78x79.955ms, pass2: 9.648ms, final: 7.530ms, xfer: 12.494ms
gws:   2621440   414250c/s  6786243500 rounds/s    6.328s per crypt_all()
xfer: 105.397ms, init: 16.845ms, loop: 78x163.210ms, pass2: 19.068ms, final: 15.035ms, xfer: 25.191ms
gws:   5242880   405999c/s  6651075618 rounds/s   12.913s per crypt_all()
Hardware resources exhausted
xfer: 1.645ms, init: 407.008us, loop: 78x2.467ms, pass2: 430.944us, final: 238.368us, xfer: 389.888us
gws:     81920   418868c/s  6861895576 rounds/s 195.574ms per crypt_all()-
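
To check claims like the ones above against raw auto-tune output, the `gws:` lines can be parsed and each size's time compared with the previous one. A rough sketch, assuming the line layout seen in these logs; `parse_gws_lines` and `time_ratios` are hypothetical helper names, not part of John.

```python
import re

# Matches lines such as:
# gws:      4096    54704c/s   896160928 rounds/s  74.874ms per crypt_all()!
GWS_LINE = re.compile(
    r"gws:\s*(\d+)\s+(\d+)c/s\s+\d+\s+rounds/s\s+([\d.]+)(us|ms|s)\s+per crypt_all\(\)"
)
UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_gws_lines(text):
    """Return [(gws, c/s, seconds per crypt_all)] for increasing GWS only."""
    rows = []
    for line in text.splitlines():
        m = GWS_LINE.search(line)
        if not m:
            continue  # xfer/init/loop timing lines and other messages
        gws, cps, value, unit = m.groups()
        gws = int(gws)
        if rows and gws <= rows[-1][0]:
            continue  # assumed to be the final re-run of the chosen size
        rows.append((gws, int(cps), float(value) * UNIT_SECONDS[unit]))
    return rows


def time_ratios(rows):
    """Return [(gws, time / time_of_previous_size)] for each size after the first."""
    return [(cur[0], cur[2] / prev[2]) for prev, cur in zip(rows, rows[1:])]


log = """
gws:      2560    97307c/s  1594083274 rounds/s  26.308ms per crypt_all()!
gws:      5120   196426c/s  3217850732 rounds/s  26.065ms per crypt_all()!
gws:     10240   375779c/s  6156011578 rounds/s  27.250ms per crypt_all()+
gws:     20480   390693c/s  6400332726 rounds/s  52.419ms per crypt_all()+
"""
for gws, ratio in time_ratios(parse_gws_lines(log)):
    print(f"{gws:>8} {ratio:.2f}x")
```

For the GTX 1080 excerpt, the ratio stays close to 1.0 up to 10240 and jumps to roughly 1.9 at 20480, which matches the observation that 10240 is the last size before the time nearly doubles.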