Open magnumripper opened 5 years ago
Examples
AMD Radeon Pro 560 Compute Engine (laptop). 4096 is the last size that didn't almost double the wall clock time (it's actually the fastest of them all).
xfer: 16.640us, init: 87.200us, loop: 78x981.600us, pass2: 57.600us, final: 161.920us, xfer: 6.240us
gws: 1024 13315c/s 218126330 rounds/s 76.903ms per crypt_all()!
xfer: 23.360us, init: 100.320us, loop: 78x985.600us, pass2: 69.280us, final: 163.360us, xfer: 8.640us
gws: 2048 26510c/s 434286820 rounds/s 77.251ms per crypt_all()+
xfer: 43.040us, init: 265.920us, loop: 78x950.880us, pass2: 136.160us, final: 236.800us, xfer: 14.720us
gws: 4096 54704c/s 896160928 rounds/s 74.874ms per crypt_all()!
xfer: 96.160us, init: 706.400us, loop: 78x1.694ms, pass2: 499.200us, final: 261.440us, xfer: 28.320us
gws: 8192 61233c/s 1003119006 rounds/s 133.783ms per crypt_all()+
xfer: 184.640us, init: 1.373ms, loop: 78x3.301ms, pass2: 1.430ms, final: 318.560us, xfer: 42.400us
gws: 16384 62790c/s 1028625780 rounds/s 260.930ms per crypt_all()+
xfer: 313.120us, init: 2.975ms, loop: 78x6.447ms, pass2: 2.817ms, final: 623.360us, xfer: 78.720us
gws: 32768 64276c/s 1052969432 rounds/s 509.800ms per crypt_all()+
xfer: 755.520us, init: 6.119ms, loop: 78x13.024ms, pass2: 2.023ms, final: 739.840us, xfer: 150.880us
gws: 65536 63887c/s 1046596834 rounds/s 1.025s per crypt_all()
xfer: 1.357ms, init: 9.778ms, loop: 78x25.143ms, pass2: 3.848ms, final: 1.399ms, xfer: 374.720us
gws: 131072 66259c/s 1085454938 rounds/s 1.978s per crypt_all()+
xfer: 2.516ms, init: 15.451ms, loop: 78x51.778ms, pass2: 12.131ms, final: 3.700ms, xfer: 635.360us
gws: 262144 64350c/s 1054181700 rounds/s 4.073s per crypt_all()
xfer: 16.461ms, init: 24.128ms, loop: 78x100.473ms, pass2: 15.337ms, final: 7.171ms, xfer: 1.273ms
gws: 524288 66346c/s 1086880172 rounds/s 7.902s per crypt_all()
xfer: 17.980ms, init: 35.570ms, loop: 78x200.785ms (exceeds 200ms)
xfer: 696.160us, init: 6.317ms, loop: 78x22.324ms, pass2: 5.657ms, final: 1.082ms, xfer: 177.920us
gws: 65536 37333c/s 611589206 rounds/s 1.755s per crypt_all()-
gfx900 [Radeon RX Vega]. 16384 is the last size that didn't almost double the wall clock time.
xfer: 22.816us, init: 45.926us, loop: 78x389.778us, pass2: 27.260us, final: 92.740us, xfer: 7.556us
gws: 4096 133844c/s 2192632408 rounds/s 30.602ms per crypt_all()!
xfer: 51.260us, init: 48.296us, loop: 78x394.518us, pass2: 27.852us, final: 91.702us, xfer: 12.296us
gws: 8192 264193c/s 4328009726 rounds/s 31.007ms per crypt_all()+
xfer: 91.704us, init: 55.110us, loop: 78x402.222us, pass2: 35.852us, final: 94.074us, xfer: 23.110us
gws: 16384 517220c/s 8473098040 rounds/s 31.676ms per crypt_all()+
xfer: 162.518us, init: 87.556us, loop: 78x779.260us, pass2: 55.110us, final: 128.148us, xfer: 43.852us
gws: 32768 534840c/s 8761748880 rounds/s 61.266ms per crypt_all()+
xfer: 321.630us, init: 167.258us, loop: 78x1.537ms, pass2: 142.074us, final: 190.964us, xfer: 85.036us
gws: 65536 542417c/s 8885875294 rounds/s 120.821ms per crypt_all()+
xfer: 673.926us, init: 317.186us, loop: 78x3.041ms, pass2: 271.556us, final: 310.964us, xfer: 167.260us
gws: 131072 548432c/s 8984413024 rounds/s 238.993ms per crypt_all()+
xfer: 1.567ms, init: 592.444us, loop: 78x6.053ms, pass2: 521.482us, final: 550.814us, xfer: 344.148us
gws: 262144 550921c/s 9025187822 rounds/s 475.828ms per crypt_all()
xfer: 2.780ms, init: 1.145ms, loop: 78x12.075ms, pass2: 1.023ms, final: 1.032ms, xfer: 714.518us
gws: 524288 552655c/s 9053594210 rounds/s 948.670ms per crypt_all()
xfer: 5.368ms, init: 2.179ms, loop: 78x24.118ms, pass2: 2.024ms, final: 1.982ms, xfer: 1.362ms
gws: 1048576 553523c/s 9067813786 rounds/s 1.894s per crypt_all()
xfer: 10.363ms, init: 4.236ms, loop: 78x48.207ms, pass2: 4.027ms, final: 4.083ms, xfer: 2.717ms
gws: 2097152 553916c/s 9074251912 rounds/s 3.786s per crypt_all()
xfer: 20.605ms, init: 8.436ms, loop: 78x96.915ms, pass2: 8.062ms, final: 8.081ms, xfer: 5.268ms
gws: 4194304 551098c/s 9028087436 rounds/s 7.610s per crypt_all()
xfer: 41.941ms, init: 16.929ms, loop: 78x206.835ms (exceeds 200ms)
xfer: 323.556us, init: 173.480us, loop: 78x1.535ms, pass2: 137.926us, final: 189.334us, xfer: 84.444us
gws: 65536 542876c/s 8893394632 rounds/s 120.719ms per crypt_all()-
GeForce GTX 1080. 10240 is the last size that didn't almost double the wall clock time.
xfer: 52.864us, init: 39.776us, loop: 78x334.784us, pass2: 29.632us, final: 57.888us, xfer: 11.840us
gws: 2560 97307c/s 1594083274 rounds/s 26.308ms per crypt_all()!
xfer: 104.800us, init: 28.608us, loop: 78x331.360us, pass2: 24.096us, final: 34.944us, xfer: 24.064us
gws: 5120 196426c/s 3217850732 rounds/s 26.065ms per crypt_all()!
xfer: 207.744us, init: 53.632us, loop: 78x344.096us, pass2: 37.312us, final: 60.096us, xfer: 48.480us
gws: 10240 375779c/s 6156011578 rounds/s 27.250ms per crypt_all()+
xfer: 412.768us, init: 124.512us, loop: 78x661.664us, pass2: 107.392us, final: 61.536us, xfer: 97.312us
gws: 20480 390693c/s 6400332726 rounds/s 52.419ms per crypt_all()+
xfer: 823.744us, init: 222.816us, loop: 78x1.328ms, pass2: 238.912us, final: 129.344us, xfer: 195.264us
gws: 40960 389301c/s 6377528982 rounds/s 105.214ms per crypt_all()
xfer: 1.652ms, init: 368.992us, loop: 78x2.514ms, pass2: 404.672us, final: 239.296us, xfer: 389.888us
gws: 81920 411255c/s 6737179410 rounds/s 199.194ms per crypt_all()+
xfer: 3.294ms, init: 632.256us, loop: 78x4.877ms, pass2: 726.912us, final: 467.296us, xfer: 780.320us
gws: 163840 424048c/s 6946754336 rounds/s 386.370ms per crypt_all()+
xfer: 6.585ms, init: 1.173ms, loop: 78x9.697ms, pass2: 1.361ms, final: 913.696us, xfer: 1.561ms
gws: 327680 426610c/s 6988725020 rounds/s 768.101ms per crypt_all()
xfer: 13.168ms, init: 2.175ms, loop: 78x19.343ms, pass2: 2.505ms, final: 1.872ms, xfer: 3.122ms
gws: 655360 427821c/s 7008563622 rounds/s 1.531s per crypt_all()
xfer: 26.339ms, init: 4.310ms, loop: 78x39.027ms, pass2: 4.942ms, final: 3.733ms, xfer: 6.261ms
gws: 1310720 424162c/s 6948621884 rounds/s 3.090s per crypt_all()
xfer: 52.677ms, init: 8.506ms, loop: 78x79.955ms, pass2: 9.648ms, final: 7.530ms, xfer: 12.494ms
gws: 2621440 414250c/s 6786243500 rounds/s 6.328s per crypt_all()
xfer: 105.397ms, init: 16.845ms, loop: 78x163.210ms, pass2: 19.068ms, final: 15.035ms, xfer: 25.191ms
gws: 5242880 405999c/s 6651075618 rounds/s 12.913s per crypt_all()
Hardware resources exhausted
xfer: 1.645ms, init: 407.008us, loop: 78x2.467ms, pass2: 430.944us, final: 238.368us, xfer: 389.888us
gws: 81920 418868c/s 6861895576 rounds/s 195.574ms per crypt_all()-
See also #3779
While tuning for the best GWS, we should note the first size that starts "flying" and set min_kpc to that.
When is that, though? In some cases it seems we can pick the last GWS before the wall clock time starts to almost double from one size to the next (that size can even be faster than the smaller ones!). Or we could just pick the next-best size (which is sometimes half of max_kpc and sometimes way lower).
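A minimal sketch of the first heuristic, written in Python for brevity. It is not the actual auto-tune code: the function name `last_size_before_doubling` and the 1.5x cut-off for "almost double" are assumptions made for illustration. The timings are the per-crypt_all() figures from the Radeon Pro 560 run in the examples.

```python
def last_size_before_doubling(samples, threshold=1.5):
    """Return the last GWS before the wall clock time jumps.

    samples: list of (gws, wall_clock_ms) tuples in ascending GWS order.
    threshold: hypothetical cut-off for "almost double"; 1.5 is an assumption.
    Returns None for an empty list, or the largest GWS if the time never jumps.
    """
    if not samples:
        return None
    # Compare each size with the next one and stop at the first big jump.
    for (gws, t), (_next_gws, next_t) in zip(samples, samples[1:]):
        if next_t >= t * threshold:
            return gws
    return samples[-1][0]


# Per-crypt_all() times (ms) from the Radeon Pro 560 run above.
radeon_pro_560 = [
    (1024, 76.903),
    (2048, 77.251),
    (4096, 74.874),
    (8192, 133.783),
    (16384, 260.930),
]

print(last_size_before_doubling(radeon_pro_560))  # prints 4096
```

With the same threshold, the Vega and GTX 1080 timings give 16384 and 10240, matching the sizes called out in the examples.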