openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/
Other
10.31k stars 2.1k forks source link

descrypt-opencl slow and confusing startup at many salts (appears stalled) #2652

Closed aguvillalba closed 7 years ago

aguvillalba commented 7 years ago

When trying to execute: ./john --test=0 --format=descrypt-opencl --device=0

I got this output:

Device 0: GeForce GTX 970
Testing: descrypt-opencl, traditional crypt(3) [DES OpenCL]... Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 1796 bytes smem, 336 bytes cmem[0]
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 1796 bytes smem, 336 bytes cmem[0]
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 1796 bytes smem, 336 bytes cmem[0]
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 1796 bytes smem, 336 bytes cmem[0]
PASS

When trying to crack using --format=descrypt-opencl I get the same output.

Here my info for ./john --list=build-info:

Version: 1.8.0-jumbo-1-5774-g8d7ccaf
Build: linux-gnu 64-bit SSE4.1-ac OMP
SIMD: SSE4.1, interleaving: MD4:3 MD5:3 SHA1:1 SHA256:1 SHA512:1
$JOHN is ./
Format interface version: 14
Max. number of reported tunable costs: 3
Rec file version: REC4
Charset file version: CHR3
CHARSET_MIN: 1 (0x01)
CHARSET_MAX: 255 (0xff)
CHARSET_LENGTH: 24
SALT_HASH_SIZE: 1048576
Max. Markov mode level: 400
Max. Markov mode password length: 30
gcc version: 5.4.0
GNU libc version: 2.23 (loaded: 2.23)
OpenCL headers version: 2.0
Crypto library: OpenSSL
OpenSSL library version: 01000207f
OpenSSL 1.0.2g  1 Mar 2016
GMP library version: 6.1.0
File locking: fcntl()
fseek(): fseek
ftell(): ftell
fopen(): fopen
memmem(): System's
./john --list=opencl-devices
Platform #0 name: NVIDIA CUDA, version: OpenCL 1.2 CUDA 8.0.0
    Device #0 (0) name:     GeForce GTX 970
    Device vendor:          NVIDIA Corporation
    Device type:            GPU (LE)
    Device version:         OpenCL 1.2 CUDA
    Driver version:         375.66 [recommended]
    Native vector widths:   char 1, short 1, int 1, long 1
    Preferred vector width: char 1, short 1, int 1, long 1
    Global Memory:          3.0 GB
    Global Memory Cache:    208.2 KB
    Local Memory:           48.0 KB (Local)
    Max memory alloc. size: 1008.10 MB
    Max clock (MHz):        1304
    Profiling timer res.:   1000 ns
    Max Work Group Size:    1024
    Parallel compute cores: 13
    CUDA cores:             1664  (13 x 128)
    Speed index:            2169856
    Warp size:              32
    Max. GPRs/work-group:   65536
    Compute capability:     5.2 (sm_52)
    Kernel exec. timeout:   yes
    NVML id:                0
    PCI device topology:    03:00.0
    PCI lanes:              16/16
    Fan speed:              36%
    Temperature:            43°C
    Utilization:            18%
roycewilliams commented 7 years ago

What about that output indicates that it's not working?

You asked it to run a test with no time (--test=0). That's what it did. :) The rest of the diagnostic output looks normal for a first run.

Give it a non-zero time and you'll get actual results:


$ ./john --help | grep test
--test[=TIME]              run tests and benchmarks for TIME seconds each

$ ./john --test=10 --format=descrypt-opencl --device=0
Device 0: GeForce GTX 970
Benchmarking: descrypt-opencl, traditional crypt(3) [DES OpenCL]... DONE, GPU util:100%
Warning: "Many salts" test limited: 26/256
Many salts:     169895K c/s real, 164606K c/s virtual
Only one salt:  65600K c/s real, 64714K c/s virtual
aguvillalba commented 7 years ago

Why do I still get this output when execute this command: ./john --format=descrypt-opencl --min-length=8 --progress-every=10 --device=0 ~/jonny.txt

Device 0@my-desktop: GeForce GTX 970
Using default input encoding: UTF-8
Loaded 1771 password hashes with 1344 different salts (descrypt-opencl, traditional crypt(3) [DES OpenCL])
Remaining 1756 password hashes with 1336 different salts
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 7172 bytes smem, 336 bytes cmem[0]
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 7172 bytes smem, 336 bytes cmem[0]
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 7172 bytes smem, 336 bytes cmem[0]
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 7172 bytes smem, 336 bytes cmem[0]
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 7172 bytes smem, 336 bytes cmem[0]
Session aborted

Is it normal?

roycewilliams commented 7 years ago

Yes, the first time you run for a given compilation of John and hash format. If you run it again, the output should be less verbose.

aguvillalba commented 7 years ago

It happens every time I run that exact command. Also, if I run the same command but in the not opencl version, the output looks fine:

./john --format=descrypt --min-length=8 --progress-every=10 --device=0 ~/jonny.txt

Using default input encoding: UTF-8
Loaded 1771 password hashes with 1344 different salts (descrypt, traditional crypt(3) [DES 128/128 SSE2-16])
Remaining 1754 password hashes with 1334 different salts
Will run 8 OpenMP threads
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:06 51,42% 1/3 (ETA: 20:56:20) 0g/s 367046p/s 367046c/s 2842KC/s TdDimitr..lpaterng
0g 0:00:00:10 98,85% 1/3 (ETA: 20:56:19) 0g/s 367799p/s 367799c/s 2479KC/s plaura83..laura201
0g 0:00:00:20 82,72% 2/3 (ETA: 20:56:33) 0g/s 198246p/s 1913Kc/s 3554KC/s password..5anthony
Warning: MaxLen = 13 is too large for the current hash type, reduced to 8
0g 0:00:00:30  3/3 0g/s 133717p/s 2201Kc/s 3544KC/s samantha..benisout
0g 0:00:00:40  3/3 0g/s 100321p/s 2538Kc/s 3844KC/s samantha..benisout
0g 0:00:00:50  3/3 0g/s 80927p/s 2757Kc/s 4042KC/s beney123..pasermio

Why the output for "non opencl" version of the command works fine?

roycewilliams commented 7 years ago

Because the OpenCL kernel is not compiled dynamically on non-OpenCL attacks.

The issue is just cosmetic, correct? The OpenCL version of the attack still happens, even if the output is not what you expect?

aguvillalba commented 7 years ago

I do not know. I do not think that it is working in the opencl version. At least in the non-opencl version I get some passwords cracked, but in the opencl version I get none.

Is the output that I showed in the messages above an error or just comestic?

aguvillalba commented 7 years ago

Also, if I execute the test, I got this: ./john --test=20 --format=descrypt-opencl --device=0

Device 0@my-desktop: GeForce GTX 970
Benchmarking: descrypt-opencl, traditional crypt(3) [DES OpenCL]... Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_cmp' for 'sm_52'
ptxas info    : Function properties for DES_bs_cmp
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 96 registers, 392 bytes cmem[0], 48 bytes cmem[2]
ptxas info    : Compiling entry function 'DES_bs_cmp_high' for 'sm_52'
ptxas info    : Function properties for DES_bs_cmp_high
ptxas         .     256 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 40 registers, 392 bytes cmem[0], 48 bytes cmem[2]
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_finalize_keys' for 'sm_52'
ptxas info    : Function properties for DES_bs_finalize_keys
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 32 registers, 352 bytes cmem[0], 32 bytes cmem[2]
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 7172 bytes smem, 336 bytes cmem[0]
DONE, GPU util:100%
Warning: "Many salts" test limited: 53/256
Many salts: 172407K c/s real, 150264K c/s virtual
Only one salt:  56449K c/s real, 54429K c/s virtual

Does it mean that there is something wrong?

kost123 commented 7 years ago

I get it aswell in another format, pretty sure it's just debug info.

claudioandre-br commented 7 years ago

Why do I still get this output when execute this command: ./john --format=descrypt-opencl --min-length=8 --progress-every=10 --device=0 ~/jonny.txt

Or, if you wish, run:

The output is harmless (it is only debug info). But, it is not expected to appear for every run, unless cache is disabled.

claudioandre-br commented 7 years ago

Inside your https://github.com/magnumripper/JohnTheRipper/issues/2652#issuecomment-320325124 above, I noticed John printed Session aborted before any status line.

BTW: you can hide the build log decreasing the verbosity.

aguvillalba commented 7 years ago

Hi @claudioandre, thank you very much for your replies. I run the commands you mentioned:

$ rm -rf ~/.nv
$ ./john --format=descrypt-opencl --min-length=8 --progress-every=10 --device=0 ~/jonny.txt #1st run
$ ./john --format=descrypt-opencl --min-length=8 --progress-every=10 --device=0 ~/jonny.txt #2nd run

The 1st run during 2 minutes. And I have the same results: 1st run:

./john --format=descrypt-opencl --min-length=8 --progress-every=10 --device=0 --encoding=Latin1 ~/jonny.txt Device 0@my-desktop: GeForce GTX 970
Loaded 1771 password hashes with 1344 different salts (descrypt-opencl, traditional crypt(3) [DES OpenCL])
Remaining 1724 password hashes with 1316 different salts
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_cmp' for 'sm_52'
ptxas info    : Function properties for DES_bs_cmp
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 96 registers, 392 bytes cmem[0], 48 bytes cmem[2]
ptxas info    : Compiling entry function 'DES_bs_cmp_high' for 'sm_52'
ptxas info    : Function properties for DES_bs_cmp_high
ptxas         .     256 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 40 registers, 392 bytes cmem[0], 48 bytes cmem[2]
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_finalize_keys' for 'sm_52'
ptxas info    : Function properties for DES_bs_finalize_keys
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 32 registers, 352 bytes cmem[0], 32 bytes cmem[2]
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 28676 bytes smem, 336 bytes cmem[0]
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 28676 bytes smem, 336 bytes cmem[0]

2nd run:

./john --format=descrypt-opencl --min-length=8 --progress-every=10 --device=0 --encoding=Latin1 ~/jonny.txt 
Device 0@my-desktop: GeForce GTX 970
Loaded 1771 password hashes with 1344 different salts (descrypt-opencl, traditional crypt(3) [DES OpenCL])
Remaining 1724 password hashes with 1316 different salts
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 28676 bytes smem, 336 bytes cmem[0]
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 28676 bytes smem, 336 bytes cmem[0]
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 28676 bytes smem, 336 bytes cmem[0]
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 28676 bytes smem, 336 bytes cmem[0]
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 28676 bytes smem, 336 bytes cmem[0]

By the way, when running the commands, I cannot press 'q' to cancel the execution, only Ctrl+C works.

aguvillalba commented 7 years ago

@claudioandre, yes, this happens everytime I run the command, always.

I have run it with --verbosity=5 and this is the output:

./john --format=descrypt-opencl --min-length=8 --progress-every=10 --device=0 --encoding=Latin1 --verbosity=5 ~/jonny.txt 
initUnicode(UNICODE, ISO-8859-1/ISO-8859-1)
ISO-8859-1 -> ISO-8859-1 -> ISO-8859-1
Device 0@my-desktop: GeForce GTX 970
Loaded 1771 password hashes with 1344 different salts (descrypt-opencl, traditional crypt(3) [DES OpenCL])
Remaining 1724 password hashes with 1316 different salts
Loaded 6 hashes with 4 different salts to test db from test vectors
Options used: -I /opt/JohnTheRipper/run/kernels -cl-mad-enable -DSM_MAJOR=5 -DSM_MINOR=2 -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=262162 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=375 -DDEV_VER_MINOR=66 -D_OPENCL_COMPILER  $JOHN/kernels/DES_bs_hash_checking_kernel.cl
Options used: -I /opt/JohnTheRipper/run/kernels -cl-mad-enable -DSM_MAJOR=5 -DSM_MINOR=2 -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=262162 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=375 -DDEV_VER_MINOR=66 -D_OPENCL_COMPILER -D ITER_COUNT=1 -D MASK_ENABLED=0 -D LOC_0=-1 -D LOC_1=-1 -D LOC_2=-1 -D LOC_3=-1 -D IS_STATIC_GPU_MASK=0 -D CONST_CACHE_SIZE=65536 $JOHN/kernels/DES_bs_finalize_keys_kernel.cl
Options used: -I /opt/JohnTheRipper/run/kernels -cl-mad-enable -DSM_MAJOR=5 -DSM_MINOR=2 -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=262162 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=375 -DDEV_VER_MINOR=66 -D_OPENCL_COMPILER -D WORK_GROUP_SIZE=128 -D index0=31 -D index1=16 -D index2=17 -D index3=18 -D index4=3 -D index5=4 -D index6=3 -D index7=20 -D index8=21 -D index9=22 -D index10=7 -D index11=8 -D index24=15 -D index25=0 -D index26=1 -D index27=2 -D index28=19 -D index29=20 -D index30=19 -D index31=4 -D index32=5 -D index33=6 -D index34=23 -D index35=24 -D index48=63 -D index49=48 -D index50=49 -D index51=50 -D index52=35 -D index53=36 -D index54=35 -D index55=52 -D index56=53 -D index57=54 -D index58=39 -D index59=40 -D index72=47 -D index73=32 -D index74=33 -D index75=34 -D index76=51 -D index77=52 -D index78=51 -D index79=36 -D index80=37 -D index81=38 -D index82=55 -D index83=56  $JOHN/kernels/DES_bs_kernel_f.cl
Salt compiled from Source:910
GWS: 2097152, LWS: 128
Salt compiled from Binary:910
Salt compiled from Binary:2275
Salt compiled from Binary:990
Salt compiled from Binary:0
Updated internal tables and buffers for salt 910.
Updated internal tables and buffers for salt 2275.
Updated internal tables and buffers for salt 990.
Updated internal tables and buffers for salt 0.
Options used: -I /opt/JohnTheRipper/run/kernels -cl-mad-enable -DSM_MAJOR=5 -DSM_MINOR=2 -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=262162 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=375 -DDEV_VER_MINOR=66 -D_OPENCL_COMPILER  $JOHN/kernels/DES_bs_hash_checking_kernel.cl
Options used: -I /opt/JohnTheRipper/run/kernels -cl-mad-enable -DSM_MAJOR=5 -DSM_MINOR=2 -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=262162 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=375 -DDEV_VER_MINOR=66 -D_OPENCL_COMPILER -D ITER_COUNT=1 -D MASK_ENABLED=0 -D LOC_0=-1 -D LOC_1=-1 -D LOC_2=-1 -D LOC_3=-1 -D IS_STATIC_GPU_MASK=0 -D CONST_CACHE_SIZE=65536 $JOHN/kernels/DES_bs_finalize_keys_kernel.cl
Salt compiled from Binary:0
Salt compiled from Binary:512
Salt compiled from Binary:1024
Salt compiled from Binary:1280
Salt compiled from Binary:2048
Salt compiled from Binary:3328
Salt compiled from Binary:3584
Salt compiled from Binary:257
Salt compiled from Binary:3329
Salt compiled from Binary:3841
Salt compiled from Binary:2
Salt compiled from Binary:514
Salt compiled from Binary:1026
Salt compiled from Binary:2050
Salt compiled from Binary:2306
Salt compiled from Binary:3330
Salt compiled from Binary:515
Salt compiled from Binary:771
Salt compiled from Binary:1027
Salt compiled from Binary:1283
Salt compiled from Binary:1539
Salt compiled from Binary:2051
Salt compiled from Binary:2563
Salt compiled from Binary:3331
Salt compiled from Binary:772
Salt compiled from Binary:1284
Salt compiled from Binary:1796
Salt compiled from Binary:2052
Salt compiled from Binary:3332
Salt compiled from Binary:3844
Salt compiled from Binary:5
Salt compiled from Binary:517
Salt compiled from Binary:773
Salt compiled from Binary:1029
Salt compiled from Binary:1285
Salt compiled from Binary:1541
Salt compiled from Binary:2053
Salt compiled from Binary:2309
Salt compiled from Binary:518
Salt compiled from Binary:1030
Salt compiled from Binary:3078
Salt compiled from Binary:263
Salt compiled from Binary:2823
Salt compiled from Binary:3335
Salt compiled from Binary:520
Salt compiled from Binary:776
Salt compiled from Binary:1032
Salt compiled from Binary:2568
Salt compiled from Binary:3592
Salt compiled from Binary:1545
Salt compiled from Binary:1801
Salt compiled from Binary:2313
Salt compiled from Binary:3337
Salt compiled from Binary:1290
Salt compiled from Binary:1802
Salt compiled from Binary:2058
Salt compiled from Binary:3082
Salt compiled from Binary:3850
Salt compiled from Binary:523
Salt compiled from Binary:1035
Salt compiled from Binary:2571
Salt compiled from Binary:3083
Salt compiled from Binary:3339
Salt compiled from Binary:3595
Salt compiled from Binary:12
Salt compiled from Binary:1548
Salt compiled from Binary:2316
Salt compiled from Binary:3340
Salt compiled from Binary:3596
Salt compiled from Binary:269
Salt compiled from Binary:781
Salt compiled from Binary:2061
Salt compiled from Binary:2317
Salt compiled from Binary:3341
Salt compiled from Binary:270
Salt compiled from Binary:782
Salt compiled from Binary:2062
Salt compiled from Binary:2318
Salt compiled from Binary:3598
Salt compiled from Binary:3854
Salt compiled from Binary:271
Salt compiled from Binary:1295
Salt compiled from Binary:1551
Salt compiled from Binary:3343
Salt compiled from Binary:3599
Salt compiled from Binary:1808
Salt compiled from Binary:2064
Salt compiled from Binary:3088
Salt compiled from Binary:3600
Salt compiled from Binary:3856
Salt compiled from Binary:529
Salt compiled from Binary:2577
Salt compiled from Binary:3345
Salt compiled from Binary:18
Salt compiled from Binary:786
Salt compiled from Binary:1298
Salt compiled from Binary:1554
Salt compiled from Binary:1810
Salt compiled from Binary:2322
Salt compiled from Binary:2578
Salt compiled from Binary:3858
Salt compiled from Binary:1043
Salt compiled from Binary:1555
Salt compiled from Binary:2067
Salt compiled from Binary:2323
Salt compiled from Binary:3091
Salt compiled from Binary:20
Salt compiled from Binary:276
Salt compiled from Binary:788
Salt compiled from Binary:1044
Salt compiled from Binary:2068
Salt compiled from Binary:21
Salt compiled from Binary:277
Salt compiled from Binary:1045
Salt compiled from Binary:1301
Salt compiled from Binary:2069
Salt compiled from Binary:2325
Salt compiled from Binary:3861
Salt compiled from Binary:278
Salt compiled from Binary:790
Salt compiled from Binary:3606
Salt compiled from Binary:279
Salt compiled from Binary:535
Salt compiled from Binary:1559
Salt compiled from Binary:2071
Salt compiled from Binary:2583
Salt compiled from Binary:3607
Salt compiled from Binary:1816
Salt compiled from Binary:2840
Salt compiled from Binary:3352
Salt compiled from Binary:3608
Salt compiled from Binary:3864
Salt compiled from Binary:281
Salt compiled from Binary:793
Salt compiled from Binary:1049
Salt compiled from Binary:2841
Salt compiled from Binary:3865
Salt compiled from Binary:2330
Salt compiled from Binary:3354
Salt compiled from Binary:3610
Salt compiled from Binary:3866
Salt compiled from Binary:283
Salt compiled from Binary:1051
Salt compiled from Binary:2075
Salt compiled from Binary:2587
Salt compiled from Binary:3355
Salt compiled from Binary:3611
Salt compiled from Binary:28
Salt compiled from Binary:284
Salt compiled from Binary:1564
Salt compiled from Binary:2332
Salt compiled from Binary:1053
Salt compiled from Binary:2077
Salt compiled from Binary:2333
Salt compiled from Binary:30
Salt compiled from Binary:798
Salt compiled from Binary:2078
Salt compiled from Binary:3870
Salt compiled from Binary:31
Salt compiled from Binary:799
Salt compiled from Binary:1311
Salt compiled from Binary:1567
Salt compiled from Binary:1823
Salt compiled from Binary:2847
Salt compiled from Binary:3103
Salt compiled from Binary:1056
Salt compiled from Binary:1824
Salt compiled from Binary:2080
Salt compiled from Binary:3872
Salt compiled from Binary:33
Salt compiled from Binary:545
Salt compiled from Binary:1057
Salt compiled from Binary:1569
Salt compiled from Binary:2081
Salt compiled from Binary:2849
Salt compiled from Binary:3873
Salt compiled from Binary:34
Salt compiled from Binary:546
Salt compiled from Binary:1058
Salt compiled from Binary:1826
Salt compiled from Binary:3106
Salt compiled from Binary:547
Salt compiled from Binary:2595
Salt compiled from Binary:2851
Salt compiled from Binary:3875
Salt compiled from Binary:1060
Salt compiled from Binary:1316
Salt compiled from Binary:1572
Salt compiled from Binary:293
Salt compiled from Binary:2341
Salt compiled from Binary:3109
Salt compiled from Binary:3621
Salt compiled from Binary:38
Salt compiled from Binary:550
Salt compiled from Binary:2342
Salt compiled from Binary:3110
Salt compiled from Binary:1575
Options used: -I /opt/JohnTheRipper/run/kernels -cl-mad-enable -DSM_MAJOR=5 -DSM_MINOR=2 -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=262162 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=375 -DDEV_VER_MINOR=66 -D_OPENCL_COMPILER -D WORK_GROUP_SIZE=128 -D index0=15 -D index1=16 -D index2=17 -D index3=2 -D index4=3 -D index5=20 -D index6=3 -D index7=4 -D index8=21 -D index9=6 -D index10=7 -D index11=24 -D index24=31 -D index25=0 -D index26=1 -D index27=18 -D index28=19 -D index29=4 -D index30=19 -D index31=20 -D index32=5 -D index33=22 -D index34=23 -D index35=8 -D index48=47 -D index49=48 -D index50=49 -D index51=34 -D index52=35 -D index53=52 -D index54=35 -D index55=36 -D index56=53 -D index57=38 -D index58=39 -D index59=56 -D index72=63 -D index73=32 -D index74=33 -D index75=50 -D index76=51 -D index77=36 -D index78=51 -D index79=52 -D index80=37 -D index81=54 -D index82=55 -D index83=40  $JOHN/kernels/DES_bs_kernel_f.cl
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 28676 bytes smem, 336 bytes cmem[0]
binary size 285774
Salt compiled from Source:2343
Options used: -I /opt/JohnTheRipper/run/kernels -cl-mad-enable -DSM_MAJOR=5 -DSM_MINOR=2 -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=262162 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=375 -DDEV_VER_MINOR=66 -D_OPENCL_COMPILER -D WORK_GROUP_SIZE=128 -D index0=15 -D index1=16 -D index2=17 -D index3=2 -D index4=3 -D index5=20 -D index6=3 -D index7=4 -D index8=5 -D index9=22 -D index10=7 -D index11=24 -D index24=31 -D index25=0 -D index26=1 -D index27=18 -D index28=19 -D index29=4 -D index30=19 -D index31=20 -D index32=21 -D index33=6 -D index34=23 -D index35=8 -D index48=47 -D index49=48 -D index50=49 -D index51=34 -D index52=35 -D index53=52 -D index54=35 -D index55=36 -D index56=37 -D index57=54 -D index58=39 -D index59=56 -D index72=63 -D index73=32 -D index74=33 -D index75=50 -D index76=51 -D index77=36 -D index78=51 -D index79=52 -D index80=53 -D index81=38 -D index82=55 -D index83=40  $JOHN/kernels/DES_bs_kernel_f.cl
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 28676 bytes smem, 336 bytes cmem[0]
binary size 285774
Salt compiled from Source:2599
Options used: -I /opt/JohnTheRipper/run/kernels -cl-mad-enable -DSM_MAJOR=5 -DSM_MINOR=2 -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=262162 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=375 -DDEV_VER_MINOR=66 -D_OPENCL_COMPILER -D WORK_GROUP_SIZE=128 -D index0=15 -D index1=16 -D index2=17 -D index3=2 -D index4=3 -D index5=20 -D index6=3 -D index7=4 -D index8=21 -D index9=22 -D index10=7 -D index11=24 -D index24=31 -D index25=0 -D index26=1 -D index27=18 -D index28=19 -D index29=4 -D index30=19 -D index31=20 -D index32=5 -D index33=6 -D index34=23 -D index35=8 -D index48=47 -D index49=48 -D index50=49 -D index51=34 -D index52=35 -D index53=52 -D index54=35 -D index55=36 -D index56=53 -D index57=54 -D index58=39 -D index59=56 -D index72=63 -D index73=32 -D index74=33 -D index75=50 -D index76=51 -D index77=36 -D index78=51 -D index79=52 -D index80=37 -D index81=38 -D index82=55 -D index83=40  $JOHN/kernels/DES_bs_kernel_f.cl
Build log: 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'DES_bs_25' for 'sm_52'
ptxas info    : Function properties for DES_bs_25
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 168 registers, 28676 bytes smem, 336 bytes cmem[0]
binary size 285774
Salt compiled from Source:2855
Options used: -I /opt/JohnTheRipper/run/kernels -cl-mad-enable -DSM_MAJOR=5 -DSM_MINOR=2 -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=262162 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=375 -DDEV_VER_MINOR=66 -D_OPENCL_COMPILER -D WORK_GROUP_SIZE=128 -D index0=15 -D index1=16 -D index2=17 -D index3=2 -D index4=3 -D index5=20 -D index6=3 -D index7=4 -D index8=5 -D index9=6 -D index10=23 -D index11=24 -D index24=31 -D index25=0 -D index26=1 -D index27=18 -D index28=19 -D index29=4 -D index30=19 -D index31=20 -D index32=21 -D index33=22 -D index34=7 -D index35=8 -D index48=47 -D index49=48 -D index50=49 -D index51=34 -D index52=35 -D index53=52 -D index54=35 -D index55=36 -D index56=37 -D index57=38 -D index58=55 -D index59=56 -D index72=63 -D index73=32 -D index74=33 -D index75=50 -D index76=51 -D index77=36 -D index78=51 -D index79=52 -D index80=53 -D index81=54 -D index82=39 -D index83=40  $JOHN/kernels/DES_bs_kernel_f.cl

Any clue what is going on? Thanks for any help!

aguvillalba commented 7 years ago

From the logs I pasted above, it seems to me that the cracking is not working and nothing is going on. Am I wrong?

solardiz commented 7 years ago

This is an instance of what I described in a paragraph in #2613:

"Also, at the default verbosity level it is no longer clear what the process is doing - it appears to be stuck for minutes. Perhaps we should print at least one informational line about building the per-salt kernels (as well as how to disable this if the startup delay is undesirable and slower long-time speed would be OK?) even at default or otherwise low verbosity levels."

In short: our descrypt-opencl tries to optimize for lengthy runs or/and low salt counts, by building per-salt kernels. It'll take an hour or so for it to get up to speed when there are thousands of salts (up to the maximum of 4096). This is indeed a major usability drawback, and we should provide a way to disable this functionality - in fact, we should probably have it disabled by default when cracking multiple salts (more than 1? or more than some other low threshold?), with a message printed on how to re-enable it to regain the speedup for lengthy sessions.

solardiz commented 7 years ago

IIRC, in some older version of Sayantan's code, the per-salt kernels were being built along with cracking using the already-built kernels (as well as the generic kernel for the rest of salts, where no specialized kernel was built yet?) Looks like we've since lost that, and now there appears to be no progress at all until all kernels are built. Maybe we lost it when attempting to parallelize the kernel builds with OpenMP.

This may explain why we had found it acceptable to have the per-salt kernels enabled by default before, and why (at least) I no longer find it acceptable now (except when there's just one or few salts). As an option, we could try to revert to that previous approach where cracking and specialized kernel building could proceed in parallel, and if so we wouldn't necessarily have to change the default (but would still need to print explanatory messages, along with instructions on how to disable the per-salt kernels).

solardiz commented 7 years ago

Meanwhile, the advice for @aguvillalba is to leave this running for some hours. It'll eventually start making progress and gain speed.

aguvillalba commented 7 years ago

@solardiz you are right, it was compiling salts during 24 minutes, after that it started to crack passwords.

Thank you very much for the help!

claudioandre-br commented 7 years ago

All per-salt kernels are being build at once inside reset().

Perhaps we should print at least one informational line about building the per-salt kernels.

An easy and acceptable workaround is the patch below.

diff --git a/src/opencl_DES_bs_f_plug.c b/src/opencl_DES_bs_f_plug.c
index a37618d03..7dae2b813 100644
--- a/src/opencl_DES_bs_f_plug.c
+++ b/src/opencl_DES_bs_f_plug.c
@@ -686,8 +686,17 @@ static void reset(struct db_main *db)
 #if _OPENMP && PARALLEL_BUILD
 #pragma omp parallel for
 #endif
-               for (i = 0; i < num_salts; i++)
+#define BUILD_WARN (num_salts > 16)
+
+               if (BUILD_WARN)
+                       printf("WARNING: building per-salt kernels. It can take hours...\n");
+                       printf("Please, be patient.\n");
+
+               for (i = 0; i < num_salts; i++) {
                        init_kernel(salt_list[i], gpu_id, 1, 0, forced_global_keys ? 0 :local_work_size);
+                        advance_cursor();
+                        opencl_process_event();
+                }

                set_kernel_args_kpc();
        }
claudioandre-br commented 7 years ago

The output is clear and the cursor rotates nicely (tested using the default verbosity).

image

aguvillalba commented 7 years ago

@claudioandre that would be a good solution! At least the user knows what to expect. Although, I would remove the "WARNING" word and change it to "INFO", because the message is not really a warning from the execution, is just information.

Thank you very much!

claudioandre-br commented 7 years ago

The threshold (16) and the text are subject to change.

BTW: It seems my if is pure Python !?!?

solardiz commented 7 years ago

Thank you, Claudio. Can we make it simply:

if (num_salts > 1)
        fprintf(stderr, "Note: building per-salt kernels. This takes e.g. 2 hours for 4096 salts.\n");

This looks more consistent with other messages we print: we already use the "Note: " prefix on some, and we print them to stderr. It also sets more specific expectations, consistent with how current computers actually perform.

claudioandre-br commented 7 years ago

This is more close to a real patch.

diff --git a/src/opencl_DES_bs_f_plug.c b/src/opencl_DES_bs_f_plug.c
index a37618d03..7634c4eac 100644
--- a/src/opencl_DES_bs_f_plug.c
+++ b/src/opencl_DES_bs_f_plug.c
@@ -686,9 +686,17 @@ static void reset(struct db_main *db)
 #if _OPENMP && PARALLEL_BUILD
 #pragma omp parallel for
 #endif
-       for (i = 0; i < num_salts; i++)
+       if (num_salts > 1)
+           fprintf(stderr, "Note: building per-salt kernels. "
+               "This takes e.g. 2 hours for 4096 salts.\n");
+
+       for (i = 0; i < num_salts; i++) {
            init_kernel(salt_list[i], gpu_id, 1, 0, forced_global_keys ? 0 :local_work_size);
+           opencl_process_event();

+           if (options.verbosity < VERB_LEGACY)
+               advance_cursor();
+       }
        set_kernel_args_kpc();
    }
    else {
diff --git a/src/opencl_DES_bs_h_plug.c b/src/opencl_DES_bs_h_plug.c
index 4c20a4041..e165a4a5d 100644
--- a/src/opencl_DES_bs_h_plug.c
+++ b/src/opencl_DES_bs_h_plug.c
@@ -700,9 +700,17 @@ static void reset(struct db_main *db)
 #if _OPENMP && PARALLEL_BUILD
 #pragma omp parallel for
 #endif
-       for (i = 0; i < num_salts; i++)
+       if (num_salts > 1)
+           fprintf(stderr, "Note: building per-salt kernels. "
+               "This takes e.g. 2 hours for 4096 salts.\n");
+
+       for (i = 0; i < num_salts; i++) {
            init_kernel(salt_list[i], gpu_id, 1, 0, forced_global_keys ? 0 :local_work_size);
+           opencl_process_event();

+           if (options.verbosity < VERB_LEGACY)
+               advance_cursor();
+       }
        set_kernel_args_kpc();
    }
    else {
solardiz commented 7 years ago

Claudio, the added if block should be above the existing #pragma omp directive.

Yes, the "> 1" is inconsistent with the self-tests already triggering multiple builds, but printing that message when there's just 1 salt in the input would be confusing. So I think it's fine to keep the "> 1" as-is.

solardiz commented 7 years ago

Claudio fixed the confusion aspect, so closing this issue. We'll need separate issues for tracking more invasive changes we might want to make.

solardiz commented 7 years ago

At default verbosity level, the many "ptxas info ..." lines are printed (and so the "Note: ..." is buried inbetween them - after the self-test ones and before the rest - and quickly scrolls away), but the "Salt compiled from ..." lines are not. That's confusing. I think it should be the other way around: can we, specifically for descrypt-opencl, suppress the OpenCL backend's build messages by default (unless verbosity is increased), but enable the "Salt compled from ..." messages? Claudio? (Maybe this should be a new issue - if so, feel free to create one and re-close this one.)

magnumripper commented 7 years ago

@solardiz can we close this again now that #2665 was merged?

solardiz commented 7 years ago

OK. I'm fine with not enabling the "Salt compiled from ..." at default verbosity. Closing.