openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

Some failing opencl format tests with beignet on haswell GPU #1571

Closed frank-dittrich closed 8 years ago

frank-dittrich commented 9 years ago

During my experiments to get the Haswell GPU working, I probably found some beignet / libdrm / i915 driver issues and maybe some bleeding-jumbo issues.

This is on 64-bit Fedora 22 with a Haswell CPU (Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz), after installing beignet (I tried both the Fedora package and the latest http://anongit.freedesktop.org/git/beignet.git commit).

Here, I am not sure whether the multiple unsequenced modifications to 'p' warning indicates a real problem:

$ OCL_IGNORE_SELF_TEST=1 ./john --test=0 --format=oldoffice-opencl
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
drm_intel_gem_bo_context_exec() failed: Invalid argument
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Beignet: self-test failed: (3, 7, 5) + (5, 7, 3) returned (3, 7, 5)
See README.md or http://www.freedesktop.org/wiki/Software/Beignet/
Beignet: warning - disable atomic in L3 feature.
Beignet: Warning - overriding self-test failure
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Beignet: Warning - overriding self-test failure
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: oldoffice-opencl, MS Office <= 2003 [MD5/SHA1 RC4 OpenCL]... Build log: /tmp/fxgfQc.cl:224:24: warning: multiple unsequenced modifications to 'p'
/tmp/fxgfQc.cl:441:15: warning: multiple unsequenced modifications to 'p'

FAILED (cmp_all(1))

Failed to release test userptr object! (9) i915 kernel driver may not be sane!

The only statements that may trigger that multiple unsequenced modifications to 'p' warning IMHO are

oldoffice_kernel.cl:194:                        W[i >> 1] = (uint)*p++ | (*p++ << 16U);
oldoffice_kernel.cl:197:                        W[i >> 1] = (uint)*p++ | (*p++ << 16U);
oldoffice_kernel.cl:210:                        W[i >> 1] = (uint)*p++ | (*p++ << 16U);
oldoffice_kernel.cl:224:                        W[i >> 1] = (uint)*p++ | (*p++ << 16U);
oldoffice_kernel.cl:408:                        uint u = *p++ | (*p++ << 16U);
oldoffice_kernel.cl:413:                        uint u = *p++ | (*p++ << 16U);
oldoffice_kernel.cl:426:                        uint u = *p++ | (*p++ << 16U);
oldoffice_kernel.cl:441:                        uint u = *p++ | (*p++ << 16U);

Can any of the above be a problem?
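
For illustration, here is a small host-side C sketch of why it could matter. It assumes p points to 16-bit elements (the kernel's actual declaration is not shown in this thread) and the helper names are hypothetical. It spells out the two read orders a compiler might plausibly pick for *p++ | (*p++ << 16U). In C99, which OpenCL C is based on, modifying p twice without an intervening sequence point is undefined behaviour, so neither order is guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Order 1: the left operand is read first, so the first element
 * ends up in the low half of the word. */
uint32_t pack_left_first(const uint16_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 16);
}

/* Order 2: the right operand is read first, so the two halves
 * come out swapped compared to order 1. */
uint32_t pack_right_first(const uint16_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[1] | ((uint32_t)p[0] << 16);
}
```

For the input { 0x0061, 0x0062 }, the first helper gives 0x00620061 and the second gives 0x00610062, so a format whose hashes depend on this word would fail its self test if the compiler picked the other order.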

I hope that these warnings

Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!

will disappear after a future libdrm or linux kernel upgrade and that the

drm_intel_gem_bo_context_exec() failed: Invalid argument

problem will be fixed in future beignet versions.

Maybe performance will improve once I get rid of the "disable atomic in L3 feature" fallback caused by the failed self test, and once I drop the debug CFLAGS from my beignet build...

For now, I just use OCL_IGNORE_SELF_TEST=1, otherwise the Haswell GPU will be skipped:

$ ./john --list=opencl-devices 
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
drm_intel_gem_bo_context_exec() failed: Invalid argument
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Beignet: self-test failed: (3, 7, 5) + (5, 7, 3) returned (3, 7, 5)
See README.md or http://www.freedesktop.org/wiki/Software/Beignet/
Beignet: warning - disable atomic in L3 feature.
Beignet: disabling non-working device
OpenCL error (CL_DEVICE_NOT_FOUND) in file (common-opencl.c) at line (385) - (No OpenCL device of that type exist)
$ OCL_IGNORE_SELF_TEST=1 ./john --list=opencl-devices             
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
drm_intel_gem_bo_context_exec() failed: Invalid argument
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Beignet: self-test failed: (3, 7, 5) + (5, 7, 3) returned (3, 7, 5)
See README.md or http://www.freedesktop.org/wiki/Software/Beignet/
Beignet: warning - disable atomic in L3 feature.
Beignet: Warning - overriding self-test failure
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Beignet: Warning - overriding self-test failure
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Platform #0 name: Intel Gen OCL Driver
Platform version: OpenCL 1.2 beignet 1.1 (git-834d0ae)
    Device #0 (0) name: Intel(R) HD Graphics Haswell GT2 Desktop
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
    Device vendor:      Intel
    Device type:        GPU (LE)
    Device version:     OpenCL 1.2 beignet 1.1 (git-834d0ae)
    Driver version:     1.1
    Native vector widths:   char 8, short 8, int 4, long 2
    Preferred vector width: char 16, short 8, int 4, long 2
    Global Memory:      2.0 GB
    Global Memory Cache:    8.0 KB
    Local Memory:       64.0 KB (Global)
    Max memory alloc. size: 1024.0 MB
    Max clock (MHz):    1000
    Profiling timer res.:   80 ns
    Max Work Group Size:    512
    Parallel compute cores: 20
$ OCL_IGNORE_SELF_TEST=1 ./john --list=opencl-devices --verbosity=5
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
drm_intel_gem_bo_context_exec() failed: Invalid argument
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Beignet: self-test failed: (3, 7, 5) + (5, 7, 3) returned (3, 7, 5)
See README.md or http://www.freedesktop.org/wiki/Software/Beignet/
Beignet: warning - disable atomic in L3 feature.
Beignet: Warning - overriding self-test failure
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Beignet: Warning - overriding self-test failure
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Platform #0 name: Intel Gen OCL Driver
Platform version: OpenCL 1.2 beignet 1.1 (git-834d0ae)
    Platform extensions:    cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_icd
    Device #0 (0) name: Intel(R) HD Graphics Haswell GT2 Desktop
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
    Device vendor:      Intel
    Device type:        GPU (LE)
    Device version:     OpenCL 1.2 beignet 1.1 (git-834d0ae)
    Driver version:     1.1
    Native vector widths:   char 8, short 8, int 4, long 2
    Preferred vector width: char 16, short 8, int 4, long 2
    Global Memory:      2.0 GB
    Device extensions:  cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_icd
    Global Memory Cache:    8.0 KB
    Local Memory:       64.0 KB (Global)
    Max memory alloc. size: 1024.0 MB
    Max clock (MHz):    1000
    Profiling timer res.:   80 ns
    Max Work Group Size:    512
    Parallel compute cores: 20

Most OpenCL formats pass the self test, with a c/s rate between 10% and 90% of the CPU rate (4 OpenMP threads). For rar, OpenCL comes closest:

$ ./john --test --format=rar
Will run 4 OpenMP threads
Benchmarking: rar, RAR3 (4 characters) [SHA1 AES 32/64]... (4xOMP) DONE
Raw:    403 c/s real, 100 c/s virtual

$ OCL_IGNORE_SELF_TEST=1 ./john --test --format=rar-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Will run 4 OpenMP threads
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
drm_intel_gem_bo_context_exec() failed: Invalid argument
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Beignet: self-test failed: (3, 7, 5) + (5, 7, 3) returned (3, 7, 5)
See README.md or http://www.freedesktop.org/wiki/Software/Beignet/
Beignet: warning - disable atomic in L3 feature.
Beignet: Warning - overriding self-test failure
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Beignet: Warning - overriding self-test failure
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: rar-opencl, RAR3 (length 5) [SHA1 OpenCL AES]... (4xOMP) Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER -DPLAINTEXT_LENGTH=22
Calculating best global worksize (GWS); max. 10s total for crypt_all()
Raw speed figures including buffer transfers:
key xfer: 115.517ms*, len xfer: 115.533ms*, init: 2.800us, loop: 16x52.082ms, final: 19.920us, key xfer: 101.292ms*, iv xfer: 107.760us*
gws:       256         219 c/s    57409536 rounds/s    1.165s per crypt_all()!
key xfer: 199.327ms*, len xfer: 199.329ms*, init: 1.680us, loop: 16x99.704ms, final: 19.520us, key xfer: 199.272ms*, iv xfer: 88.480us*
gws:       512         233 c/s    61079552 rounds/s    2.193s per crypt_all()+
key xfer: 364.582ms*, len xfer: 364.586ms*, init: 2.320us, loop: 16x182.919ms, final: 35.040us, key xfer: 364.498ms*, iv xfer: 73.920us*
gws:      1024         254 c/s    66584576 rounds/s    4.020s per crypt_all()+
key xfer: 1.440us, len xfer: 1.280us, init: 2.800us, loop: 16x349.420ms, final: 36us, key xfer: 1.280us, iv xfer: 1.200us
gws:      2048         366 c/s    95944704 rounds/s    5.590s per crypt_all()+
key xfer: 1.440us, len xfer: 1.120us, init: 5.760us, loop: 16x687.015ms, final: 65.600us, key xfer: 1.120us, iv xfer: 10.575ms*
gws:      4096         372 c/s    97517568 rounds/s   11.002s per crypt_all() - too slow
Local worksize (LWS) 64, global worksize (GWS) 2048
DONE
Raw:    364 c/s real, 102400 c/s virtual

BTW: why is the CPU format using length 4, but the GPU length 5?

And these are all the failing self tests:

initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: oldoffice-opencl, MS Office <= 2003 [MD5/SHA1 RC4 OpenCL]... Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER -DASCII -DPLAINTEXT_LENGTH=19
Build log: /tmp/DAFnPU.cl:224:24: warning: multiple unsequenced modifications to 'p'
/tmp/DAFnPU.cl:441:15: warning: multiple unsequenced modifications to 'p'

Calculating best global worksize (GWS); max. 1s total for crypt_all()
Raw speed figures including buffer transfers:
xP: 1.746ms*, xI: 1.747ms*, enc: 53.760us, md5+rc4: 1.466ms, xR: 1.543ms*
gws:      1024      156160 c/s      156160 rounds/s   6.557ms per crypt_all()!
xP: 3.068ms*, xI: 3.072ms*, enc: 39.360us, md5+rc4: 2.868ms, xR: 3.004ms*
gws:      2048      169902 c/s      169902 rounds/s  12.054ms per crypt_all()+
xP: 5.818ms*, xI: 5.805ms*, enc: 75.280us, md5+rc4: 5.614ms, xR: 5.729ms*
gws:      4096      177750 c/s      177750 rounds/s  23.043ms per crypt_all()+
xP: 10.606ms*, xI: 10.581ms*, enc: 146.080us, md5+rc4: 10.284ms, xR: 10.468ms*
gws:      8192      194643 c/s      194643 rounds/s  42.087ms per crypt_all()+
xP: 20.173ms*, xI: 20.117ms*, enc: 285.600us, md5+rc4: 19.524ms, xR: 19.923ms*
gws:     16384      204735 c/s      204735 rounds/s  80.025ms per crypt_all()+
xP: 38.848ms*, xI: 38.835ms*, enc: 595.600us, md5+rc4: 37.806ms, xR: 38.401ms*
gws:     32768      212107 c/s      212107 rounds/s 154.487ms per crypt_all()+
xP: 75.759ms*, xI: 75.554ms*, enc: 1.200ms, md5+rc4: 73.755ms, xR: 75.026ms*
gws:     65536      217513 c/s      217513 rounds/s 301.295ms per crypt_all()+
xP: 149.298ms*, xI: 148.891ms*, enc: 2.294ms, md5+rc4: 145.494ms, xR: 147.853ms*
gws:    131072      220722 c/s      220722 rounds/s 593.831ms per crypt_all()+
xP: 1.440us, xI: 298.903ms*, enc: 4.523ms, md5+rc4: 291.022ms, xR: 295.578ms*
gws:    262144      294534 c/s      294534 rounds/s 890.028ms per crypt_all()+
xP: 1.920us, xI: 1.520us, enc: 9.227ms, md5+rc4: 576.541ms, xR: 1.360us
gws:    524288      895035 c/s      895035 rounds/s 585.773ms per crypt_all()!
xP: 2.240us, xI: 1.280us, enc: 18.194ms, md5+rc4: 1.155s, xR: 1.440us
gws:   1048576      893037 c/s      893037 rounds/s    1.174s per crypt_all() - too slow
Local worksize (LWS) 64, global worksize (GWS) 524288
FAILED (cmp_all(1))

initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: lotus5-opencl, Lotus Notes/Domino 5 [OpenCL]... Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER 
Calculating best global worksize (GWS); max. 1s single kernel invocation.
Raw speed figures including buffer transfers:
xfer: 833.920us*, crypt: 487.520us, xfer: 550.880us*
gws:       256      136728 c/s      136728 rounds/s   1.872ms per crypt_all()!
xfer: 886.800us*, crypt: 752.320us, xfer: 861.200us*
gws:       512      204773 c/s      204773 rounds/s   2.500ms per crypt_all()+
xfer: 1.773ms*, crypt: 1.641ms, xfer: 1.738ms*
gws:      1024      198683 c/s      198683 rounds/s   5.153ms per crypt_all()
xfer: 3.115ms*, crypt: 2.980ms, xfer: 3.087ms*
gws:      2048      223010 c/s      223010 rounds/s   9.183ms per crypt_all()+
xfer: 6.011ms*, crypt: 5.858ms, xfer: 5.957ms*
gws:      4096      229759 c/s      229759 rounds/s  17.827ms per crypt_all()+
xfer: 11.924ms*, crypt: 11.736ms, xfer: 11.839ms*
gws:      8192      230756 c/s      230756 rounds/s  35.500ms per crypt_all()
xfer: 23.686ms*, crypt: 23.426ms, xfer: 23.538ms*
gws:     16384      231898 c/s      231898 rounds/s  70.651ms per crypt_all()
xfer: 46.946ms*, crypt: 46.512ms, xfer: 46.656ms*
gws:     32768      233863 c/s      233863 rounds/s 140.115ms per crypt_all()+
xfer: 1.040us, crypt: 92.609ms, xfer: 92.877ms*
gws:     65536      353316 c/s      353316 rounds/s 185.487ms per crypt_all()+
xfer: 1.360us, crypt: 185.871ms, xfer: 186.244ms*
gws:    131072      352232 c/s      352232 rounds/s 372.117ms per crypt_all()
xfer: 1.360us, crypt: 370.840ms, xfer: 371.800ms*
gws:    262144      352988 c/s      352988 rounds/s 742.642ms per crypt_all()
xfer: 1.040us, crypt: 743.818ms, xfer: 745.505ms*
gws:    524288      352030 c/s      352030 rounds/s    1.489s per crypt_all()
xfer: 2.400us, crypt: 1.484s (exceeds 1s)
Local worksize (LWS) 64, global worksize (GWS) 65536
FAILED (cmp_all(1))

initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER -DWORK_GROUP_SIZE=8
Local worksize (LWS) 8, Global worksize (GWS) 1024
Benchmarking: bcrypt-opencl ("$2a$05", 32 iterations) [Blowfish OpenCL]... FAILED (cmp_all(1))

initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER 
ASSERTION FAILED: 0
  at file /home/fd/git/beignet/backend/src/backend/gen_context.cpp, function virtual void gbe::GenContext::emitUnaryWithTempInstruction(const gbe::SelectionInstruction&), line 438
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER 
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER 
Local worksize (LWS) 64, Global worksize (GWS) 8192
Benchmarking: descrypt-opencl, traditional crypt(3) [DES OpenCL]... FAILED (cmp_one(0))

initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: krb5pa-md5-opencl, Kerberos 5 AS-REQ Pre-Auth etype 23 [MD4 HMAC-MD5 RC4 OpenCL]... Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER -DASCII -DPLAINTEXT_LENGTH=27
Calculating best global worksize (GWS); max. 200ms single kernel invocation.
Raw speed figures including buffer transfers:
xfer: 599.920us*, init: 35.280us, crypt: 187.360us, xfer: 251.280us*
gws:       256      238396 c/s      238396 rounds/s   1.073ms per crypt_all()!
xfer: 315.920us*, init: 21.360us, crypt: 217.360us, xfer: 285.120us*
gws:       512      609698 c/s      609698 rounds/s 839.760us per crypt_all()!
xfer: 517.840us*, init: 38.720us, crypt: 420.240us, xfer: 487.600us*
gws:      1024      699262 c/s      699262 rounds/s   1.464ms per crypt_all()+
xfer: 1.057ms*, init: 76us, crypt: 852.880us, xfer: 1.022ms*
gws:      2048      680651 c/s      680651 rounds/s   3.008ms per crypt_all()
xfer: 1.868ms*, init: 152.800us, crypt: 1.574ms, xfer: 1.805ms*
gws:      4096      758282 c/s      758282 rounds/s   5.401ms per crypt_all()+
xfer: 3.460ms*, init: 257.680us, crypt: 3.044ms, xfer: 3.368ms*
gws:      8192      808616 c/s      808616 rounds/s  10.130ms per crypt_all()+
xfer: 6.591ms*, init: 538.880us, crypt: 5.843ms, xfer: 6.458ms*
gws:     16384      843086 c/s      843086 rounds/s  19.433ms per crypt_all()+
xfer: 12.555ms*, init: 1.016ms, crypt: 11.198ms, xfer: 12.309ms*
gws:     32768      883728 c/s      883728 rounds/s  37.079ms per crypt_all()+
xfer: 24.328ms*, init: 2.031ms, crypt: 21.744ms, xfer: 23.894ms*
gws:     65536      910238 c/s      910238 rounds/s  71.998ms per crypt_all()+
xfer: 1.200us, init: 4.020ms, crypt: 42.825ms, xfer: 47.095ms*
gws:    131072     1395235 c/s     1395235 rounds/s  93.942ms per crypt_all()+
xfer: 1.040us, init: 7.988ms, crypt: 85.503ms, xfer: 93.441ms*
gws:    262144     1402330 c/s     1402330 rounds/s 186.934ms per crypt_all()
xfer: 2.480us, init: 15.840ms, crypt: 168.811ms, xfer: 1.360us
gws:    524288     2839267 c/s     2839267 rounds/s 184.656ms per crypt_all()!
xfer: 1.760us, init: 32.312ms, crypt: 336.594ms (exceeds 200ms)
Local worksize (LWS) 64, global worksize (GWS) 524288
FAILED (cmp_all(1))

initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: nt-opencl, NT [MD4 OpenCL]... Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER -D OFFSET_TABLE_SIZE=32 -D HASH_TABLE_SIZE=43 -D SHIFT64_OT_SZ=0 -D SHIFT64_HT_SZ=41 -D NUM_LOADED_HASHES=41 -D NUM_INT_KEYS=1 -D SELECT_CMP_STEPS=4 -D BITMAP_SIZE_BITS_LESS_ONE=32767 -D USE_LOCAL_BITMAPS=1 -D IS_STATIC_GPU_MASK=0 -D CONST_CACHE_SIZE=134217728 -D LOC_0=-1 -D LOC_1=-1 -D LOC_2=-1 -D LOC_3=-1
Self test GWS: 2097152, LWS: 256
FAILED (cmp_all(0))

initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Will run 4 OpenMP threads
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP) Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2
ASSERTION FAILED: 0
  at file /home/fd/git/beignet/backend/src/backend/gen_context.cpp, function virtual void gbe::GenContext::emitUnaryWithTempInstruction(const gbe::SelectionInstruction&), line 438
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: PBKDF2-HMAC-SHA512-opencl, GRUB2 / OS X 10.8+, rounds=10000 [PBKDF2-SHA512 OpenCL]... Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER -DHASH_LOOPS=250 -DPLAINTEXT_LENGTH=110 -DMAX_SALT_SIZE=107
ASSERTION FAILED: 0
  at file /home/fd/git/beignet/backend/src/backend/gen_context.cpp, function virtual void gbe::GenContext::emitUnaryWithTempInstruction(const gbe::SelectionInstruction&), line 438
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: Raw-MD4-opencl [MD4 OpenCL]... Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER -D OFFSET_TABLE_SIZE=13 -D HASH_TABLE_SIZE=15 -D SHIFT64_OT_SZ=3 -D SHIFT64_HT_SZ=1 -D NUM_LOADED_HASHES=14 -D NUM_INT_KEYS=1 -D SELECT_CMP_STEPS=4 -D BITMAP_SIZE_BITS_LESS_ONE=32767 -D USE_LOCAL_BITMAPS=1 -D IS_STATIC_GPU_MASK=0 -D CONST_CACHE_SIZE=134217728 -D LOC_0=-1 -D LOC_1=-1 -D LOC_2=-1 -D LOC_3=-1
Self test GWS: 1048576, LWS: 256
FAILED (cmp_all(0))

initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: Raw-MD5-opencl [MD5 OpenCL]... Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER -D OFFSET_TABLE_SIZE=11 -D HASH_TABLE_SIZE=5 -D SHIFT64_OT_SZ=5 -D SHIFT64_HT_SZ=1 -D NUM_LOADED_HASHES=5 -D NUM_INT_KEYS=1 -D SELECT_CMP_STEPS=4 -D BITMAP_SIZE_BITS_LESS_ONE=32767 -D USE_LOCAL_BITMAPS=1 -D IS_STATIC_GPU_MASK=0 -D CONST_CACHE_SIZE=134217728 -D LOC_0=-1 -D LOC_1=-1 -D LOC_2=-1 -D LOC_3=-1
Self test GWS: 1048576, LWS: 256
FAILED (cmp_all(0))

initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: Raw-SHA1-opencl [SHA1 OpenCL]... Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER -D OFFSET_TABLE_SIZE=11 -D HASH_TABLE_SIZE=5 -D SHIFT64_OT_SZ=5 -D SHIFT64_HT_SZ=1 -D SHIFT128_OT_SZ=3 -D SHIFT128_HT_SZ=1 -D NUM_LOADED_HASHES=4 -D NUM_INT_KEYS=1 -D SELECT_CMP_STEPS=4 -D BITMAP_SIZE_BITS_LESS_ONE=32767 -D USE_LOCAL_BITMAPS=1 -D IS_STATIC_GPU_MASK=0 -D CONST_CACHE_SIZE=134217728 -D LOC_0=-1 -D LOC_1=-1 -D LOC_2=-1 -D LOC_3=-1
Self test GWS: 1048576, LWS: 256
FAILED (cmp_all(0))

initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER 
Benchmarking: Raw-SHA256-opencl [SHA256 OpenCL]... Calculating best global worksize (GWS); max. 1s total for crypt_all()
Raw speed figures including buffer transfers:
pass xfer: 595.280us*, crypt: 404.160us, result xfer: 464.720us*, index xfer: 618.720us*
gws:      1024      491626 c/s      491626 rounds/s   2.082ms per crypt_all()!
pass xfer: 827.120us*, crypt: 739.440us, result xfer: 785.360us*, index xfer: 829.840us*
gws:      2048      643668 c/s      643668 rounds/s   3.181ms per crypt_all()+
pass xfer: 1.628ms*, crypt: 1.454ms, result xfer: 1.566ms*, index xfer: 1.614ms*
gws:      4096      653920 c/s      653920 rounds/s   6.263ms per crypt_all()+
pass xfer: 3.012ms*, crypt: 2.780ms, result xfer: 2.892ms*, index xfer: 2.955ms*
gws:      8192      703731 c/s      703731 rounds/s  11.640ms per crypt_all()+
pass xfer: 5.786ms*, crypt: 5.491ms, result xfer: 5.542ms*, index xfer: 5.627ms*
gws:     16384      729864 c/s      729864 rounds/s  22.448ms per crypt_all()+
pass xfer: 11.123ms*, crypt: 10.653ms, result xfer: 10.753ms*, index xfer: 10.843ms*
gws:     32768      755478 c/s      755478 rounds/s  43.373ms per crypt_all()+
pass xfer: 21.932ms*, crypt: 21.075ms, result xfer: 21.183ms*, index xfer: 21.338ms*
gws:     65536      766222 c/s      766222 rounds/s  85.531ms per crypt_all()+
pass xfer: 42.630ms*, crypt: 42.106ms, result xfer: 42.217ms*, index xfer: 42.391ms*
gws:    131072      773992 c/s      773992 rounds/s 169.345ms per crypt_all()+
pass xfer: 84.888ms*, crypt: 83.984ms, result xfer: 84.102ms*, index xfer: 84.428ms*
gws:    262144      776943 c/s      776943 rounds/s 337.404ms per crypt_all()
pass xfer: 1.120us, crypt: 166.969ms, result xfer: 167.046ms*, index xfer: 167.826ms*
gws:    524288     1044726 c/s     1044726 rounds/s 501.842ms per crypt_all()+
pass xfer: 338.504ms*, crypt: 335.524ms, result xfer: 335.601ms*, index xfer: 336.525ms*
gws:   1048576      778940 c/s      778940 rounds/s    1.346s per crypt_all() - too slow
Max local worksize 512, Local worksize (LWS) 32, global worksize (GWS) 524288
FAILED (cmp_all(0))

initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER 
ASSERTION FAILED: 0
  at file /home/fd/git/beignet/backend/src/backend/gen_context.cpp, function virtual void gbe::GenContext::emitUnaryWithTempInstruction(const gbe::SelectionInstruction&), line 438
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: wpapsk-opencl, WPA/WPA2 PSK [PBKDF2-SHA1 OpenCL 4x]... Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER -DHASH_LOOPS=105 -DITERATIONS=4095 -DPLAINTEXT_LENGTH=63 -DV_WIDTH=4
ASSERTION FAILED: !(ctx->getErrCode() == OUT_OF_RANGE_IF_ENDIF && ctx->getIFENDIFFix())
  at file /home/fd/git/beignet/backend/src/backend/gen_program.cpp, function virtual gbe::Kernel* gbe::GenProgram::compileKernel(const gbe::ir::Unit&, const string&, bool), line 196
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER 
ASSERTION FAILED: 0
  at file /home/fd/git/beignet/backend/src/backend/gen_context.cpp, function virtual void gbe::GenContext::emitUnaryWithTempInstruction(const gbe::SelectionInstruction&), line 438

The common warnings

drm_intel_gem_bo_context_exec() failed: Invalid argument
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Beignet: self-test failed: (3, 7, 5) + (5, 7, 3) returned (3, 7, 5)
See README.md or http://www.freedesktop.org/wiki/Software/Beignet/
Beignet: warning - disable atomic in L3 feature.
Beignet: Warning - overriding self-test failure

which appeared for every single test have been filtered using grep -v.

@magnum I wasn't sure whether I should create separate issues for every possible problem (failing format self tests etc.)

Maybe beignet, libdrm or the i915 driver is to blame for many of these. If you think I should have created multiple issues, I suggest that we keep this one for the multiple unsequenced modifications to 'p' warning in oldoffice-opencl (and maybe the failing oldoffice-opencl self test), and I'll create new issues for the other failing self tests.

frank-dittrich commented 9 years ago

Apparently, the i915 driver problems will be fixed with the 4.2 linux kernel.

magnumripper commented 9 years ago

> Here, I am not sure whether the multiple unsequenced modifications to 'p' warning indicates a real problem:

I guess we should trust it until proven wrong.

> The only statements that may trigger that multiple unsequenced modifications to 'p' warning IMHO are
>
> oldoffice_kernel.cl:194:        W[i >> 1] = (uint)*p++ | (*p++ << 16U);
> (...)
>
> Can any of the above be a problem?

I guess. I'm not sure what the C99 spec would say about it (I thought it was left-to-right when I wrote that). Please try adding temps that remove these ambiguities! Or maybe we should google for C specs first (or just ask Alexander Cherepanov).
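
A minimal sketch of what "adding temps" could look like, written as plain host-side C rather than OpenCL C so it can be compiled anywhere. The helper names are hypothetical and p is assumed to point to 16-bit elements; this is not the actual kernel code. Each read of *p++ gets its own statement, so the order is fully defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read two consecutive 16-bit values through *pp,
 * advance the pointer by two, and return them packed low half first. */
uint32_t pack_pair(const uint16_t **pp)
{
    assert(pp != NULL && *pp != NULL);
    uint32_t lo = *(*pp)++;   /* first element  -> bits 0..15  */
    uint32_t hi = *(*pp)++;   /* second element -> bits 16..31 */
    return lo | (hi << 16);
}

/* Fill W[] from n 16-bit inputs (n must be even), mirroring the
 * W[i >> 1] = ... pattern quoted above. */
void pack_words(uint32_t *W, const uint16_t *in, size_t n)
{
    const uint16_t *p = in;
    for (size_t i = 0; i < n; i += 2)
        W[i >> 1] = pack_pair(&p);
}
```

In the kernel itself the same idea would presumably be two statements that each read *p++ into a temporary, followed by a third that combines the temporaries.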

> BTW: why is the CPU format using length 4, but the GPU length 5?

That's for benchmarking against cRARk, which has/had it that way. I don't care about that anymore, but it took me a year or so to get on par with cRARk.

> @magnum I wasn't sure whether I should create separate issues for every possible problem (failing format self tests etc.)

Perhaps not quite yet. First we should try to find small general fixes that solve many issues (as was done for OSX three years ago, where the first results were worse than these).

frank-dittrich commented 9 years ago

Upgrading the Linux kernel from 4.0.8-300.fc22.x86_64 to a vanilla 4.2 rc3 helped to get rid of the OCL_IGNORE_SELF_TEST=1 which was needed to convince beignet not to disable the "non-working" device which didn't pass beignet's self test.

Just upgrading the Linux kernel also improved the rar-opencl performance (from 364 c/s to 658 c/s).

I guess it is somehow related to the "Beignet: warning - disable atomic in L3 feature." message, which was due to the failing self test with the older kernel.

$ ./john --test --format=rar-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Will run 4 OpenMP threads
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: rar-opencl, RAR3 (length 5) [SHA1 OpenCL AES]... (4xOMP) Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER -DPLAINTEXT_LENGTH=22
Calculating best global worksize (GWS); max. 10s total for crypt_all()
Raw speed figures including buffer transfers:
key xfer: 82.733ms*, len xfer: 82.750ms*, init: 2.800us, loop: 16x35.801ms, final: 34.480us, key xfer: 69.059ms*, iv xfer: 111.680us*
gws:       256         317 c/s    83099648 rounds/s 807.510ms per crypt_all()!
key xfer: 132.540ms*, len xfer: 132.549ms*, init: 1.680us, loop: 16x66.152ms, final: 20.160us, key xfer: 132.499ms*, iv xfer: 109.280us*
gws:       512         351 c/s    92012544 rounds/s    1.456s per crypt_all()+
key xfer: 220.683ms*, len xfer: 220.719ms*, init: 1.920us, loop: 16x110.336ms, final: 34.800us, key xfer: 220.567ms*, iv xfer: 130.480us*
gws:      1024         421 c/s   110362624 rounds/s    2.427s per crypt_all()+
key xfer: 396.616ms*, len xfer: 396.585ms*, init: 2.560us, loop: 16x198.144ms, final: 45.040us, key xfer: 396.396ms*, iv xfer: 73.360us*
gws:      2048         469 c/s   122945536 rounds/s    4.360s per crypt_all()+
key xfer: 1.520us, len xfer: 1.360us, init: 3.680us, loop: 16x385.641ms, final: 70.240us, key xfer: 2us, iv xfer: 1.120us
gws:      4096         663 c/s   173801472 rounds/s    6.170s per crypt_all()+
key xfer: 1.525s*, len xfer: 1.525s*, init: 6.800us, loop: 16x762.559ms, final: 129.200us, key xfer: 1.532s*, iv xfer: 7.142ms*
gws:      8192         487 c/s   127664128 rounds/s   16.791s per crypt_all() - too slow
Local worksize (LWS) 64, global worksize (GWS) 4096
DONE
Raw:    658 c/s real, 204800 c/s virtual

frank-dittrich commented 9 years ago

For some reason, the encfs-opencl self test takes more than 7 minutes:

$ time ./john --test --format=encfs-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Will run 4 OpenMP threads
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: encfs-opencl, EncFS [PBKDF2-SHA1 OpenCL 4x AES/Blowfish]... (4xOMP) Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER -DHASH_LOOPS=753 -DOUTLEN=48 -DPLAINTEXT_LENGTH=64 -DV_WIDTH=4
Calculating best global worksize (GWS); max. 10s total for crypt_all()
Raw speed figures including buffer transfers:
P xfer: 170.097ms*, init: 752.960us, loop: 241x84.120ms, final: 201.840us, res xfer: 168.742ms*
gws:       128          24 c/s     8710800 rounds/s   20.613s per crypt_all()!
P xfer: 174.726ms*, init: 412.080us, loop: 241x86.903ms, final: 229.680us, res xfer: 174.614ms*
gws:       256          48 c/s    17421600 rounds/s   21.294s per crypt_all()+
P xfer: 181.656ms*, init: 538.800us, loop: 241x90.215ms, final: 263.760us, res xfer: 181.576ms*
gws:       512          92 c/s    33391400 rounds/s   22.106s per crypt_all()+
P xfer: 308.815ms*, init: 1.320ms, loop: 241x152.363ms, final: 774us, res xfer: 308.409ms*
gws:      1024         109 c/s    39561550 rounds/s   37.339s per crypt_all() - too slow
Local worksize (LWS) 64, global worksize (GWS) 512
DONE
Speed for cost 1 (iteration count) of 181474 and 181317
Raw:    47.1 c/s real, 5535 c/s virtual

real    7m33.911s
user    0m2.681s
sys 0m4.626s
Sat Jul 25 18:52:44 CEST 2015
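
As a cross-check of the auto-tune figures above, here is a minimal Python sketch that adds up each timing breakdown and compares it with the reported "per crypt_all()" time. The helper name `crypt_all_seconds` is made up for illustration. The relation c/s = GWS x V_WIDTH / time (truncated) is only inferred from these four lines, not taken from the john source.

```python
import re

# (GWS, reported c/s, reported seconds, breakdown) copied from the log above.
PROBES = [
    (128, 24, 20.613, "P xfer: 170.097ms*, init: 752.960us, loop: 241x84.120ms, final: 201.840us, res xfer: 168.742ms*"),
    (256, 48, 21.294, "P xfer: 174.726ms*, init: 412.080us, loop: 241x86.903ms, final: 229.680us, res xfer: 174.614ms*"),
    (512, 92, 22.106, "P xfer: 181.656ms*, init: 538.800us, loop: 241x90.215ms, final: 263.760us, res xfer: 181.576ms*"),
    (1024, 109, 37.339, "P xfer: 308.815ms*, init: 1.320ms, loop: 241x152.363ms, final: 774us, res xfer: 308.409ms*"),
]
V_WIDTH = 4  # from -DV_WIDTH=4 in the "Options used" line

UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0}
# Matches "170.097ms", "774us", or a repeated term such as "241x84.120ms".
TERM = re.compile(r"(?:(\d+)x)?([\d.]+)(us|ms|s)\b")


def crypt_all_seconds(breakdown):
    """Sum all timing terms of one breakdown line, in seconds."""
    total = 0.0
    for repeat, value, unit in TERM.findall(breakdown):
        total += int(repeat or 1) * float(value) * UNIT_SECONDS[unit]
    return total


tuning_total = 0.0
for gws, reported_cs, reported_s, breakdown in PROBES:
    seconds = crypt_all_seconds(breakdown)
    tuning_total += seconds
    # Assumption: c/s = GWS * V_WIDTH / seconds, truncated to an integer.
    print(gws, round(seconds, 3), reported_s, int(gws * V_WIDTH / seconds), reported_cs)

print("sum of the four probes (s):", round(tuning_total, 1))
```

The loop kernel (241 invocations per crypt_all()) accounts for nearly all of each probe, and even the smallest GWS is already over the 10s budget, so these four probes alone take well over a minute of the run.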
frank-dittrich commented 9 years ago

With a 4.2-rc3 Linux kernel, the number of failing OpenCL formats dropped from 16 to 6:

$ for f in `grep -v "^0 " opencl_errors-4.2.txt |cut -d" " -f 2-`; do time ./john --test --format=$f --verbosity=5; done
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER 
ASSERTION FAILED: 0
  at file /home/fd/git/beignet/backend/src/backend/gen_context.cpp, function virtual void gbe::GenContext::emitUnaryWithTempInstruction(const gbe::SelectionInstruction&), line 438
Trace/breakpoint trap (core dumped)

real    0m2.114s
user    0m1.414s
sys 0m0.054s
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Will run 4 OpenMP threads
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP) Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2
ASSERTION FAILED: 0
  at file /home/fd/git/beignet/backend/src/backend/gen_context.cpp, function virtual void gbe::GenContext::emitUnaryWithTempInstruction(const gbe::SelectionInstruction&), line 438
Trace/breakpoint trap (core dumped)

real    0m3.321s
user    0m3.267s
sys 0m0.041s
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: PBKDF2-HMAC-SHA512-opencl, GRUB2 / OS X 10.8+, rounds=10000 [PBKDF2-SHA512 OpenCL]... Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER -DHASH_LOOPS=250 -DPLAINTEXT_LENGTH=110 -DMAX_SALT_SIZE=107
ASSERTION FAILED: 0
  at file /home/fd/git/beignet/backend/src/backend/gen_context.cpp, function virtual void gbe::GenContext::emitUnaryWithTempInstruction(const gbe::SelectionInstruction&), line 438
Trace/breakpoint trap (core dumped)

real    0m3.161s
user    0m3.103s
sys 0m0.048s
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER 
ASSERTION FAILED: 0
  at file /home/fd/git/beignet/backend/src/backend/gen_context.cpp, function virtual void gbe::GenContext::emitUnaryWithTempInstruction(const gbe::SelectionInstruction&), line 438
Trace/breakpoint trap (core dumped)

real    0m0.236s
user    0m0.217s
sys 0m0.008s
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: wpapsk-opencl, WPA/WPA2 PSK [PBKDF2-SHA1 OpenCL 4x]... Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER -DHASH_LOOPS=105 -DITERATIONS=4095 -DPLAINTEXT_LENGTH=63 -DV_WIDTH=4
ASSERTION FAILED: !(ctx->getErrCode() == OUT_OF_RANGE_IF_ENDIF && ctx->getIFENDIFFix())
  at file /home/fd/git/beignet/backend/src/backend/gen_program.cpp, function virtual gbe::Kernel* gbe::GenProgram::compileKernel(const gbe::ir::Unit&, const string&, bool), line 196
Trace/breakpoint trap (core dumped)

real    0m15.782s
user    0m13.106s
sys 0m0.146s
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=1 -D_OPENCL_COMPILER 
ASSERTION FAILED: 0
  at file /home/fd/git/beignet/backend/src/backend/gen_context.cpp, function virtual void gbe::GenContext::emitUnaryWithTempInstruction(const gbe::SelectionInstruction&), line 438
Trace/breakpoint trap (core dumped)

real    0m0.243s
user    0m0.225s
sys 0m0.008s

To get an idea why the beignet assertions failed, I added a few fprintf statements:

(master)beignet $ git diff
diff --git a/backend/src/backend/gen_context.cpp b/backend/src/backend/gen_context.cpp
index e16b0a9..183d803 100644
--- a/backend/src/backend/gen_context.cpp
+++ b/backend/src/backend/gen_context.cpp
@@ -435,7 +435,7 @@ namespace gbe

             p->MOV(dst, tmp);
           } else {
-            GBE_ASSERT(0);
+            fprintf(stderr,"src.type %d\n", src.type); GBE_ASSERT(0);
           }
         }
       }
diff --git a/backend/src/backend/gen_program.cpp b/backend/src/backend/gen_program.cpp
index c761a2f..27b979c 100644
--- a/backend/src/backend/gen_program.cpp
+++ b/backend/src/backend/gen_program.cpp
@@ -192,8 +192,10 @@ namespace gbe {
       if ( ctx->getErrCode() == OUT_OF_RANGE_IF_ENDIF && !ctx->getIFENDIFFix() ) {
         ctx->setIFENDIFFix(true);
         codeGen--;
-      } else
+      } else {
+        fprintf(stderr, "ctx->getErrCode() %d, ctx->getIFENDIFFix() %d, OUT_OF_RANGE_IF_ENDIF %d\n", ctx->getErrCode(), ctx->getIFENDIFFix(), OUT_OF_RANGE_IF_ENDIF);
         GBE_ASSERT(!(ctx->getErrCode() == OUT_OF_RANGE_IF_ENDIF && ctx->getIFENDIFFix()));
+      }
     }

     GBE_ASSERTM(kernel != NULL, "Fail to compile kernel, may need to increase reserved registers for spilling.");

and got:

(bleeding-jumbo)run $ for f in `grep -v "^0 " opencl_errors-4.2.txt |cut -d" " -f 2-`; do ./john --test --format=$f --devices=0; done
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
ctx->getErrCode() 1, ctx->getIFENDIFFix() 0, OUT_OF_RANGE_IF_ENDIF 4
ctx->getErrCode() 1, ctx->getIFENDIFFix() 0, OUT_OF_RANGE_IF_ENDIF 4
ctx->getErrCode() 1, ctx->getIFENDIFFix() 0, OUT_OF_RANGE_IF_ENDIF 4
ctx->getErrCode() 1, ctx->getIFENDIFFix() 0, OUT_OF_RANGE_IF_ENDIF 4
src.type 8
ASSERTION FAILED: 0
  at file /home/fd/git/beignet/backend/src/backend/gen_context.cpp, function virtual void gbe::GenContext::emitUnaryWithTempInstruction(const gbe::SelectionInstruction&), line 438
Trace/breakpoint trap (core dumped)
Will run 4 OpenMP threads
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP) ctx->getErrCode() 1, ctx->getIFENDIFFix() 0, OUT_OF_RANGE_IF_ENDIF 4
ctx->getErrCode() 1, ctx->getIFENDIFFix() 0, OUT_OF_RANGE_IF_ENDIF 4
src.type 8
ASSERTION FAILED: 0
  at file /home/fd/git/beignet/backend/src/backend/gen_context.cpp, function virtual void gbe::GenContext::emitUnaryWithTempInstruction(const gbe::SelectionInstruction&), line 438
Trace/breakpoint trap (core dumped)
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: PBKDF2-HMAC-SHA512-opencl, GRUB2 / OS X 10.8+, rounds=10000 [PBKDF2-SHA512 OpenCL]... ctx->getErrCode() 1, ctx->getIFENDIFFix() 0, OUT_OF_RANGE_IF_ENDIF 4
ctx->getErrCode() 1, ctx->getIFENDIFFix() 0, OUT_OF_RANGE_IF_ENDIF 4
src.type 8
ASSERTION FAILED: 0
  at file /home/fd/git/beignet/backend/src/backend/gen_context.cpp, function virtual void gbe::GenContext::emitUnaryWithTempInstruction(const gbe::SelectionInstruction&), line 438
Trace/breakpoint trap (core dumped)
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
ctx->getErrCode() 1, ctx->getIFENDIFFix() 0, OUT_OF_RANGE_IF_ENDIF 4
src.type 8
ASSERTION FAILED: 0
  at file /home/fd/git/beignet/backend/src/backend/gen_context.cpp, function virtual void gbe::GenContext::emitUnaryWithTempInstruction(const gbe::SelectionInstruction&), line 438
Trace/breakpoint trap (core dumped)
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: wpapsk-opencl, WPA/WPA2 PSK [PBKDF2-SHA1 OpenCL 4x]... ctx->getErrCode() 1, ctx->getIFENDIFFix() 0, OUT_OF_RANGE_IF_ENDIF 4
ctx->getErrCode() 1, ctx->getIFENDIFFix() 0, OUT_OF_RANGE_IF_ENDIF 4
ctx->getErrCode() 4, ctx->getIFENDIFFix() 1, OUT_OF_RANGE_IF_ENDIF 4
ASSERTION FAILED: !(ctx->getErrCode() == OUT_OF_RANGE_IF_ENDIF && ctx->getIFENDIFFix())
  at file /home/fd/git/beignet/backend/src/backend/gen_program.cpp, function virtual gbe::Kernel* gbe::GenProgram::compileKernel(const gbe::ir::Unit&, const string&, bool), line 197
Trace/breakpoint trap (core dumped)
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
ctx->getErrCode() 1, ctx->getIFENDIFFix() 0, OUT_OF_RANGE_IF_ENDIF 4
src.type 8
ASSERTION FAILED: 0
  at file /home/fd/git/beignet/backend/src/backend/gen_context.cpp, function virtual void gbe::GenContext::emitUnaryWithTempInstruction(const gbe::SelectionInstruction&), line 438
Trace/breakpoint trap (core dumped)

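Both assertions fire inside beignet's own code generator: src.type 8 reaches the unhandled branch in emitUnaryWithTempInstruction, and for wpapsk-opencl the OUT_OF_RANGE_IF_ENDIF error (4) is still reported after the IF/ENDIF fix has been enabled. Looks like beignet is to blame here.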
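
To make the reading of the extra fprintf output explicit, here is a small Python sketch that re-evaluates the GBE_ASSERT condition from gen_program.cpp for each diagnostic line. The function name `assertion_trips` is hypothetical; the line format is the one produced by the fprintf added in the diff above.

```python
import re

# Format of the diagnostic line printed by the fprintf added to gen_program.cpp.
DIAG = re.compile(
    r"ctx->getErrCode\(\) (\d+), ctx->getIFENDIFFix\(\) (\d+), "
    r"OUT_OF_RANGE_IF_ENDIF (\d+)"
)


def assertion_trips(line):
    """True/False for a diagnostic line, None if the line carries no diagnostic."""
    m = DIAG.search(line)
    if m is None:
        return None
    err_code, fix_enabled, out_of_range = map(int, m.groups())
    # Mirrors GBE_ASSERT(!(errCode == OUT_OF_RANGE_IF_ENDIF && IFENDIFFix)):
    # the assertion fails exactly when the inner condition is true.
    return err_code == out_of_range and fix_enabled != 0


# The three diagnostic lines of the wpapsk-opencl run, copied from the log above.
WPAPSK_LINES = [
    "Benchmarking: wpapsk-opencl, WPA/WPA2 PSK [PBKDF2-SHA1 OpenCL 4x]... "
    "ctx->getErrCode() 1, ctx->getIFENDIFFix() 0, OUT_OF_RANGE_IF_ENDIF 4",
    "ctx->getErrCode() 1, ctx->getIFENDIFFix() 0, OUT_OF_RANGE_IF_ENDIF 4",
    "ctx->getErrCode() 4, ctx->getIFENDIFFix() 1, OUT_OF_RANGE_IF_ENDIF 4",
]

for line in WPAPSK_LINES:
    print(assertion_trips(line), "<-", DIAG.search(line).group(0))
```

Only the third line satisfies the condition, which matches the single gen_program.cpp assertion in the log; all other failures come from the `GBE_ASSERT(0)` branch with src.type 8.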

frank-dittrich commented 9 years ago

BTW: not only is the GPU faster than the CPU for rar (660 c/s vs. 400 c/s), I can even run both in parallel. Once GPU auto-tuning has finished, each runs at about the same speed as it does when started separately, i.e., I get a combined speed of about 1060 c/s on my Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz.

frank-dittrich commented 9 years ago

These are the rar-opencl format tests of ./jtrts.pl:

form=rar-opencl                   guesses:  130 0:00:02:21 DONE  [PASSED]
.pot CHK:rar-opencl               guesses:  130 0:00:01:36 DONE  [PASSED] (130 val-pwd)

form=rar-opencl                   guesses:  216 -show= 216 0:00:05:24 DONE : Expected count(s) (297)  [!!!FAILED1!!!]
.pot CHK:rar-opencl               guesses:  216 0:00:02:40 DONE  [PASSED] (216 val-pwd)

Several other OpenCL formats pass the self test (I will report in detail later, probably next week).
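
For comparing such jtrts lines across formats, a rough Python sketch that pulls out the guess count, the status tag, and the expected count(s). The regular expressions are fitted to the four rar-opencl lines above and are an assumption about the jtrts output format, not something taken from jtrts.pl.

```python
import re

# Fitted to lines such as "form=NAME  guesses: N ... [PASSED]" (assumption).
RESULT = re.compile(
    r"(?:form=|\.pot CHK:)(?P<name>\S+)\s+guesses:\s*(?P<guesses>\d+)"
    r".*?\[(?:!!!)?(?P<status>PASSED|FAILED\d)"
)
COUNT = re.compile(r"\((\d+)\)")


def parse_result(line):
    """Return a dict for a jtrts result line, or None for any other line."""
    m = RESULT.match(line)
    if m is None:
        return None
    # Only look for "(N)" after the "Expected count(s)" label.
    _, _, tail = line.partition("Expected count(s)")
    expected = [int(n) for n in COUNT.findall(tail)]
    guesses = int(m.group("guesses"))
    return {
        "format": m.group("name"),
        "guesses": guesses,
        "status": m.group("status"),
        "expected": expected,
        "shortfalls": [e - guesses for e in expected],
    }


LINES = [
    "form=rar-opencl                   guesses:  130 0:00:02:21 DONE  [PASSED]",
    ".pot CHK:rar-opencl               guesses:  130 0:00:01:36 DONE  [PASSED] (130 val-pwd)",
    "form=rar-opencl                   guesses:  216 -show= 216 0:00:05:24 DONE : Expected count(s) (297)  [!!!FAILED1!!!]",
    ".pot CHK:rar-opencl               guesses:  216 0:00:02:40 DONE  [PASSED] (216 val-pwd)",
]

for line in LINES:
    print(parse_result(line))
```

For the failing run this gives 216 guesses against an expected 297, i.e. 81 passwords were not found, while the .pot check of the 216 that were found passes.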

frank-dittrich commented 9 years ago

This is the jtrts result:

$ (for f in `../run/john --list=formats --format=opencl|tr "," " "`; do echo -n "$f "; date; ./jtrts.pl -q $f -passthru="--skip-self-test"; done; date ) 2> /dev/null |grep -v "Failed to release test userptr object"
Sun Jul 26 00:38:36 CEST 2015
sha1crypt-opencl Sun Jul 26 00:38:36 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 1 tests.  Time used was 50 seconds          
oldoffice-opencl Sun Jul 26 00:39:26 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
NO tests were performed.  Time used was 8 seconds
PBKDF2-HMAC-MD4-opencl Sun Jul 26 00:39:34 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
NO tests were performed.  Time used was 6 seconds
PBKDF2-HMAC-MD5-opencl Sun Jul 26 00:39:40 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
NO tests were performed.  Time used was 7 seconds
PBKDF2-HMAC-SHA1-opencl Sun Jul 26 00:39:47 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 3 tests.  Time used was 733 seconds          
rar-opencl Sun Jul 26 00:52:00 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
form=rar-opencl                   guesses:  216 -show= 216 0:00:05:24 DONE : Expected count(s) (297)  [!!!FAILED1!!!]
Some tests had Errors. Performed 2 tests.  1 errors                                    
Time used was 734 seconds
RAR5-opencl Sun Jul 26 01:04:14 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
NO tests were performed.  Time used was 8 seconds
lotus5-opencl Sun Jul 26 01:04:22 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 1 tests.  Time used was 8 seconds            
agilekeychain-opencl Sun Jul 26 01:04:30 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
NO tests were performed.  Time used was 6 seconds
bcrypt-opencl Sun Jul 26 01:04:36 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
form=Bcrypt-opencl                guesses:  300 -show= 300 0:00:00:02 DONE : Expected count(s) (1500)  [!!!FAILED1!!!]
form=Bcrypt_broken-opencl         guesses:  300 -show= 300 0:00:00:02 DONE : Expected count(s) (1500)  [!!!FAILED1!!!]
Some tests had Errors. Performed 2 tests.  2 errors                                    
Time used was 26 seconds
blockchain-opencl Sun Jul 26 01:05:02 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
NO tests were performed.  Time used was 8 seconds
md5crypt-opencl Sun Jul 26 01:05:10 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 1 tests.  Time used was 40 seconds           
sha256crypt-opencl Sun Jul 26 01:05:50 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 3 tests.  Time used was 253 seconds          
sha512crypt-opencl Sun Jul 26 01:10:03 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
form=sha512crypt-opencl           guesses:    0 -show=   0 unk unk : Expected count(s) (1389)(1500)  [!!!FAILED2!!! exited, return code 133]
.pot CHK:sha512crypt-opencl       guesses:    0 -show=     unk unk : Expected count(s) (1389)(1500)  [!!!FAILED4!!!]  (0 val-pwd  1 inval-pwd)
form=sha512crypt-opencl-new       guesses:    0 -show=   0 unk unk : Expected count(s) (760)  [!!!FAILED2!!! exited, return code 133]
.pot CHK:sha512crypt-opencl-new   guesses:    0 -show=     unk unk : Expected count(s) (760)  [!!!FAILED4!!!]  (0 val-pwd  1 inval-pwd)
form=sha512crypt-opencl-xsalt     guesses:    0 -show=   0 unk unk : Expected count(s) (1168)  [!!!FAILED2!!! exited, return code 133]
.pot CHK:sha512crypt-opencl-xsalt guesses:    0 -show=     unk unk : Expected count(s) (1168)  [!!!FAILED4!!!]  (0 val-pwd  1 inval-pwd)
Some tests had Errors. Performed 3 tests.  3 errors  3 errors reprocessing the .POT files  3 runs had non-clean exit
Time used was 21 seconds
descrypt-opencl Sun Jul 26 01:10:24 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 1 tests.  Time used was 35 seconds           
dmg-opencl Sun Jul 26 01:10:59 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
NO tests were performed.  Time used was 8 seconds
encfs-opencl Sun Jul 26 01:11:07 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
NO tests were performed.  Time used was 6 seconds
gpg-opencl Sun Jul 26 01:11:13 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
NO tests were performed.  Time used was 7 seconds
keychain-opencl Sun Jul 26 01:11:20 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 1 tests.  Time used was 128 seconds          
keyring-opencl Sun Jul 26 01:13:28 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
NO tests were performed.  Time used was 9 seconds
krb5pa-md5-opencl Sun Jul 26 01:13:37 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
form=krb5pa-md5-opencl            guesses:    0 -show=   0 0:00:00:21 DONE : Expected count(s) (1500)  [!!!FAILED1!!!]
.pot CHK:krb5pa-md5-opencl        guesses:    0 -show=   0 unk unk : Expected count(s) (1500)  [!!!FAILED4!!!]  (0 val-pwd  1 inval-pwd)
Some tests had Errors. Performed 1 tests.  1 errors  1 errors reprocessing the .POT files
Time used was 28 seconds
krb5pa-sha1-opencl Sun Jul 26 01:14:05 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
NO tests were performed.  Time used was 9 seconds
LM-opencl Sun Jul 26 01:14:14 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
NO tests were performed.  Time used was 6 seconds
mscash-opencl Sun Jul 26 01:14:20 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
form=mscash-opencl                guesses:    0 -show=   0 unk unk : Expected count(s) (1500)  [!!!FAILED2!!! exited, return code 1]
.pot CHK:mscash-opencl            guesses:    0 -show=   0 unk unk : Expected count(s) (1500)  [!!!FAILED4!!!]  (0 val-pwd  1 inval-pwd)
form=mscash-opencl-utf8           guesses:    0 -show=   0 unk unk : Expected count(s) (1500)  [!!!FAILED2!!! exited, return code 1]
.pot CHK:mscash-opencl-utf8       guesses:    0 -show=   0 unk unk : Expected count(s) (1500)  [!!!FAILED4!!!]  (0 val-pwd  1 inval-pwd)
Some tests had Errors. Performed 2 tests.  2 errors  2 errors reprocessing the .POT files  2 runs had non-clean exit
Time used was 7 seconds
mscash2-opencl Sun Jul 26 01:14:27 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
form=mscash2-opencl               guesses: 1050 -show=1050 0:00:00:05 DONE : Expected count(s) (1410)(1500)  [!!!FAILED1!!!]
Some tests had Errors. Performed 2 tests.  1 errors                                     
Time used was 34 seconds
mysql-sha1-opencl Sun Jul 26 01:15:01 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 1 tests.  Time used was 10 seconds           
ssha-opencl Sun Jul 26 01:15:11 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 1 tests.  Time used was 11 seconds           
nt-opencl Sun Jul 26 01:15:22 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
form=nt-opencl                    guesses:    0 -show=   0 unk unk : Expected count(s) (1478)  [!!!FAILED2!!! exited, return code 1]
.pot CHK:nt-opencl                guesses:    0 -show=   0 unk unk : Expected count(s) (1478)  [!!!FAILED4!!!]  (0 val-pwd  1 inval-pwd)
Some tests had Errors. Performed 1 tests.  1 errors  1 errors reprocessing the .POT files  1 runs had non-clean exit
Time used was 9 seconds
ntlmv2-opencl Sun Jul 26 01:15:31 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
form=ntlmv2-opencl                guesses:    0 -show=   0 0:00:00:06 DONE : Expected count(s) (1452)(1500)  [!!!FAILED1!!!]
.pot CHK:ntlmv2-opencl            guesses:    0 -show=   0 unk unk : Expected count(s) (1452)(1500)  [!!!FAILED4!!!]  (0 val-pwd  1 inval-pwd)
form=netntlmv2-opencl-utf8        guesses:    0 -show=   0 0:00:00:02 DONE : Expected count(s) (1442)(1500)  [!!!FAILED1!!!]
.pot CHK:netntlmv2-opencl-utf8    guesses:    0 -show=   0 unk unk : Expected count(s) (1442)(1500)  [!!!FAILED4!!!]  (0 val-pwd  1 inval-pwd)
.pot CHK:netntlmv2-opencl-koi8r   guesses:    0 -show=   0 0:00:00:01 DONE : Expected count(s) (1429)(1500)  [!!!FAILED4!!!]  (0 val-pwd  0 inval-pwd)
Some tests had Errors. Performed 3 tests.  2 errors  3 errors reprocessing the .POT files
Time used was 25 seconds
o5logon-opencl Sun Jul 26 01:15:56 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
.pot CHK:o5logon-opencl           guesses: 1499 -show=1499 0:00:00:01 DONE : Expected count(s) (1500)  [!!!FAILED4!!!]  (1498 val-pwd  1 inval-pwd)
Some tests had Errors. Performed 1 tests.  1 errors reprocessing the .POT files
Time used was 13 seconds
ODF-opencl Sun Jul 26 01:16:09 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 2 tests.  Time used was 18 seconds           
ODF-AES-opencl Sun Jul 26 01:16:27 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 1 tests.  Time used was 15 seconds           
office2007-opencl Sun Jul 26 01:16:42 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
form=office_2007-opencl           guesses: 1190 -show=1190 0:00:00:12 DONE : Expected count(s) (1500)  [!!!FAILED1!!!]
Some tests had Errors. Performed 1 tests.  1 errors                                     
Time used was 35 seconds
office2010-opencl Sun Jul 26 01:17:17 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
form=office_2010-opencl           guesses:   32 -show=  32 0:00:01:55 DONE : Expected count(s) (48)  [!!!FAILED1!!!]
.pot CHK:office_2010-opencl       guesses:   32 -show=  32 0:00:01:17 DONE : Expected count(s) (48)  [!!!FAILED4!!!]  (30 val-pwd  2 inval-pwd)
Some tests had Errors. Performed 1 tests.  1 errors  1 errors reprocessing the .POT files
Time used was 205 seconds
office2013-opencl Sun Jul 26 01:20:42 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
form=office_2013-opencl           guesses:    0 -show=   0 unk unk : Expected count(s) (48)  [!!!FAILED2!!! exited, return code 133]
.pot CHK:office_2013-opencl       guesses:    0 -show=   0 unk unk : Expected count(s) (48)  [!!!FAILED4!!!]  (0 val-pwd  1 inval-pwd)
Some tests had Errors. Performed 1 tests.  1 errors  1 errors reprocessing the .POT files  1 runs had non-clean exit
Time used was 12 seconds
PBKDF2-HMAC-SHA256-opencl Sun Jul 26 01:20:54 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 2 tests.  Time used was 64 seconds           
PBKDF2-HMAC-SHA512-opencl Sun Jul 26 01:21:58 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
form=pbkdf2-hmac-sha512-opencl    guesses:    0 -show=   0 unk unk : Expected count(s) (1500)  [!!!FAILED2!!! exited, return code 133]
.pot CHK:pbkdf2-hmac-sha512-openc guesses:    0 -show=   0 unk unk : Expected count(s) (1500)  [!!!FAILED4!!!]  (0 val-pwd  1 inval-pwd)
Some tests had Errors. Performed 1 tests.  1 errors  1 errors reprocessing the .POT files  1 runs had non-clean exit
Time used was 12 seconds
phpass-opencl Sun Jul 26 01:22:10 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 1 tests.  Time used was 194 seconds          
pwsafe-opencl Sun Jul 26 01:25:24 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 1 tests.  Time used was 69 seconds           
RAKP-opencl Sun Jul 26 01:26:33 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 1 tests.  Time used was 55 seconds           
Raw-MD4-opencl Sun Jul 26 01:27:28 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
form=raw-md4-opencl               guesses:    0 -show= 172 unk unk : Expected count(s) (1500)  [!!!FAILED2!!! exited, return code 1]
.pot CHK:raw-md4-opencl           guesses:    0 unk unk  [pass, but exited, return code 139]
Some tests had Errors. Performed 1 tests.  1 errors  2 runs had non-clean exit
Time used was 10 seconds
Raw-MD5-opencl Sun Jul 26 01:27:38 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
form=raw-md5-opencl               guesses:    0 -show= 172 unk unk : Expected count(s) (1500)  [!!!FAILED2!!! exited, return code 1]
.pot CHK:raw-md5-opencl           guesses:    0 -show=  64 unk unk : Expected count(s) (1500)  [!!!FAILED4!!!]  (64 val-pwd  1 inval-pwd)
Some tests had Errors. Performed 1 tests.  1 errors  1 errors reprocessing the .POT files  1 runs had non-clean exit
Time used was 9 seconds
Raw-SHA1-opencl Sun Jul 26 01:27:47 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
form=raw-sha1-opencl              guesses:    0 -show= 172 unk unk : Expected count(s) (1500)  [!!!FAILED2!!! exited, return code 1]
.pot CHK:raw-sha1-opencl          guesses:    0 -show=  64 unk unk : Expected count(s) (1500)  [!!!FAILED4!!!]  (64 val-pwd  1 inval-pwd)
Some tests had Errors. Performed 1 tests.  1 errors  1 errors reprocessing the .POT files  1 runs had non-clean exit
Time used was 8 seconds
Raw-SHA256-opencl Sun Jul 26 01:27:55 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 1 tests.  Time used was 11 seconds           
Raw-SHA512-opencl Sun Jul 26 01:28:06 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
form=raw-sha512-opencl            guesses:    0 -show=   0 unk unk : Expected count(s) (1500)  [!!!FAILED2!!! exited, return code 133]
.pot CHK:raw-sha512-opencl        guesses:    0 -show=     unk unk : Expected count(s) (1500)  [!!!FAILED4!!!]  (0 val-pwd  1 inval-pwd)
Some tests had Errors. Performed 1 tests.  1 errors  1 errors reprocessing the .POT files  1 runs had non-clean exit
Time used was 7 seconds
7z-opencl Sun Jul 26 01:28:13 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
NO tests were performed.  Time used was 6 seconds
strip-opencl Sun Jul 26 01:28:19 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 1 tests.  Time used was 476 seconds          
sxc-opencl Sun Jul 26 01:36:15 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
NO tests were performed.  Time used was 8 seconds
wpapsk-opencl Sun Jul 26 01:36:23 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
form=wpapsk-opencl                guesses:    0 -show=   0 unk unk : Expected count(s) (500)  [!!!FAILED2!!! exited, return code 133]
.pot CHK:wpapsk-opencl            guesses:    0 -show=   0 unk unk : Expected count(s) (500)  [!!!FAILED4!!!]  (0 val-pwd  1 inval-pwd)
Some tests had Errors. Performed 1 tests.  1 errors  1 errors reprocessing the .POT files  1 runs had non-clean exit
Time used was 23 seconds
XSHA512-opencl Sun Jul 26 01:36:46 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
form=xsha512-opencl               guesses:    0 -show=   0 unk unk : Expected count(s) (1500)  [!!!FAILED2!!! exited, return code 133]
.pot CHK:xsha512-opencl           guesses:    0 -show=     unk unk : Expected count(s) (1500)  [!!!FAILED4!!!]  (0 val-pwd  1 inval-pwd)
Some tests had Errors. Performed 1 tests.  1 errors  1 errors reprocessing the .POT files  1 runs had non-clean exit
Time used was 7 seconds
zip-opencl Sun Jul 26 01:36:53 CEST 2015
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-531-g1b0031f OMP [linux-gnu 64-bit AVX2-ac]
--------------------------------------------------------------------------------
All tests passed without error.  Performed 4 tests.  Time used was 720 seconds          
Sun Jul 26 01:48:53 CEST 2015
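
With this many formats it is easy to lose track of which ones failed. Below is a minimal sketch for tallying the jtrts result lines, assuming the output above has been saved as text. The function name and the regular expression are illustrative and not part of jtrts; the sketch only counts the `FAILEDn` codes and does not interpret them.

```python
import re
from collections import defaultdict

# Matches jtrts result lines such as
#   form=raw-md5-opencl ... [!!!FAILED2!!! exited, return code 1]
#   .pot CHK:raw-md5-opencl ... [!!!FAILED4!!!]
RESULT = re.compile(r"^(?:form=|\.pot CHK:)(\S+).*\[!!!FAILED(\d+)!!!")


def failed_formats(log_text):
    """Map each format name to the sorted list of FAILED codes seen for it."""
    codes = defaultdict(set)
    for line in log_text.splitlines():
        match = RESULT.match(line)
        if match:
            codes[match.group(1)].add(int(match.group(2)))
    return {name: sorted(found) for name, found in codes.items()}


# Three lines copied from the run above.
SAMPLE = """\
form=office_2007-opencl           guesses: 1190 -show=1190 0:00:00:12 DONE : Expected count(s) (1500)  [!!!FAILED1!!!]
form=raw-md5-opencl               guesses:    0 -show= 172 unk unk : Expected count(s) (1500)  [!!!FAILED2!!! exited, return code 1]
.pot CHK:raw-md5-opencl           guesses:    0 -show=  64 unk unk : Expected count(s) (1500)  [!!!FAILED4!!!]  (64 val-pwd  1 inval-pwd)
"""

for name, found in failed_formats(SAMPLE).items():
    print(name, found)
```

Note that jtrts truncates long names on the `.pot CHK:` lines (for example `pbkdf2-hmac-sha512-openc`), so such a format shows up under two slightly different keys.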
frank-dittrich commented 9 years ago

And these are the --test results:

$ (for f in `../run/john --list=formats --format=opencl|tr "," " "`; do echo -n "$f "; date; ./john --test --format=$f | tee test-1-$f.txt; done ; date ) 2> /dev/null |grep -v "Failed to release test user"
sha1crypt-opencl Sun Jul 26 00:20:48 CEST 2015
Benchmarking: sha1crypt-opencl, (NetBSD) [PBKDF1-SHA1 OpenCL 4x]... DONE
Speed for cost 1 (iteration count) of 64000 and 40000
Raw:    469 c/s real, 51200 c/s virtual

oldoffice-opencl Sun Jul 26 00:21:28 CEST 2015
Benchmarking: oldoffice-opencl, MS Office <= 2003 [MD5/SHA1 RC4 OpenCL]... DONE
Speed for cost 1 (hash type) of 1 and 0
Many salts: 1542K c/s real, 45875K c/s virtual
Only one salt:  1497K c/s real, 31457K c/s virtual

PBKDF2-HMAC-MD4-opencl Sun Jul 26 00:21:33 CEST 2015
Benchmarking: PBKDF2-HMAC-MD4-opencl [PBKDF2-MD4 OpenCL 4x]... DONE
Speed for cost 1 (iterations) of 1000
Raw:    113975 c/s real, 13107K c/s virtual

PBKDF2-HMAC-MD5-opencl Sun Jul 26 00:21:51 CEST 2015
Benchmarking: PBKDF2-HMAC-MD5-opencl [PBKDF2-MD5 OpenCL 4x]... DONE
Speed for cost 1 (iterations) of 1000
Raw:    74898 c/s real, 26214K c/s virtual

PBKDF2-HMAC-SHA1-opencl Sun Jul 26 00:22:09 CEST 2015
Benchmarking: PBKDF2-HMAC-SHA1-opencl [PBKDF2-SHA1 OpenCL 4x]... DONE
Speed for cost 1 (iterations) of 1000
Raw:    29520 c/s real, 13107K c/s virtual

rar-opencl Sun Jul 26 00:22:37 CEST 2015
Benchmarking: rar-opencl, RAR3 (length 5) [SHA1 OpenCL AES]... (4xOMP) DONE
Raw:    661 c/s real, 136533 c/s virtual

RAR5-opencl Sun Jul 26 00:23:07 CEST 2015
Benchmarking: RAR5-opencl [PBKDF2-SHA256 OpenCL]... DONE
Speed for cost 1 (iteration count) of 32768
Raw:    408 c/s real, 136533 c/s virtual

lotus5-opencl Sun Jul 26 00:23:36 CEST 2015
Benchmarking: lotus5-opencl, Lotus Notes/Domino 5 [OpenCL]... DONE
Raw:    865569 c/s real, 11468K c/s virtual

agilekeychain-opencl Sun Jul 26 00:23:40 CEST 2015
Benchmarking: agilekeychain-opencl, 1Password Agile Keychain [PBKDF2-SHA1 OpenCL AES]... (4xOMP) DONE
Speed for cost 1 (iteration count) of 1000
Raw:    31207 c/s real, 409600 c/s virtual

bcrypt-opencl Sun Jul 26 00:23:46 CEST 2015
Benchmarking: bcrypt-opencl ("$2a$05", 32 iterations) [Blowfish OpenCL]... DONE
Speed for cost 1 (iteration count) of 32
Raw:    322 c/s real, 12800 c/s virtual

blockchain-opencl Sun Jul 26 00:23:51 CEST 2015
Benchmarking: blockchain-opencl, blockchain My Wallet [PBKDF2-SHA1 OpenCL AES]... (4xOMP) DONE
Raw:    674635 c/s real, 395475 c/s virtual

md5crypt-opencl Sun Jul 26 00:23:56 CEST 2015
Benchmarking: md5crypt-opencl, crypt(3) $1$ [MD5 OpenCL]... DONE
Raw:    109226 c/s real, 11468K c/s virtual

sha256crypt-opencl Sun Jul 26 00:23:59 CEST 2015
Benchmarking: sha256crypt-opencl, crypt(3) $5$ (rounds=5000) [SHA256 OpenCL]... DONE
Speed for cost 1 (iteration count) of 5000
Raw:    687 c/s real, 34133 c/s virtual

sha512crypt-opencl Sun Jul 26 00:24:14 CEST 2015
descrypt-opencl Sun Jul 26 00:24:16 CEST 2015
Benchmarking: descrypt-opencl, traditional crypt(3) [DES OpenCL]... DONE
Many salts: 1055K c/s real, 157286K c/s virtual
Only one salt:  1038K c/s real, 104857K c/s virtual

dmg-opencl Sun Jul 26 00:24:21 CEST 2015
Benchmarking: dmg-opencl, Apple DMG [PBKDF2-SHA1 OpenCL 3DES/AES]... (4xOMP) DONE
Speed for cost 1 (iteration count) of 1000
Raw:    11592 c/s real, 12934 c/s virtual

encfs-opencl Sun Jul 26 00:24:27 CEST 2015
Benchmarking: encfs-opencl, EncFS [PBKDF2-SHA1 OpenCL 4x AES/Blowfish]... (4xOMP) DONE
Speed for cost 1 (iteration count) of 181474 and 181317
Raw:    47.2 c/s real, 5851 c/s virtual

gpg-opencl Sun Jul 26 00:32:01 CEST 2015
Benchmarking: gpg-opencl, OpenPGP / GnuPG Secret Key [SHA1 OpenCL]... (4xOMP) DONE
Speed for cost 1 (s2k-count) of 65536, cost 2 (hash algorithm [1:MD5 2:SHA1 3:RIPEMD160 8:SHA256 9:SHA384 10:SHA512 11:SHA224]) of 2, cost 3 (cipher algorithm [1:IDEA 2:3DES 3:CAST5 4:Blowfish 7:AES128 8:AES192 9:AES256]) of 3
Raw:    43497 c/s real, 132843 c/s virtual

keychain-opencl Sun Jul 26 00:32:11 CEST 2015
Benchmarking: keychain-opencl, Mac OS X Keychain [PBKDF2-SHA1 OpenCL 3DES]... (4xOMP) DONE
Raw:    14894 c/s real, 102400 c/s virtual

keyring-opencl Sun Jul 26 00:32:16 CEST 2015
Benchmarking: keyring-opencl, GNOME Keyring [SHA256 OpenCL AES]... (4xOMP) DONE
Speed for cost 1 (iteration count) of 3221 and 2439
Raw:    3757 c/s real, 27306 c/s virtual

krb5pa-md5-opencl Sun Jul 26 00:32:22 CEST 2015
Benchmarking: krb5pa-md5-opencl, Kerberos 5 AS-REQ Pre-Auth etype 23 [MD4 HMAC-MD5 RC4 OpenCL]... DONE
Many salts: 2746K c/s real, 72089K c/s virtual
Only one salt:  2427K c/s real, 23831K c/s virtual

krb5pa-sha1-opencl Sun Jul 26 00:32:25 CEST 2015
Benchmarking: krb5pa-sha1-opencl, Kerberos 5 AS-REQ Pre-Auth etype 17/18 [PBKDF2-SHA1 OpenCL 4x]... (4xOMP) DONE
Raw:    3296 c/s real, 126030 c/s virtual

LM-opencl Sun Jul 26 00:32:46 CEST 2015
Benchmarking: LM-opencl [LM BS OpenCL(inefficient)]... DONE
Raw:    13865K c/s real, 167772K c/s virtual

mscash-opencl Sun Jul 26 00:32:53 CEST 2015
Benchmarking: mscash-opencl, M$ Cache Hash [MD4 OpenCL]... DONE
Raw:    7557K c/s real, 419430K c/s virtual

mscash2-opencl Sun Jul 26 00:32:55 CEST 2015
Benchmarking: mscash2-opencl, MS Cache Hash 2 (DCC2) [PBKDF2-SHA1 OpenCL]... DONE
Raw:    3068 c/s real, 3079 c/s virtual

mysql-sha1-opencl Sun Jul 26 00:33:23 CEST 2015
Benchmarking: mysql-sha1-opencl, MySQL 4.1+ [SHA1 OpenCL (inefficient, development use only)]... DONE
Raw:    5041K c/s real, 47662K c/s virtual

ssha-opencl Sun Jul 26 00:33:25 CEST 2015
Benchmarking: ssha-opencl, Netscape LDAP {SSHA} [SHA1 OpenCL (inefficient, development use mostly)]... DONE
Many salts: 9460K c/s real, 139810K c/s virtual
Only one salt:  8388K c/s real, 48395K c/s virtual

nt-opencl Sun Jul 26 00:33:31 CEST 2015
Benchmarking: nt-opencl, NT [MD4 OpenCL]... DONE
Raw:    10847K c/s real, 89877K c/s virtual

ntlmv2-opencl Sun Jul 26 00:33:34 CEST 2015
Benchmarking: ntlmv2-opencl, NTLMv2 C/R [MD4 HMAC-MD5 OpenCL 4x]... DONE
Many salts: 14979K c/s real, 196608K c/s virtual
Only one salt:  9799K c/s real, 49932K c/s virtual

o5logon-opencl Sun Jul 26 00:33:39 CEST 2015
Benchmarking: o5logon-opencl, Oracle O5LOGON protocol [SHA1 OpenCL AES 32/64]... DONE
Raw:    2746K c/s real, 4179K c/s virtual

ODF-opencl Sun Jul 26 00:33:42 CEST 2015
Benchmarking: ODF-opencl [SHA1 OpenCL Blowfish]... (4xOMP) DONE
Raw:    18618 c/s real, 12800 c/s virtual

ODF-AES-opencl Sun Jul 26 00:33:48 CEST 2015
Benchmarking: ODF-AES-opencl [SHA256 OpenCL AES]... (4xOMP) DONE
Speed for cost 1 (iteration count) of 1
Raw:    12047 c/s real, 21557 c/s virtual

office2007-opencl Sun Jul 26 00:33:53 CEST 2015
Benchmarking: office2007-opencl, MS Office 2007 (50,000 iterations) [SHA1 OpenCL 4x AES]... (4xOMP) DONE
Raw:    1110 c/s real, 81920 c/s virtual

office2010-opencl Sun Jul 26 00:34:38 CEST 2015
Benchmarking: office2010-opencl, MS Office 2010 (100,000 iterations) [SHA1 OpenCL 4x AES]... (4xOMP) DONE
Speed for cost 1 (iteration count) of 100000
Raw:    519 c/s real, 24094 c/s virtual

office2013-opencl Sun Jul 26 00:35:27 CEST 2015
Benchmarking: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP) PBKDF2-HMAC-SHA256-opencl Sun Jul 26 00:35:32 CEST 2015
Benchmarking: PBKDF2-HMAC-SHA256-opencl, rounds=12000 [PBKDF2-SHA256 OpenCL]... DONE
Speed for cost 1 (iteration count) of 12000
Raw:    1116 c/s real, 409600 c/s virtual

PBKDF2-HMAC-SHA512-opencl Sun Jul 26 00:36:03 CEST 2015
Benchmarking: PBKDF2-HMAC-SHA512-opencl, GRUB2 / OS X 10.8+, rounds=10000 [PBKDF2-SHA512 OpenCL]... phpass-opencl Sun Jul 26 00:36:07 CEST 2015
Benchmarking: phpass-opencl ($P$9 lengths 0 to 15) [MD5 OpenCL]... DONE
Raw:    56219 c/s real, 5734K c/s virtual

pwsafe-opencl Sun Jul 26 00:36:10 CEST 2015
Benchmarking: pwsafe-opencl, Password Safe [SHA256 OpenCL]... DONE
Speed for cost 1 (iteration count) of 2048
Raw:    13212 c/s real, 819200 c/s virtual

RAKP-opencl Sun Jul 26 00:36:17 CEST 2015
Benchmarking: RAKP-opencl, IPMI 2.0 RAKP (RMCP+) [HMAC-SHA1 OpenCL 4x]... DONE
Many salts: 10586K c/s real, 110100K c/s virtual
Only one salt:  8824K c/s real, 38751K c/s virtual

Raw-MD4-opencl Sun Jul 26 00:36:42 CEST 2015
Benchmarking: Raw-MD4-opencl [MD4 OpenCL]... DONE
Raw:    4802K c/s real, 78643K c/s virtual

Raw-MD5-opencl Sun Jul 26 00:36:44 CEST 2015
Benchmarking: Raw-MD5-opencl [MD5 OpenCL]... DONE
Raw:    4660K c/s real, 62914K c/s virtual

Raw-SHA1-opencl Sun Jul 26 00:36:46 CEST 2015
Benchmarking: Raw-SHA1-opencl [SHA1 OpenCL]... DONE
Raw:    4481K c/s real, 65536K c/s virtual

Raw-SHA256-opencl Sun Jul 26 00:36:47 CEST 2015
Benchmarking: Raw-SHA256-opencl [SHA256 OpenCL]... DONE
Raw:    5969K c/s real, 54811K c/s virtual

Raw-SHA512-opencl Sun Jul 26 00:36:51 CEST 2015
7z-opencl Sun Jul 26 00:36:51 CEST 2015
Benchmarking: 7z-opencl, 7-Zip (512K iterations) [SHA256 OPENCL AES]... (4xOMP) DONE
Speed for cost 1 (iteration count) of 524288
Raw:    76.2 c/s real, 8533 c/s virtual

strip-opencl Sun Jul 26 00:38:02 CEST 2015
Benchmarking: strip-opencl, STRIP Password Manager [PBKDF2-SHA1 OpenCL]... (4xOMP) DONE
Raw:    3531 c/s real, 81920 c/s virtual

sxc-opencl Sun Jul 26 00:38:08 CEST 2015
Benchmarking: sxc-opencl, StarOffice .sxc [PBKDF2-SHA1 OpenCL Blowfish]... (4xOMP) DONE
Speed for cost 1 (iteration count) of 0
Raw:    21140 c/s real, 18004 c/s virtual

wpapsk-opencl Sun Jul 26 00:38:14 CEST 2015
Benchmarking: wpapsk-opencl, WPA/WPA2 PSK [PBKDF2-SHA1 OpenCL 4x]... XSHA512-opencl Sun Jul 26 00:38:30 CEST 2015
zip-opencl Sun Jul 26 00:38:30 CEST 2015
Benchmarking: zip-opencl, ZIP [PBKDF2-SHA1 OpenCL AES]... (4xOMP) DONE
Raw:    7656 c/s real, 273066 c/s virtual

Sun Jul 26 00:38:36 CEST 2015
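
In this listing a crashed benchmark is only visible as a missing `DONE`, sometimes with the next format's header fused onto the same line (stderr went to /dev/null). Below is a rough sketch for picking those formats out of the saved output. The header pattern is an assumption based on the `date` format seen above, and the function name is made up for illustration.

```python
import re

# A header is "<format>-opencl" followed by the output of `date`,
# e.g. "wpapsk-opencl Sun Jul 26 00:38:14 CEST 2015".
# It may appear in the middle of a line when the previous benchmark crashed.
HEADER = re.compile(
    r"(\S+-opencl) (?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) \w{3} +\d+ [\d:]+ \S+ \d{4}"
)


def formats_without_done(log_text):
    """Return the formats whose section never printed DONE, in log order."""
    # With one capture group, split() yields
    # [preamble, name1, body1, name2, body2, ...].
    parts = HEADER.split(log_text)
    names, bodies = parts[1::2], parts[2::2]
    return [name for name, body in zip(names, bodies) if "DONE" not in body]


# Excerpt copied from the listing above.
SAMPLE = """\
sha512crypt-opencl Sun Jul 26 00:24:14 CEST 2015
descrypt-opencl Sun Jul 26 00:24:16 CEST 2015
Benchmarking: descrypt-opencl, traditional crypt(3) [DES OpenCL]... DONE
office2013-opencl Sun Jul 26 00:35:27 CEST 2015
Benchmarking: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP) PBKDF2-HMAC-SHA256-opencl Sun Jul 26 00:35:32 CEST 2015
Benchmarking: PBKDF2-HMAC-SHA256-opencl, rounds=12000 [PBKDF2-SHA256 OpenCL]... DONE
"""

print(formats_without_done(SAMPLE))
```

Run over the full listing above, this should flag sha512crypt-opencl, office2013-opencl, PBKDF2-HMAC-SHA512-opencl, Raw-SHA512-opencl, wpapsk-opencl and XSHA512-opencl.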
magnumripper commented 9 years ago

I fixed oldoffice. Maybe it was actually proper code but better safe than sorry.

frank-dittrich commented 9 years ago

A copy of magnum's link from #1515 https://gist.github.com/spiralray/cae0bc235509e495fec1

Just in case someone wants to test this on ubuntu...

frank-dittrich commented 9 years ago

BTW, I reported two beignet bugs, let's see what happens: https://bugs.freedesktop.org/show_bug.cgi?id=91524 and https://bugs.freedesktop.org/show_bug.cgi?id=91525

frank-dittrich commented 9 years ago

One of these beignet bugs has recently been fixed: https://bugs.freedesktop.org/show_bug.cgi?id=91524

But for WPAPSK, the GPU can't compete...

(bleeding-jumbo)run $ ../run/john --test=20 --format=wpapsk-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: wpapsk-opencl, WPA/WPA2 PSK [PBKDF2-SHA1 OpenCL 4x]... Options used: -I ../run/kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=105 -DITERATIONS=4095 -DPLAINTEXT_LENGTH=63 -DV_WIDTH=4
Calculating best global worksize (GWS); max. 10s total for crypt_all()
Raw speed figures including buffer transfers:
xfer: 64.517ms*, init: 677.920us, loop: 78x20.535ms, pass2: 362.800us, final: 1.050ms, xfer: 62.358ms*
gws:       512        1183 c/s    19379906 rounds/s    1.730s per crypt_all()!
xfer: 66.064ms*, init: 1.298ms, loop: 78x20.818ms, pass2: 665.600us, final: 1.084ms, xfer: 64.368ms*
gws:      1024        2330 c/s    38170060 rounds/s    1.757s per crypt_all()+
xfer: 1.040us, init: 2.650ms, loop: 78x38.103ms, pass2: 1.302ms, final: 2.075ms, xfer: 118.578ms*
gws:      2048        2645 c/s    43330390 rounds/s    3.097s per crypt_all()+
xfer: 2us, init: 5.187ms, loop: 78x68.788ms, pass2: 2.546ms, final: 3.807ms, xfer: 1.280us
gws:      4096        3046 c/s    49899572 rounds/s    5.377s per crypt_all()+
xfer: 404.082ms*, init: 10.301ms, loop: 78x126.007ms, pass2: 4.866ms, final: 6.951ms, xfer: 389.816ms*
gws:      8192        3078 c/s    50423796 rounds/s   10.645s per crypt_all() - too slow
Local worksize (LWS) 512, global worksize (GWS) 4096
DONE
Raw:    2958 c/s real, 192752 c/s virtual

Failed to release test userptr object! (9) i915 kernel driver may not be sane!
(bleeding-jumbo)run $ ../run/john --test=20 --format=wpapsk --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Will run 4 OpenMP threads
Benchmarking: wpapsk, WPA/WPA2 PSK [PBKDF2-SHA1 256/256 AVX2 8x]... (4xOMP) DONE
Raw:    9982 c/s real, 2512 c/s virtual
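
To put numbers on the comparison: the c/s column in the GWS auto-tune output appears to be gws × V_WIDTH divided by the time per crypt_all(). That relation is inferred from the figures above, not taken from the john source. A small sketch that rechecks it and computes the CPU/GPU ratio from the two "Raw" lines:

```python
V_WIDTH = 4  # taken from -DV_WIDTH=4 in the build options printed above

# gws -> (seconds per crypt_all(), c/s as printed by john), copied from the log
AUTOTUNE = {
    512: (1.730, 1183),
    1024: (1.757, 2330),
    2048: (3.097, 2645),
    4096: (5.377, 3046),
    8192: (10.645, 3078),
}


def candidates_per_second(gws, seconds, v_width=V_WIDTH):
    """Assumed relation: each work-item tests v_width candidates per crypt_all()."""
    return gws * v_width / seconds


for gws, (seconds, reported) in AUTOTUNE.items():
    computed = candidates_per_second(gws, seconds)
    print(f"gws {gws:5d}: computed {computed:7.1f} c/s, reported {reported}")

# "Raw: ... c/s real" for wpapsk-opencl (GPU) and wpapsk (CPU, 4 OpenMP threads)
GPU_CS, CPU_CS = 2958, 9982
print(f"CPU/GPU ratio: {CPU_CS / GPU_CS:.2f}")
```

The computed values agree with the printed ones to within about 1 c/s, which is consistent with the displayed times being shortened to three decimals. On these figures the AVX2 CPU build is roughly 3.4 times faster than the Haswell GPU for this format.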
magnumripper commented 9 years ago

Native vector widths:   char 8, short 8, int 4, long 2

This is not what the Intel driver says (all 1). Perhaps a beignet bug. Or maybe they are not the same model? Here's your i5-4570 output with warnings redacted:

$ OCL_IGNORE_SELF_TEST=1 ./john --list=opencl-devices --verbosity=5
Platform #0 name: Intel Gen OCL Driver
Platform version: OpenCL 1.2 beignet 1.1 (git-834d0ae)
    Platform extensions:    cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_icd
    Device #0 (0) name: Intel(R) HD Graphics Haswell GT2 Desktop
    Device vendor:      Intel
    Device type:        GPU (LE)
    Device version:     OpenCL 1.2 beignet 1.1 (git-834d0ae)
    Driver version:     1.1
    Native vector widths:   char 8, short 8, int 4, long 2
    Preferred vector width: char 16, short 8, int 4, long 2
    Global Memory:      2.0 GB
    Device extensions:  cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_icd
    Global Memory Cache:    8.0 KB
    Local Memory:       64.0 KB (Global)
    Max memory alloc. size: 1024.0 MB
    Max clock (MHz):    1000
    Profiling timer res.:   80 ns
    Max Work Group Size:    512
    Parallel compute cores: 20

Here's well's i7-4770K w/ Intel driver

Platform #1 name: Intel(R) OpenCL, version: OpenCL 1.2 
    Platform extensions:    cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
    Device #0 (3) name:     Intel(R) HD Graphics
    Device vendor:          Intel(R) Corporation
    Device type:            GPU (LE)
    Device version:         OpenCL 1.2 
    Driver version:         16.4.2.1.39163
    Native vector widths:   char 1, short 1, int 1, long 1
    Preferred vector width: char 1, short 1, int 1, long 1
    Global Memory:          1.0 GB
    Device extensions:      cl_intel_accelerator cl_intel_advanced_motion_estimation cl_intel_motion_estimation cl_intel_subgroups cl_intel_va_api_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir 
    Global Memory Cache:    256.2 KB
    Local Memory:           64.0 KB (Local)
    Max memory alloc. size: 407.4 MB
    Max clock (MHz):        1250
    Profiling timer res.:   80 ns
    Max Work Group Size:    512
    Parallel compute cores: 20
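
For comparing two such listings field by field, here is a hedged sketch. It assumes the `Key:   value` layout shown above, and the helper names are made up for illustration.

```python
def parse_device_info(listing):
    """Turn 'Key:   value' lines of --list=opencl-devices output into a dict."""
    info = {}
    for line in listing.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def differing_fields(a, b):
    """Return {key: (value_a, value_b)} for keys in both dicts whose values differ."""
    return {key: (a[key], b[key]) for key in a if key in b and a[key] != b[key]}


# A few lines copied from the two listings above.
BEIGNET = """\
    Native vector widths:   char 8, short 8, int 4, long 2
    Max clock (MHz):    1000
    Max Work Group Size:    512
"""
INTEL = """\
    Native vector widths:   char 1, short 1, int 1, long 1
    Max clock (MHz):        1250
    Max Work Group Size:    512
"""

diff = differing_fields(parse_device_info(BEIGNET), parse_device_info(INTEL))
for key, (left, right) in diff.items():
    print(f"{key}: beignet={left!r} intel={right!r}")
```

Fields whose key text differs between the listings (such as the `Device #0 (0) name` and `Device #0 (3) name` lines) are skipped, since only keys present in both are compared.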
frank-dittrich commented 9 years ago

With the latest beignet commit http://cgit.freedesktop.org/beignet/commit/?id=18a52ffc966027a3004b85c7c03c9416e1a84c3a the bug https://bugs.freedesktop.org/show_bug.cgi?id=91525 seems to be fixed. I don't get ASSERTION FAILED anymore.

However, this is what happens now:

$ time ./john --test --format=office2013-opencl 
Will run 4 OpenMP threads
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP) 
total:1 hash_loops 100 rounds 100004 looptime 0
drm_intel_gem_bo_context_exec() failed: Input/output error
drm_intel_gem_bo_context_exec() failed: Input/output error
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_INVALID_EVENT error in common-opencl.c:1357 - WaitForEvents failed

real    0m33.325s
user    0m17.857s
sys 0m0.074s
$ time ./john --test --format=office2013-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Will run 4 OpenMP threads
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP) Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2
Calculating best GWS for LWS=32; max. 5s total for crypt_all()
Raw speed figures including buffer transfers:
xfer: 5.385s*, xfer: 5.385s*, init: 1.274ms, loop: 1000x0ns,  final: 6.219ms, xfer: 6.328ms*

total:1 hash_loops 100 rounds 100004 looptime 0
gws:       128          23 c/s     2300092 rounds/s   10.785s per crypt_all()!
drm_intel_gem_bo_context_exec() failed: Input/output error
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:328 - failed in clEnqueueNDRangeKernel
 (error occurred)Calculating best LWS for GWS=128
drm_intel_gem_bo_context_exec() failed: Input/output error
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:320 - failed in clEnqueueNDRangeKernel
drm_intel_gem_bo_context_exec() failed: Input/output error
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:320 - failed in clEnqueueNDRangeKernel
OpenCL CL_INVALID_EVENT error in common-opencl.c:1357 - WaitForEvents failed

real    0m33.834s
user    0m17.863s
sys 0m0.079s

Not sure whether this is a bleeding-jumbo error, a beignet error, a kernel driver error, or a combination of errors. Any thoughts?

I'll try to upgrade to Linux kernel version 4.3.0-0.rc1 and see whether the problem disappears.

magnumripper commented 9 years ago

I see nothing that points to a bug in Jumbo. Whether it's beignet or kernel I have no idea.

frank-dittrich commented 9 years ago

What about this then?

$ ./john --list=format-all-details --format=descrypt-opencl 
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Format label                         descrypt-opencl
 Disabled in configuration file      no
Min. password length in bytes        0
Max. password length in bytes        8
Min. keys per crypt                  32
Max. keys per crypt                  32
Flags
 Case sensitive                      yes
 Truncates at (our) max. length      yes
 Supports 8-bit characters           no
 Converts 8859-1 to UTF-16/UCS-2     no
 Honours --encoding=NAME             no
 False positives possible            no
 Uses a bitslice implementation      yes
 The split() method unifies case     no
 A $dynamic$ format                  no
 A dynamic sized salt                no
 Parallelized with OpenMP            no
Number of test vectors               6
Algorithm name                       DES OpenCL
Format name                          traditional crypt(3)
Benchmark comment                    
Benchmark length                     0
Binary size                          8
Salt size                            4
Tunable cost parameters              
Example ciphertext                   CCNf8Sbh3HDfQ

OpenCL CL_INVALID_PROGRAM error in opencl_DES_bs_b_plug.c:104 - Error releasing Program
(bleeding-jumbo)run $ echo $?
1
 $ ./john --list=format-details --format=descrypt-opencl 
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
descrypt-opencl 8   32  32  00210001    6   DES OpenCL  traditional crypt(3)        0   8   4       0   CCNf8Sbh3HDfQ
OpenCL CL_INVALID_PROGRAM error in opencl_DES_bs_b_plug.c:104 - Error releasing Program
$ ./john --list=format-details --format=opencl 
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
sha1crypt-opencl    64  1   1   00000003    4   PBKDF1-SHA1 OpenCL 4x   (NetBSD)        -1001   20  64  iteration count 0   $sha1$64000$wnUR8T1U$vt1TFQ50tBMFgkflAFAOer2CwdYZ
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
oldoffice-opencl    57  1   1   0002000f    5   MD5/SHA1 RC4 OpenCL MS Office <= 2003       -1000   0   72  hash type   0   $oldoffice$1*de17a7f3c3ff03a39937ba9666d6e952*2374d5b6ce7449f57c9f252f9f9b53d2*e60e1185f7aecedba262f869c0236f81
[...]
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
md5crypt-opencl 15  1   1   00000003    61  MD5 OpenCL  crypt(3) $1$        -1  16  9       0   $1$Btiy90iG$bGn4vzF3g1rIVGZ5odGIp/
sha256crypt-opencl  24  1   1   00000003    7   SHA256 OpenCL   crypt(3) $5$    (rounds=5000)   -1  32  28  iteration count 0   $5$LKO/Ute40T3FNF95$U0prpBQd4PloSGU0pnpM4z9wKn4vZ1.jsrzQfPqxph9
sha512crypt-opencl  23  1   1   00000003    6   SHA512 OpenCL   crypt(3) $6$    (rounds=5000)   -1  64  32  iteration count 0   $6$LKO/Ute40T3FNF95$6S/6T2YuOIHY0N3XpLKABJ3soYcXD9mB7uVbtEZDj/LNscVhZoZ9DEH.sBciDrMsHOWOoASbNLTypH/5X26gN0
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
descrypt-opencl 8   32  32  00210001    6   DES OpenCL  traditional crypt(3)        0   8   4       0   CCNf8Sbh3HDfQ
Segmentation fault (core dumped)

Backtrace of that segfault:

Program received signal SIGSEGV, Segmentation fault.
0x00000000000000ff in ?? ()
(gdb) bt
#0  0x00000000000000ff in ?? ()
#1  0x00007ffff70dc9c7 in clReleaseProgram () from /lib64/libOpenCL.so.1
#2  0x00000000005a6ca3 in clean_all_buffers () at opencl_DES_bs_b_plug.c:104
#3  0x00000000006c4b87 in fmt_done (format=format@entry=0xb0d300 <fmt_opencl_DES>) at formats.c:109
#4  0x00000000006f47fc in listconf_parse_late () at listconf.c:570
#5  0x00000000006ce07d in john_init (name=0x7fffffffe314 "john", argc=3, argv=0x7fffffffdfd8) at john.c:1439
#6  0x00000000006ced1f in main (argc=3, argv=0x7fffffffdfd8) at john.c:1867
frank-dittrich commented 9 years ago

For this one

$ ./john --test --format=descrypt-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Failed to release test userptr object! (9) i915 kernel driver may not be sane!
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: descrypt-opencl, traditional crypt(3) [DES OpenCL]... Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER 
ASSERTION FAILED: isScalarType(type)
  at file /home/fd/git/beignet/backend/src/llvm/llvm_gen_backend.cpp, function gbe::ir::Type gbe::getType(gbe::ir::Context&, const llvm::Type*), line 196
Trace/breakpoint trap (core dumped)

I filed a new beignet bug: https://bugs.freedesktop.org/show_bug.cgi?id=92081

magnumripper commented 9 years ago

I opened #1778 for the problem with new DEScrypt-opencl and --list (it's unrelated to beignet).

frank-dittrich commented 9 years ago

Beignet bug https://bugs.freedesktop.org/show_bug.cgi?id=92081 (fixed earlier) reappeared for a different opencl format.

frank-dittrich commented 8 years ago

Finally I found the time to re-test https://bugs.freedesktop.org/show_bug.cgi?id=91525 (GPU hang when testing --format=office2013-opencl on my Haswell with latest beignet).

Running that test after echo -n 0 > /sys/module/i915/parameters/enable_hangcheck to disable the hang check didn't help. I kept john running for about half an hour, but the system didn't respond, so I had to use the power button to reboot. That's why I doubt it is john's fault. But let's see what the beignet developers think.

Even with GWS=1 LWS=1 I get the hang.

$ GWS=1 LWS=1 ./john --test=0 --format=office2013-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Will run 4 OpenMP threads
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP) Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2 $JOHN/kernels/office2013_kernel.cl
Local worksize (LWS) 1, global worksize (GWS) 1
drm_intel_gem_bo_context_exec() failed: Input/output error
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:323 - failed in clEnqueueNDRangeKernel

This is the corresponding dmesg output:

[  149.302616] [drm] stuck on render ring
[  149.303861] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2110], reason: Ring hung, action: reset
[  149.303865] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  149.303867] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  149.303870] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  149.303872] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  149.303874] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  149.306033] drm/i915: Resetting chip after gpu hang
[  155.301738] [drm] stuck on render ring
[  155.302955] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2110], reason: Ring hung, action: reset
[  155.305137] drm/i915: Resetting chip after gpu hang

The contents of /sys/class/drm/card0/error have been attached to the beignet bug report, see https://bugs.freedesktop.org/show_bug.cgi?id=91525#c11.

Repeated tests of that format just result in these dmesg lines:

[  520.187119] [drm] stuck on render ring
[  520.188347] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2239], reason: Ring hung, action: reset
[  520.190142] drm/i915: Resetting chip after gpu hang
[  526.179121] [drm] stuck on render ring
[  526.180357] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2239], reason: Ring hung, action: reset
[  526.182550] drm/i915: Resetting chip after gpu hang
[ 3991.026791] [drm] stuck on render ring
[ 3991.027969] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2733], reason: Ring hung, action: reset
[ 3991.030133] drm/i915: Resetting chip after gpu hang
[ 3997.024799] [drm] stuck on render ring
[ 3997.026080] [drm] GPU HANG: ecode 7:0:0xf3cffffe, in john [2733], reason: Ring hung, action: reset
[ 3997.027766] drm/i915: Resetting chip after gpu hang

But with Linux kernel 4.3, the

Failed to release test userptr object! (9) i915 kernel driver may not be sane!

lines finally disappeared.
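
To compare hang signatures across runs, the "GPU HANG" lines in the dmesg output above can be grouped by PID and ecode. A minimal sketch in Python, assuming the line layout shown above; the names `HANG_RE` and `summarise_hangs` are hypothetical helpers, not part of john or beignet:

```python
import re
from collections import Counter

# Matches lines such as:
# [  520.188347] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2239], reason: Ring hung, action: reset
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] \[drm\] GPU HANG: ecode (?P<ecode>[^,\s]+), "
    r"in (?P<comm>\S+) \[(?P<pid>\d+)\], reason: (?P<reason>[^,]+), action: (?P<action>\S+)"
)

def summarise_hangs(dmesg_text):
    """Count GPU HANG lines per (pid, ecode) pair."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = HANG_RE.search(line)
        if m:
            counts[(int(m.group("pid")), m.group("ecode"))] += 1
    return counts

# Sample copied from the dmesg lines quoted in this comment.
SAMPLE = """\
[  520.187119] [drm] stuck on render ring
[  520.188347] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2239], reason: Ring hung, action: reset
[  526.180357] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2239], reason: Ring hung, action: reset
[ 3991.027969] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2733], reason: Ring hung, action: reset
[ 3997.026080] [drm] GPU HANG: ecode 7:0:0xf3cffffe, in john [2733], reason: Ring hung, action: reset
"""

for (pid, ecode), n in sorted(summarise_hangs(SAMPLE).items()):
    print(pid, ecode, n)
```

On the sample, the last hang of PID 2733 carries a different ecode (7:0:0xf3cffffe) than the others, which is easy to miss when reading the raw log.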

frank-dittrich commented 8 years ago

Next week I'll most likely have time to re-test. At least for one of the problems there should be a commit fixing the problem.

frank-dittrich commented 8 years ago

Status update:

With latest John the Ripper bleeding-jumbo commit 8d4470ff9f2357fc10c8e5769dbb164eb5118f40, latest beignet commit

commit 032b606f8c5baa53e52b1f55c4f7c0bafdd6ff37
Author: Junyan He <junyan.he@linux.intel.com>
Date:   Mon Dec 14 14:51:26 2015 +0800

    Backend: Fix a memory leak for structurizer.

    In structurizer, the useless instruction is just be
    erased from block. The iintrusive_list::erase() just
    unlink the instruction, but not free its resource.
    We should use remove() to deallocate the instruction
    object.

    Signed-off-by: Junyan He <junyan.he@linux.intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>

and a newer Linux kernel (4.4.0-0.rc6.git0.1.vanilla.knurd.1.fc22.x86_64), I get these results:

https://bugs.freedesktop.org/show_bug.cgi?id=91524

./john --test=0 --format=PBKDF2-HMAC-SHA512-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: PBKDF2-HMAC-SHA512-opencl, GRUB2 / OS X 10.8+, rounds=1000 [PBKDF2-SHA512 OpenCL]... Loaded 11 hashes with 10 different salts to test db from test vectors
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=250 -DPLAINTEXT_LENGTH=110 -DMAX_SALT_SIZE=107 $JOHN/kernels/pbkdf2_hmac_sha512_kernel.cl
ASSERTION FAILED: !(ctx->getErrCode() == OUT_OF_RANGE_IF_ENDIF && ctx->getIFENDIFFix())
  at file /home/fd/git/beignet/backend/src/backend/gen_program.cpp, function virtual gbe::Kernel* gbe::GenProgram::compileKernel(const gbe::ir::Unit&, const string&, bool, int), line 206
Trace/breakpoint trap (core dumped)

https://bugs.freedesktop.org/show_bug.cgi?id=91525

Sometimes I still get

GWS=1 LWS=1 ./john --test=0 --format=office2013-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... Loaded 5 hashes with 5 different salts to test db from test vectors
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2 $JOHN/kernels/office2013_kernel.cl
Local worksize (LWS) 1, global worksize (GWS) 1
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:323 - failed in clEnqueueNDRangeKernel

and dmesg shows

[ 2236.597358] [drm] stuck on render ring
[ 2236.598523] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2894], reason: Ring hung, action: reset
[ 2236.598527] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 2236.598529] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 2236.598531] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 2236.598533] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 2236.598536] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 2236.600685] drm/i915: Resetting chip after gpu hang
[ 2242.597091] [drm] stuck on render ring
[ 2242.597661] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2894], reason: Ring hung, action: reset
[ 2242.599757] drm/i915: Resetting chip after gpu hang

I attached the contents of /sys/class/drm/card0/error to the beignet bug.

But sometimes I don't run into this problem; instead I see

$ GWS=1 LWS=1 ./john --test=0 --format=office2013-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... Loaded 5 hashes with 5 different salts to test db from test vectors
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2 $JOHN/kernels/office2013_kernel.cl
Local worksize (LWS) 1, global worksize (GWS) 1
FAILED (cmp_all(1))

and when running that format with --skip-self-tests against the self test hashes, the correct passwords don't crack any hash.

https://bugs.freedesktop.org/show_bug.cgi?id=92081

$ ./john --test --format=descrypt-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: descrypt-opencl, traditional crypt(3) [DES OpenCL]... Loaded 6 hashes with 4 different salts to test db from test vectors
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER  $JOHN/kernels/DES_bs_hash_checking_kernel.cl
ASSERTION FAILED: it != allocatedBlocks.end()
  at file /home/fd/git/beignet/backend/src/backend/context.cpp, function void gbe::SimpleAllocator::splitBlock(int32_t, int32_t), line 293
Trace/breakpoint trap (core dumped)
frank-dittrich commented 8 years ago

https://bugs.freedesktop.org/show_bug.cgi?id=91525

looks like john could be the culprit, causing kernel hangs.

Evidence: Try to crack a single hash:

$ ./john hash.office2013-opencl --wordlist=pw.office2013-opencl --skip-self-tests --verbosity=5 --format=office2013-opencl --pot=office2013-opencl.pot
initUnicode(UNICODE, UTF-8/ISO-8859-1)
UTF-8 -> UTF-8 -> UTF-8
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Using default input encoding: UTF-8
Loaded 1 password hash (office2013-opencl, MS Office 2013 [SHA512 OpenCL 2x AES])
Cost 1 (iteration count) is 100000 for all loaded hashes
Loaded 5 hashes with 5 different salts to test db from test vectors
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2 $JOHN/kernels/office2013_kernel.cl
Calculating best GWS for LWS=8; max. 5s total for crypt_all()
Raw speed figures including buffer transfers:
xfer: 1.680us, xfer: 1.120us, init: 1.384ms, loop: 1000x4.293s,  final: 6.285ms, xfer: 1.120us
gws:       128        0c/s           0 rounds/s 4293.777s per crypt_all()
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:328 - failed in clEnqueueNDRangeKernel
 (error occurred)Calculating best LWS for GWS=8
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:320 - failed in clEnqueueNDRangeKernel
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:320 - failed in clEnqueueNDRangeKernel
OpenCL CL_INVALID_EVENT error in common-opencl.c:1406 - WaitForEvents failed

Look at

Raw speed figures including buffer transfers:
xfer: 1.680us, xfer: 1.120us, init: 1.384ms, loop: 1000x4.293s,  final: 6.285ms, xfer: 1.120us
gws:       128        0c/s           0 rounds/s 4293.777s per crypt_all()

4293.777s per crypt_all() really looks bad.
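
For scale: 4293.777s is roughly 859 times the 5s limit named in the "Calculating best GWS" line. Rows like this can be picked out of an auto-tune log automatically. A minimal sketch in Python, assuming the log layout shown above; `parse_autotune` and the "0 c/s" criterion are illustrative choices, not john's own logic:

```python
import re

BUDGET_RE = re.compile(r"max\. (\d+)s total for crypt_all\(\)")
GWS_RE = re.compile(
    r"gws:\s+(\d+)\s+(\d+)c/s\s+\d+ rounds/s\s+([\d.]+)s per crypt_all\(\)"
)

def parse_autotune(log_text):
    """Return (gws, c/s, seconds, budget_seconds) for each 'gws:' row."""
    budget = None
    rows = []
    for line in log_text.splitlines():
        m = BUDGET_RE.search(line)
        if m:
            budget = int(m.group(1))  # limit announced for the rows that follow
            continue
        m = GWS_RE.search(line)
        if m:
            rows.append((int(m.group(1)), int(m.group(2)), float(m.group(3)), budget))
    return rows

def suspicious(rows):
    """Rows reporting 0 c/s, i.e. no usable speed measurement (assumed criterion)."""
    return [r for r in rows if r[1] == 0]

# Sample copied from the failing run quoted above.
SAMPLE = """\
Calculating best GWS for LWS=8; max. 5s total for crypt_all()
Raw speed figures including buffer transfers:
xfer: 1.680us, xfer: 1.120us, init: 1.384ms, loop: 1000x4.293s,  final: 6.285ms, xfer: 1.120us
gws:       128        0c/s           0 rounds/s 4293.777s per crypt_all()
"""

for gws, cps, secs, budget in suspicious(parse_autotune(SAMPLE)):
    print(f"gws={gws}: {secs}s per crypt_all(), {secs / budget:.0f}x the {budget}s limit")
```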

Let's try LWS=1:

$ LWS=1 ./john hash.office2013-opencl --wordlist=pw.office2013-opencl --skip-self-tests --verbosity=5 --format=office2013-opencl --pot=office2013-opencl.pot
initUnicode(UNICODE, UTF-8/ISO-8859-1)
UTF-8 -> UTF-8 -> UTF-8
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Using default input encoding: UTF-8
Loaded 1 password hash (office2013-opencl, MS Office 2013 [SHA512 OpenCL 2x AES])
Cost 1 (iteration count) is 100000 for all loaded hashes
Loaded 5 hashes with 5 different salts to test db from test vectors
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2 $JOHN/kernels/office2013_kernel.cl
Calculating best GWS for LWS=1; max. 10s total for crypt_all()
Raw speed figures including buffer transfers:
xfer: 13.398ms*, xfer: 13.405ms*, init: 2.327ms, loop: 1000x3.343ms,  final: 6.401ms, xfer: 6.562ms*
gws:        20       11c/s     1100044 rounds/s    3.385s per crypt_all()!
xfer: 22.188ms*, xfer: 22.191ms*, init: 2.771ms, loop: 1000x6.444ms,  final: 12.701ms, xfer: 12.809ms*
gws:        40       12c/s     1200048 rounds/s    6.517s per crypt_all()+
xfer: 33.229ms*, xfer: 33.230ms*, init: 4.456ms, loop: 1000x9.377ms,  final: 18.996ms, xfer: 19.241ms*
gws:        80       16c/s     1600064 rounds/s    9.486s per crypt_all()+
xfer: 55.998ms*, xfer: 55.994ms*, init: 8.632ms, loop: 1000x15.485ms,  final: 31.531ms, xfer: 31.723ms*
gws:       160       20c/s     2000080 rounds/s   15.669s per crypt_all() - too slow
Local worksize (LWS) 1, global worksize (GWS) 80
ERROR: SessionFileProtect enabled in john.conf, and ./john.rec exists

Oops

$ rm john.rec
$ LWS=1 ./john hash.office2013-opencl --wordlist=pw.office2013-opencl --skip-self-tests --verbosity=5 --format=office2013-opencl --pot=office2013-opencl.pot
initUnicode(UNICODE, UTF-8/ISO-8859-1)
UTF-8 -> UTF-8 -> UTF-8
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Using default input encoding: UTF-8
Loaded 1 password hash (office2013-opencl, MS Office 2013 [SHA512 OpenCL 2x AES])
Cost 1 (iteration count) is 100000 for all loaded hashes
Loaded 5 hashes with 5 different salts to test db from test vectors
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2 $JOHN/kernels/office2013_kernel.cl
Calculating best GWS for LWS=1; max. 10s total for crypt_all()
Raw speed figures including buffer transfers:
xfer: 1.600us, xfer: 1.040us, init: 2.445ms, loop: 1000x4.292s,  final: 6.303ms, xfer: 1.040us
gws:        20        0c/s           0 rounds/s 4292.715s per crypt_all()
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:328 - failed in clEnqueueNDRangeKernel
 (error occurred)Local worksize (LWS) 1, global worksize (GWS) 20
Press 'q' or Ctrl-C to abort, almost any other key for status
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:320 - failed in clEnqueueNDRangeKernel

GWS=1:

$ rm john.rec
$ GWS=1 ./john hash.office2013-opencl --wordlist=pw.office2013-opencl --skip-self-tests --verbosity=5 --format=office2013-opencl --pot=office2013-opencl.pot
initUnicode(UNICODE, UTF-8/ISO-8859-1)
UTF-8 -> UTF-8 -> UTF-8
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Using default input encoding: UTF-8
Loaded 1 password hash (office2013-opencl, MS Office 2013 [SHA512 OpenCL 2x AES])
Cost 1 (iteration count) is 100000 for all loaded hashes
Loaded 5 hashes with 5 different salts to test db from test vectors
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2 $JOHN/kernels/office2013_kernel.cl
Calculating best GWS for LWS=8; max. 5s total for crypt_all()
Raw speed figures including buffer transfers:
xfer: 12.762ms*, xfer: 12.763ms*, init: 1.373ms, loop: 1000x3.594ms,  final: 6.501ms, xfer: 6.671ms*
gws:       128       70c/s     7000280 rounds/s    3.635s per crypt_all()!
xfer: 12.492ms*, xfer: 12.485ms*, init: 2.275ms, loop: 1000x3.414ms,  final: 6.501ms, xfer: 6.611ms*
gws:       256      148c/s    14800592 rounds/s    3.454s per crypt_all()!
xfer: 1.440us, xfer: 25.674ms*, init: 4.196ms, loop: 1000x6.666ms,  final: 12.887ms, xfer: 14.659ms*
gws:       512      152c/s    15200608 rounds/s    6.724s per crypt_all() - too slow
Calculating best LWS for GWS=256
Testing LWS=8 GWS=256 ... 16.893ms+
Testing LWS=16 GWS=256 ... 17.542ms
Testing LWS=32 GWS=256 ... 18.119ms
Testing LWS=64 GWS=256 ... 17.855ms
Testing LWS=128 GWS=256 ... 17.521ms
Testing LWS=256 GWS=256 ... 17.902ms
Calculating best GWS for LWS=8; max. 10s total for crypt_all()
Raw speed figures including buffer transfers:
xfer: 12.319ms*, xfer: 12.324ms*, init: 2.278ms, loop: 1000x3.502ms,  final: 6.274ms, xfer: 6.371ms*
gws:       160       90c/s     9000360 rounds/s    3.542s per crypt_all()!
xfer: 22.604ms*, xfer: 22.608ms*, init: 3.058ms, loop: 1000x6.693ms,  final: 12.610ms, xfer: 12.772ms*
gws:       320       94c/s     9400376 rounds/s    6.767s per crypt_all()+
xfer: 39.586ms*, xfer: 39.555ms*, init: 5.431ms, loop: 1000x10.037ms,  final: 19.238ms, xfer: 16.160us
gws:       640      126c/s    12600504 rounds/s   10.141s per crypt_all() - too slow
Local worksize (LWS) 8, global worksize (GWS) 320
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:02 DONE (2015-12-23 01:59) 0g/s 0.7117p/s 0.7117c/s 0.7117C/s openwall..password
Session completed
$ GWS=8 LWS=8 ./john --test --verbosity=5 --format=office2013-opencl 
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... Loaded 5 hashes with 5 different salts to test db from test vectors
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2 $JOHN/kernels/office2013_kernel.cl
Local worksize (LWS) 8, global worksize (GWS) 8
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:323 - failed in clEnqueueNDRangeKernel

Still consuming too much time?

$ time GWS=2 LWS=2 ./john --test=0 --verbosity=5 --format=office2013-opencl
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... Loaded 5 hashes with 5 different salts to test db from test vectors
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2 $JOHN/kernels/office2013_kernel.cl
Local worksize (LWS) 2, global worksize (GWS) 2
FAILED (cmp_all(1))

real    0m20.240s
user    0m16.818s
sys 0m1.081s

Any idea how I should try to debug the failing self test?

And why is GWS=1 LWS=1 failing?

$ time GWS=1 LWS=1 ./john --test=0 --verbosity=5 --format=office2013-opencl 
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... Loaded 5 hashes with 5 different salts to test db from test vectors
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2 $JOHN/kernels/office2013_kernel.cl
Local worksize (LWS) 1, global worksize (GWS) 1
drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:323 - failed in clEnqueueNDRangeKernel

real    0m29.673s
user    0m17.569s
sys 0m0.118s

https://bugs.freedesktop.org/show_bug.cgi?id=91525 now looks like it could involve one John the Ripper bug (hanging the GPU) and a second one (failing self test). Should I create one or two new issues for those, and keep issue 1571 for https://bugs.freedesktop.org/show_bug.cgi?id=91524 and https://bugs.freedesktop.org/show_bug.cgi?id=92081 instead?

frank-dittrich commented 8 years ago
$ ./john --list=build-info 
Version: 1.8.0.6-jumbo-1-1790-g8d4470f+
Build: linux-gnu 64-bit AVX2-ac
SIMD: AVX2, interleaving: MD4:3 MD5:3 SHA1:1 SHA256:1 SHA512:1
$JOHN is ./
Format interface version: 13
Max. number of reported tunable costs: 3
Rec file version: REC4
Charset file version: CHR3
CHARSET_MIN: 1 (0x01)
CHARSET_MAX: 255 (0xff)
CHARSET_LENGTH: 24
SALT_HASH_SIZE: 1048576
Max. Markov mode level: 400
Max. Markov mode password length: 30
gcc version: 5.3.1
GNU libc version: 2.21 (loaded: 2.21)
OpenCL headers version: 1.2
Crypto library: OpenSSL
OpenSSL library version: 0100010bf
OpenSSL 1.0.1k-fips 8 Jan 2015
GMP library version: 6.0.0
Regex library version: 1.2  (loaded: 1.2.3)
File locking: fcntl()
fseek(): fseek
ftell(): ftell
fopen(): fopen
memmem(): System's
(bleeding-jumbo)run $ git diff
diff --git a/src/bt.c b/src/bt.c
index 8a658c1..0194c46 100644
--- a/src/bt.c
+++ b/src/bt.c
@@ -623,7 +623,7 @@ unsigned int create_perfect_hash_table(int htype, void *loaded_hashes_ptr,
        if (sigaction(SIGALRM, &new_action, NULL) < 0)
                bt_error("Error setting new signal handler.");

-       if (setitimer(ITIMER_REAL, NULL, &old_it) < 0)
+       if (getitimer(ITIMER_REAL, &old_it) < 0)
                bt_error("Error retriving timer info.");

        inc_ht = 0.005;
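
The local change above swaps setitimer(ITIMER_REAL, NULL, &old_it) for getitimer(ITIMER_REAL, &old_it), so the current timer is read through the query call instead of through the setter with a NULL new value. The same read-only query can be illustrated with Python's standard signal module (Unix only). This is a sketch for illustration, not code from john; `peek_real_timer` and `demo` are hypothetical names:

```python
import signal

def peek_real_timer():
    """Read ITIMER_REAL as (seconds_remaining, interval) without modifying it."""
    return signal.getitimer(signal.ITIMER_REAL)

def demo():
    # Arm a 5 s one-shot timer so there is something to read.
    signal.setitimer(signal.ITIMER_REAL, 5.0)
    try:
        first = peek_real_timer()
        second = peek_real_timer()  # still armed: reading did not reset it
    finally:
        # Disarm again so no SIGALRM is delivered after the demo.
        signal.setitimer(signal.ITIMER_REAL, 0.0)
    return first, second, peek_real_timer()

if __name__ == "__main__":
    print(demo())
```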
frank-dittrich commented 8 years ago

Just to get an idea about the performance, here is a --test run (without =0) and without LWS and GWS set:

$ ./john --test --format=office2013-opencl --verbosity=5
initUnicode(UNICODE, ASCII/ASCII)
ASCII -> ASCII -> ASCII
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... Loaded 5 hashes with 5 different salts to test db from test vectors
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=34 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER -DHASH_LOOPS=100 -DUNICODE_LENGTH=96 -DV_WIDTH=2 $JOHN/kernels/office2013_kernel.cl
Calculating best GWS for LWS=8; max. 5s total for crypt_all()
Raw speed figures including buffer transfers:
xfer: 12.875ms*, xfer: 12.878ms*, init: 1.454ms, loop: 1000x3.607ms,  final: 6.533ms, xfer: 6.692ms*
gws:       128       70c/s     7000280 rounds/s    3.648s per crypt_all()!
xfer: 12.066ms*, xfer: 12.057ms*, init: 2.157ms, loop: 1000x3.325ms,  final: 6.345ms, xfer: 6.440ms*
gws:       256      152c/s    15200608 rounds/s    3.364s per crypt_all()!
xfer: 25.251ms*, xfer: 25.253ms*, init: 4.019ms, loop: 1000x6.549ms,  final: 12.789ms, xfer: 14.532ms*
gws:       512      154c/s    15400616 rounds/s    6.631s per crypt_all() - too slow
Calculating best LWS for GWS=256
Testing LWS=8 GWS=256 ... 16.895ms+
Testing LWS=16 GWS=256 ... 17.606ms
Testing LWS=32 GWS=256 ... 17.716ms
Testing LWS=64 GWS=256 ... 17.535ms
Testing LWS=128 GWS=256 ... 17.200ms
Testing LWS=256 GWS=256 ... 16.827ms+
Calculating best GWS for LWS=256; max. 10s total for crypt_all()
Raw speed figures including buffer transfers:
xfer: 76.066ms*, xfer: 75.797ms*, init: 8.015ms, loop: 1000x20.641ms,  final: 41.258ms, xfer: 46.837ms*
gws:      5120      490c/s    49001960 rounds/s   20.890s per crypt_all()!
xfer: 144.864ms*, xfer: 144.340ms*, init: 15.394ms, loop: 1000x41.793ms,  final: 81.045ms, xfer: 85.929ms*
gws:     10240      484c/s    48401936 rounds/s   42.266s per crypt_all() - too slow
Local worksize (LWS) 256, global worksize (GWS) 5120
FAILED (cmp_all(1))
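
As a sanity check on the figures above: the reported c/s values appear to equal gws × V_WIDTH divided by the time per crypt_all(), rounded down, with V_WIDTH=2 taken from the -DV_WIDTH=2 build option. This is an observation from the logs in this thread, not a statement about how john computes the figure. A minimal sketch in Python; `predicted_cps` and `check` are hypothetical helpers:

```python
import re

GWS_RE = re.compile(
    r"gws:\s+(\d+)\s+(\d+)c/s\s+\d+ rounds/s\s+([\d.]+)s per crypt_all\(\)"
)

V_WIDTH = 2  # from -DV_WIDTH=2 in the "Options used" line

def predicted_cps(gws, seconds, v_width=V_WIDTH):
    """c/s if each work item handles v_width candidates (assumed relation)."""
    return int(gws * v_width / seconds)

def check(log_text):
    """Return (gws, reported c/s, predicted c/s) for each 'gws:' row."""
    out = []
    for m in GWS_RE.finditer(log_text):
        gws, cps, secs = int(m.group(1)), int(m.group(2)), float(m.group(3))
        out.append((gws, cps, predicted_cps(gws, secs)))
    return out

# Rows copied from the benchmark run above.
SAMPLE = """\
gws:       128       70c/s     7000280 rounds/s    3.648s per crypt_all()!
gws:       256      152c/s    15200608 rounds/s    3.364s per crypt_all()!
gws:       512      154c/s    15400616 rounds/s    6.631s per crypt_all() - too slow
gws:      5120      490c/s    49001960 rounds/s   20.890s per crypt_all()!
gws:     10240      484c/s    48401936 rounds/s   42.266s per crypt_all() - too slow
"""

for gws, reported, predicted in check(SAMPLE):
    print(gws, reported, predicted)
```

If the relation holds, the speed figures themselves are internally consistent, and the FAILED (cmp_all(1)) result concerns the computed hashes rather than the timing.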
magnumripper commented 8 years ago

What in these walls of text suggests to you that the bug is in JtR?

frank-dittrich commented 8 years ago

With the latest bleeding-jumbo commit ca11872eaf094b0dbe90ba3f74fae5366d2b3125, Linux kernel 4.4.0-0.rc6.git1.1.vanilla.knurd.1.fc22.x86_64, and the latest beignet commit f74980864a41daa5d644a033a16132768ce89296, all 53 opencl formats pass the self test.

(bleeding-jumbo)run $ ./john --test=10 --format=opencl --verbosity=3
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Benchmarking: sha1crypt-opencl, (NetBSD) [PBKDF1-SHA1 OpenCL 4x]... DONE
Speed for cost 1 (iteration count) of 64000 and 40000
Raw:    117 c/s real, 18285 c/s virtual

Benchmarking: oldoffice-opencl, MS Office <= 2003 [MD5/SHA1 RC4 OpenCL]... DONE
Speed for cost 1 (hash type) of 1 and 0
Warning: "Many salts" test limited: 96/256
Many salts: 1566K c/s real, 120989K c/s virtual
Only one salt:  1473K c/s real, 23405K c/s virtual

Benchmarking: PBKDF2-HMAC-MD4-opencl [PBKDF2-MD4 OpenCL 4x]... DONE
Speed for cost 1 (iterations) of 1000
Raw:    108503 c/s real, 16384K c/s virtual

Benchmarking: PBKDF2-HMAC-MD5-opencl [PBKDF2-MD5 OpenCL 4x]... DONE
Speed for cost 1 (iterations) of 1000
Raw:    73251 c/s real, 32768K c/s virtual

Benchmarking: PBKDF2-HMAC-SHA1-opencl [PBKDF2-SHA1 OpenCL 4x]... DONE
Speed for cost 1 (iterations) of 1000
Raw:    30453 c/s real, 16384K c/s virtual

Benchmarking: rar-opencl, RAR3 (length 5) [SHA1 OpenCL AES]... DONE
Raw:    605 c/s real, 106666 c/s virtual

Benchmarking: RAR5-opencl [PBKDF2-SHA256 OpenCL]... DONE
Speed for cost 1 (iteration count) of 32768
Raw:    396 c/s real, 128000 c/s virtual

Benchmarking: truecrypt-opencl, TrueCrypt AES256_XTS [RIPEMD160 OpenCL]... DONE
Raw:    1547 c/s real, 277333 c/s virtual

Benchmarking: lotus5-opencl, Lotus Notes/Domino 5 [OpenCL]... DONE
Raw:    1032K c/s real, 1224K c/s virtual

Benchmarking: agilekeychain-opencl, 1Password Agile Keychain [PBKDF2-SHA1 OpenCL AES]... DONE
Speed for cost 1 (iteration count) of 1000
Raw:    31177 c/s real, 840205 c/s virtual

Benchmarking: bcrypt-opencl ("$2a$05", 32 iterations) [Blowfish OpenCL]... DONE
Speed for cost 1 (iteration count) of 32
Raw:    141 c/s real, 13381 c/s virtual

Benchmarking: blockchain-opencl, blockchain My Wallet [PBKDF2-SHA1 OpenCL AES]... DONE
Raw:    473526 c/s real, 809752 c/s virtual

Benchmarking: md5crypt-opencl, crypt(3) $1$ [MD5 OpenCL]... DONE
Raw:    117491 c/s real, 29696K c/s virtual

Benchmarking: sha256crypt-opencl, crypt(3) $5$ (rounds=5000) [SHA256 OpenCL]... DONE
Speed for cost 1 (iteration count) of 5000
Raw:    860 c/s real, 60235 c/s virtual

Benchmarking: sha512crypt-opencl, crypt(3) $6$ (rounds=5000) [SHA512 OpenCL]... DONE
Speed for cost 1 (iteration count) of 5000
Raw:    156 c/s real, 28444 c/s virtual

Benchmarking: descrypt-opencl, traditional crypt(3) [DES OpenCL]... DONE
Warning: "Many salts" test limited: 17/256
Many salts: 1694K c/s real, 594193K c/s virtual
Only one salt:  1640K c/s real, 50840K c/s virtual

Benchmarking: dmg-opencl, Apple DMG [PBKDF2-SHA1 OpenCL 3DES/AES]... DONE
Speed for cost 1 (iteration count) of 1000
Raw:    7937 c/s real, 15814 c/s virtual

Benchmarking: encfs-opencl, EncFS [PBKDF2-SHA1 OpenCL 4x AES/Blowfish]... DONE
Speed for cost 1 (iteration count) of 181474 and 181317
Raw:    64.0 c/s real, 7314 c/s virtual

Benchmarking: gpg-opencl, OpenPGP / GnuPG Secret Key [SHA1 OpenCL]... DONE
Speed for cost 1 (s2k-count) of 65536, cost 2 (hash algorithm [1:MD5 2:SHA1 3:RIPEMD160 8:SHA256 9:SHA384 10:SHA512 11:SHA224]) of 2, cost 3 (cipher algorithm [1:IDEA 2:3DES 3:CAST5 4:Blowfish 7:AES128 8:AES192 9:AES256]) of 3
Raw:    40715 c/s real, 315076 c/s virtual

Benchmarking: iwork-opencl, Apple iWork '09 / '13 / '14 [PBKDF2-SHA1 OpenCL 4x AES]... DONE
Speed for cost 1 (iteration count) of 100000
Raw:    60.1 c/s real, 7111 c/s virtual

Benchmarking: keychain-opencl, Mac OS X Keychain [PBKDF2-SHA1 OpenCL 3DES]... DONE
Raw:    15029 c/s real, 216338 c/s virtual

Benchmarking: keyring-opencl, GNOME Keyring [SHA256 OpenCL AES]... DONE
Speed for cost 1 (iteration count) of 3221 and 2439
Raw:    3704 c/s real, 195368 c/s virtual

Benchmarking: krb5pa-md5-opencl, Kerberos 5 AS-REQ Pre-Auth etype 23 [MD4 HMAC-MD5 RC4 OpenCL]... DONE
Warning: "Many salts" test limited: 82/256
Many salts: 2684K c/s real, 92654K c/s virtual
Only one salt:  2317K c/s real, 29449K c/s virtual

Benchmarking: krb5pa-sha1-opencl, Kerberos 5 AS-REQ Pre-Auth etype 17/18 [PBKDF2-SHA1 OpenCL 4x]... DONE
Raw:    2243 c/s real, 93090 c/s virtual

Benchmarking: LM-opencl [DES BS OpenCL]... 

DONE
Raw:    14085K c/s real, 100663K c/s virtual

Benchmarking: mscash-opencl, M$ Cache Hash [MD4 OpenCL]... DONE
Many salts: 5417K c/s real, 216924K c/s virtual
Only one salt:  4517K c/s real, 24311K c/s virtual

Benchmarking: mscash2-opencl, MS Cache Hash 2 (DCC2) [PBKDF2-SHA1 OpenCL]... 

DONE
Raw:    3114 c/s real, 3114 c/s virtual

Benchmarking: mysql-sha1-opencl, MySQL 4.1+ [SHA1 OpenCL (inefficient, development use only)]... DONE
Raw:    4059K c/s real, 22573K c/s virtual

Benchmarking: Raw-SHA512-ng-opencl [SHA512 OpenCL]... DONE
Raw:    1683K c/s real, 28398K c/s virtual

Benchmarking: XSHA512-ng-opencl, Mac OS X 10.7 salted [SHA512 OpenCL]... DONE
Warning: "Many salts" test limited: 28/256
Many salts: 1816K c/s real, 305834K c/s virtual
Only one salt:  1731K c/s real, 32172K c/s virtual

Benchmarking: nt-opencl, NT [MD4 OpenCL]... DONE
Raw:    2670K c/s real, 31243K c/s virtual

Benchmarking: ntlmv2-opencl, NTLMv2 C/R [MD4 HMAC-MD5 OpenCL 4x]... DONE
Many salts: 15435K c/s real, 386662K c/s virtual
Only one salt:  4271K c/s real, 30229K c/s virtual

Benchmarking: o5logon-opencl, Oracle O5LOGON protocol [SHA1 OpenCL AES 32/64]... DONE
Raw:    2396K c/s real, 3662K c/s virtual

Benchmarking: ODF-opencl [SHA1 OpenCL Blowfish]... DONE
Raw:    13373 c/s real, 23774 c/s virtual

Benchmarking: ODF-AES-opencl [SHA256 OpenCL AES]... DONE
Speed for cost 1 (iteration count) of 1
Raw:    11409 c/s real, 46900 c/s virtual

Benchmarking: office2007-opencl, MS Office 2007 (50,000 iterations) [SHA1 OpenCL 4x AES]... DONE
Raw:    1159 c/s real, 107789 c/s virtual

Benchmarking: office2010-opencl, MS Office 2010 (100,000 iterations) [SHA1 OpenCL 4x AES]... DONE
Speed for cost 1 (iteration count) of 100000
Raw:    350 c/s real, 18285 c/s virtual

Benchmarking: PBKDF2-HMAC-SHA256-opencl, rounds=1000 [PBKDF2-SHA256 OpenCL]... DONE
Speed for cost 1 (iteration count) of 1000
Raw:    13550 c/s real, 4778K c/s virtual

Benchmarking: phpass-opencl ($P$9 lengths 0 to 15) [MD5 OpenCL]... DONE
Raw:    36753 c/s real, 5266K c/s virtual

Benchmarking: pwsafe-opencl, Password Safe [SHA256 OpenCL]... DONE
Speed for cost 1 (iteration count) of 2048
Raw:    12928 c/s real, 275063 c/s virtual

Benchmarking: RAKP-opencl, IPMI 2.0 RAKP (RMCP+) [HMAC-SHA1 OpenCL 4x]... DONE
Warning: "Many salts" test limited: 178/256
Many salts: 2904K c/s real, 112167K c/s virtual
Only one salt:  2657K c/s real, 24958K c/s virtual

Benchmarking: Raw-MD4-opencl [MD4 OpenCL]... 
DONE
Raw:    4364K c/s real, 35805K c/s virtual

Benchmarking: Raw-MD5-opencl [MD5 OpenCL]... 
DONE
Raw:    4473K c/s real, 51826K c/s virtual

Benchmarking: Raw-SHA1-opencl [SHA1 OpenCL]... 
DONE
Raw:    4088K c/s real, 35246K c/s virtual

Benchmarking: Raw-SHA256-opencl [SHA256 OpenCL]... DONE
Raw:    4779K c/s real, 28476K c/s virtual

Benchmarking: Raw-SHA512-opencl [SHA512 OpenCL (inefficient, development use mostly)]... DONE
Raw:    1427K c/s real, 14869K c/s virtual

Benchmarking: salted-sha1-opencl [SHA1 OpenCL]... 
DONE
Warning: "Many salts" test limited: 58/256
Many salts: 6051K c/s real, 1216M c/s virtual
Only one salt:  5227K c/s real, 39125K c/s virtual

Benchmarking: 7z-opencl, 7-Zip (512K iterations) [SHA256 OPENCL AES]... DONE
Speed for cost 1 (iteration count) of 524288
Warning: "Many salts" test limited: 1/256
Many salts: 125 c/s real, 42666 c/s virtual
Only one salt:  125 c/s real, 42666 c/s virtual

Benchmarking: strip-opencl, STRIP Password Manager [PBKDF2-SHA1 OpenCL]... DONE
Raw:    3559 c/s real, 325818 c/s virtual

Benchmarking: sxc-opencl, StarOffice .sxc [PBKDF2-SHA1 OpenCL Blowfish]... DONE
Speed for cost 1 (iteration count) of 0
Raw:    13038 c/s real, 23942 c/s virtual

Benchmarking: wpapsk-opencl, WPA/WPA2 PSK [PBKDF2-SHA1 OpenCL 4x]... DONE
Raw:    2268 c/s real, 180705 c/s virtual

Benchmarking: XSHA512-opencl, Mac OS X 10.7+ [SHA512 OpenCL (efficient at "many salts" only)]... DONE
Many salts: 1497K c/s real, 34863K c/s virtual
Only one salt:  1477K c/s real, 25941K c/s virtual

Benchmarking: zip-opencl, ZIP [PBKDF2-SHA1 OpenCL AES]... DONE
Raw:    10315 c/s real, 5632K c/s virtual

All 53 formats passed self-tests!

magnumripper commented 8 years ago

Cool! Thanks for your efforts with this.

frank-dittrich commented 8 years ago

Unfortunately, meanwhile I managed to reproduce the GPU hang again.

I wanted to close those beignet bugs and, after restarting my system, I ran each individual command which caused one of those bugs in the past.

Some bugs seem to have been fixed, but some still remain. Maybe I was just lucky with the run that ended in "All 53 formats passed self-tests!"

Before doing a full --test=10 run for --format=opencl, I tested each individual opencl format with --test=0, redirecting stdout and stderr into files and copying the dmesg output into files. No problems there (at least with a --disable-openmp build). Maybe testing all those formats in a certain sequence initialized the GPU state or memory in a particular way, while running certain format tests right after shutting down and booting the system triggers those bugs. It also looks like the GPU isn't reset correctly after a GPU hang, because afterwards I get

(bleeding-jumbo)run $ ./john --test=0 --format=office2013-opencl --verbosity=3
Will run 4 OpenMP threads
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP) FAILED (cmp_all(1))

instead of the GPU hang

(bleeding-jumbo)run $ ./john --test=0 --format=office2013-opencl --verbosity=3
Will run 4 OpenMP threads
Device 0: Intel(R) HD Graphics Haswell GT2 Desktop
Testing: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 OpenCL 2x AES]... (4xOMP) drm_intel_gem_bo_context_exec() failed: Input/output error
OpenCL CL_OUT_OF_RESOURCES error in opencl_office2013_fmt_plug.c:323 - failed in clEnqueueNDRangeKernel
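
The per-format procedure described above (one --test=0 run per format, stdout and stderr redirected into files, dmesg copied afterwards) could be scripted roughly as follows. This is only a sketch: JOHN, LOGDIR and DMESG are hypothetical shell variables, not john options, and it assumes that john exits with a non-zero status when a self-test fails.

```shell
#!/bin/sh
# Sketch: run each format's self-test on its own and keep per-format logs.
# JOHN, LOGDIR and DMESG are hypothetical knobs of this script, not john options.
JOHN=${JOHN:-./john}
LOGDIR=${LOGDIR:-./selftest-logs}
DMESG=${DMESG:-dmesg}

# Usage: test_formats FORMAT...
test_formats() {
    mkdir -p "$LOGDIR" || return 1
    failed=0
    for fmt in "$@"; do
        # Keep stdout and stderr in separate files for each format.
        "$JOHN" --test=0 --format="$fmt" --verbosity=3 \
            >"$LOGDIR/$fmt.stdout" 2>"$LOGDIR/$fmt.stderr"
        status=$?
        # Snapshot the kernel log after each run (may need root on some systems).
        $DMESG >"$LOGDIR/$fmt.dmesg" 2>/dev/null
        if [ "$status" -ne 0 ]; then
            # Assumption: a failed self-test or an OpenCL error gives a non-zero status.
            echo "$fmt: exit status $status"
            failed=$((failed + 1))
        fi
    done
    echo "$failed of $# formats failed"
    [ "$failed" -eq 0 ]
}
```

After sourcing the file, something like `test_formats office2013-opencl PBKDF2-HMAC-SHA512-opencl` would leave three log files per format in LOGDIR. Running the formats in a different order, or one per boot, is then just a matter of changing the argument list.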

Instead of reporting this as a beignet bug, I now picked DRI as the product and DRM/Intel as the component, as suggested by the dmesg output:

[ 1791.185004] [drm] stuck on render ring
[ 1791.186261] [drm] GPU HANG: ecode 7:0:0x85ddfffc, in john [2277], reason: Ring hung, action: reset
[ 1791.186265] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 1791.186268] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 1791.186270] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 1791.186273] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 1791.186275] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 1791.188446] drm/i915: Resetting chip after gpu hang

https://bugs.freedesktop.org/show_bug.cgi?id=93579
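
Since the kernel message asks for the crash dump to be attached, a small helper can check the kernel log for a hang and copy the dump before it is lost. This is a rough sketch: the function names are made up for illustration, and the default path is simply the one printed in the dmesg output above (card0 may differ on other systems, and reading it may require root).

```shell
#!/bin/sh
# Sketch: detect an i915 GPU hang in kernel log text and keep the crash dump.

# Reads kernel log text on stdin; succeeds if it contains a GPU hang report.
has_gpu_hang() {
    grep -q 'GPU HANG'
}

# Usage: save_gpu_error DEST [SRC]
# Copies the crash dump to DEST. SRC defaults to the path named in the
# dmesg output above; that default is an assumption for other systems.
save_gpu_error() {
    dest=$1
    src=${2:-/sys/class/drm/card0/error}
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
    cat "$src" >"$dest"
}
```

A possible use right after a failing run would be `dmesg | has_gpu_hang && save_gpu_error gpu-error.txt`.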

Let's just keep this issue closed.

I'll open new issues depending on the response to my freedesktop.org bug reports.

BTW, some of the opencl format self-tests seem to write '\n' to stderr (look at the additional newlines in the --test --verbosity=3 output).

magnumripper commented 8 years ago

BTW, some of the opencl format self-tests seem to write '\n' to stderr (look at the additional newlines in the --test --verbosity=3 output).

You had to filter out the "spinning wheel" from the output, right? I suspect it's just something with that.

frank-dittrich commented 8 years ago

Now I know why the complete --test --format=opencl run didn't fail. I had forgotten to check my local config for disabled formats:

office2013-opencl = Y
PBKDF2-HMAC-SHA512-opencl = Y
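
To avoid being caught out by this again, the disabled formats can be listed before a full test run. The sketch below assumes the entries live in a section named [Disabled:Formats] and that the file is passed in explicitly (for example a local config such as john-local.conf). Both are assumptions here, and list_disabled_formats is a hypothetical helper, not part of john.

```shell
#!/bin/sh
# Sketch: print the format names set to Y in a [Disabled:Formats] section.
# Usage: list_disabled_formats CONFIG_FILE
list_disabled_formats() {
    awk '
        # Track whether the current section is the (assumed) disabled-formats one.
        /^\[/ { in_section = (tolower($0) ~ /^\[disabled:formats\]/); next }
        # Skip comments and directives; look only at "name = value" lines.
        in_section && /^[^#;.]/ {
            n = split($0, kv, "=")
            if (n < 2) next
            name = kv[1]; value = kv[2]
            gsub(/[ \t\r]/, "", name)
            gsub(/[ \t\r]/, "", value)
            if (toupper(value) == "Y") print name
        }
    ' "$1"
}
```

With the two lines above in that section, `list_disabled_formats john-local.conf` would print office2013-opencl and PBKDF2-HMAC-SHA512-opencl, one per line.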

But I think it is still better to keep this issue closed. I'll open new issues for the remaining problems.