openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

AMD rocm opencl results (failing self tests) #3572

Closed kochd closed 5 months ago

kochd commented 5 years ago

I just wanted to report the current state of self tests with AMD's rocm (https://rocm.github.io/), as nobody has reported them so far.


There are tests that result in PASS but emit warnings:

Testing: mscash-opencl, M$ Cache Hash [MD4 OpenCL]... /tmp/AMD_12531_545/t_12531_547.cl:449:18: warning: unknown attribute 'max_constant_size' ignored [-Wunknown-attributes]
Testing: mscash2-opencl, MS Cache Hash 2 (DCC2) [PBKDF2-SHA1 OpenCL]... /tmp/AMD_12531_562/t_12531_564.cl:611:17: warning: unknown attribute 'max_constant_size' ignored [-Wunknown-attributes]
Testing: mysql-sha1-opencl, MySQL 4.1+ [SHA1 OpenCL]... /tmp/AMD_12531_579/t_12531_581.cl:139:18: warning: unknown attribute 'max_constant_size' ignored [-Wunknown-attributes]
Testing: NT-opencl [MD4 OpenCL]... /tmp/AMD_12531_613/t_12531_615.cl:332:18: warning: unknown attribute 'max_constant_size' ignored [-Wunknown-attributes]
Testing: raw-MD4-opencl [MD4 OpenCL]... /tmp/AMD_12531_868/t_12531_870.cl:247:18: warning: unknown attribute 'max_constant_size' ignored [-Wunknown-attributes]
Testing: raw-MD5-opencl [MD5 OpenCL]... /tmp/AMD_12531_885/t_12531_887.cl:271:18: warning: unknown attribute 'max_constant_size' ignored [-Wunknown-attributes]
Testing: raw-SHA1-opencl [SHA1 OpenCL]... /tmp/AMD_12531_902/t_12531_904.cl:139:18: warning: unknown attribute 'max_constant_size' ignored [-Wunknown-attributes]
Testing: salted-SHA1-opencl [SHA1 OpenCL]... /tmp/AMD_12531_970/t_12531_972.cl:151:41: warning: unknown attribute 'max_constant_size' ignored [-Wunknown-attributes]
Testing: SL3-opencl, Nokia operator unlock [SHA1 OpenCL]... /tmp/AMD_12531_1021/t_12531_1023.cl:151:41: warning: unknown attribute 'max_constant_size' ignored [-Wunknown-attributes]

Note that NT-opencl is affected by these warnings; it does not actually work in production and fails with:

/tmp/AMD_23440_18/t_23440_20.cl:332:18: warning: unknown attribute 'max_constant_size' ignored [-Wunknown-attributes]
                __attribute__((max_constant_size (NUM_INT_KEYS * 4)))
                               ^
1 warning generated.
Self test failed (cmp_one(0))

I can't test the others, as they run fine in test mode even if test > 0.

magnumripper commented 5 years ago

Thanks, we should try to address this before a release. Perhaps it's time I set up a Linux boot option on my Macbook again so I can try out things like this.

kochd commented 5 years ago

Current rocm @ 649bb4a337e69871dd9c852c2965d3722afa34df same error

magnumripper commented 5 years ago

How do we know it's a rocm driver? Some minimum version number?

kochd commented 5 years ago

just tested with rocm-dev 2.2.31 @ jumbo 97cce637cdf3796fc2a1b28117dfad4272a9e99f

kochd commented 5 years ago

clinfo:

clinfo 
Number of platforms:                 1
  Platform Profile:              FULL_PROFILE
  Platform Version:              OpenCL 2.1 AMD-APP (2833.0)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:               Advanced Micro Devices, Inc.
  Platform Extensions:               cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 

  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:               1
  Device Type:                   CL_DEVICE_TYPE_GPU
  Vendor ID:                     1002h
  Board name:                    Ellesmere [Radeon RX 470/480]
  Device Topology:               PCI[ B#11, D#0, F#0 ]
  Max compute units:                 36
  Max work items dimensions:             3
    Max work items[0]:               1024
    Max work items[1]:               1024
    Max work items[2]:               1024
  Max work group size:               256
  Preferred vector width char:           4
  Preferred vector width short:          2
  Preferred vector width int:            1
  Preferred vector width long:           1
  Preferred vector width float:          1
  Preferred vector width double:         1
  Native vector width char:          4
  Native vector width short:             2
  Native vector width int:           1
  Native vector width long:          1
  Native vector width float:             1
  Native vector width double:            1
  Max clock frequency:               1580Mhz
  Address bits:                  64
  Max memory allocation:             7301444403
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                16384
  Max image 2D height:               16384
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            26591
  Max size of kernel argument:           1024
  Alignment (bits) of base address:      1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    16384
  Global memory size:                8589934592
  Constant buffer size:              7301444403
  Max number of constant args:           8
  Local memory type:                 Scratchpad
  Local memory size:                 65536
  Max pipe arguments:                16
  Max pipe active reservations:          16
  Max pipe packet size:              3006477107
  Max global variable size:          7301444403
  Max global variable preferred total size:  8589934592
  Max read/write image args:             64
  Max on device events:              1024
  Queue on device max size:          8388608
  Max on device queues:              1
  Queue on device preferred size:        262144
  SVM capabilities:              
    Coarse grain buffer:             Yes
    Fine grain buffer:               Yes
    Fine grain system:               No
    Atomics:                     No
  Preferred platform atomic alignment:       0
  Preferred global atomic alignment:         0
  Preferred local atomic alignment:      0
  Kernel Preferred work group size multiple:     64
  Error correction support:          0
  Unified memory for Host and Device:        0
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             No
  Queue on Host properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Queue on Device properties:                
    Out-of-Order:                Yes
    Profiling :                  Yes
  Platform ID:                   0x7fa410a4af70
  Name:                      gfx803
  Vendor:                    Advanced Micro Devices, Inc.
  Device OpenCL C version:           OpenCL C 2.0 
  Driver version:                2833.0 (HSA1.1,LC)
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2 
  Extensions:                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 

magnumripper commented 5 years ago

Driver version: 2833.0 (HSA1.1,LC)

I suspect the HSA is a tell-tale. I'll go with that until someone says anything else. It's strange, however, that it's on device level only, it's not seen at platform level.

magnumripper commented 5 years ago

Hmm but super has 2766.4 (PAL,HSAIL) - AMDGPU-Pro which is indeed AMD PRO and not ROCM (or can one be both? 😵 )

magnumripper commented 5 years ago

Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program

So maybe some hint here. We have cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_amd_copy_buffer_p2p cl_amd_assembly_program

The AMDPRO driver on super reports this: Device extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes

That includes cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt

A wild guess could be that cl_amd_assembly_program indicates rocm, but who knows.
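A minimal sketch of how the two guesses above could be checked on the host side, in plain C. It assumes the caller has already fetched the driver version and extension strings (for example via clGetDeviceInfo); the helper names has_token and looks_like_rocm are hypothetical and not part of JtR. Because "HSAIL" also contains "HSA", a substring match on "HSA" would hit both drivers quoted in this thread, so the sketch matches whole tokens and uses "LC", which appears only in the rocm string here. That choice is an assumption drawn from just these two samples.

```c
#include <string.h>

/* Return 1 if `token` appears in `list` as a whole item, where items are
 * separated by any character in `delims`. */
static int has_token(const char *list, const char *token, const char *delims)
{
    size_t len = strlen(token);
    const char *p = list;

    while (*p) {
        p += strspn(p, delims);            /* skip leading delimiters */
        size_t item = strcspn(p, delims);  /* length of the next item */
        if (item == len && strncmp(p, token, len) == 0)
            return 1;
        p += item;
    }
    return 0;
}

/* Hypothetical heuristic, based only on the strings quoted in this thread:
 * treat the device as rocm if the driver version lists an "LC" token or the
 * extensions include cl_amd_assembly_program. */
static int looks_like_rocm(const char *driver_version, const char *extensions)
{
    return has_token(driver_version, "LC", " (),") ||
           has_token(extensions, "cl_amd_assembly_program", " ");
}
```

With the strings from this thread, "2833.0 (HSA1.1,LC)" would be classified as rocm and "2766.4 (PAL,HSAIL) - AMDGPU-Pro" would not. Whether either signal holds across other driver releases is unknown.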

magnumripper commented 5 years ago

Not sure I'll ever understand all AMD's acronyms. Looks like AMDPRO might include ROCr which is a ROCm runtime and maybe also a ROCK kernel driver (now with a capital K because that rocks) which may or may not be included in AMDPRO, who knows. I thought there were two different sets of AMD drivers: AMDPRO and ROCM. Apparently they are not that separate. Are they even the same thing, but one is for end users and the other for developers? I have absolutely no idea. I will leave this issue unattended - anyone who likes this mess is welcome to suggest fixes.

Oh, BTW we shouldn't need any fixes or workarounds, this is "OpenCL", isn't it? Yeah right 🙄

kochd commented 5 years ago

Yes their acronyms are absolutely messed up.

solardiz commented 5 years ago

Looks unrealistic we'll make relevant changes and properly re-test them on all platforms before the release. Changed milestone.

kochd commented 5 years ago

rocm-dev 2.7.22 @ jumbo 88829829f23490c314c412f065d75d96cc394901 problem persists

john --test=0 --format=keepass-opencl
Device 1: gfx803 [Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]]
Testing: KeePass-opencl [SHA256 AES/Twofish/ChaCha OpenCL]... Options used: -I /root/git/JohnTheRipper/run/opencl -cl-mad-enable -D__GPU__ -DDEVICE_INFO=522 -D__SIZEOF_HOST_SIZE_T__=8 -DDEV_VER_MAJOR=2949 -DDEV_VER_MINOR=0 -D_OPENCL_COMPILER -DPLAINTEXT_LENGTH=124 -DHASH_LOOPS=100 -DMAX_CONT_SIZE=16777216 /root/JohnTheRipper/run/opencl/keepass_kernel.cl
Build log: : error: undefined hidden symbol: SHA256_Update
>>> referenced by /tmp/t_23061_33-6d6703.o:(keepass_init)
>>> referenced by /tmp/t_23061_33-6d6703.o:(keepass_init)
>>> referenced by /tmp/t_23061_33-6d6703.o:(keepass_final)
>>> referenced by /tmp/t_23061_33-6d6703.o:(keepass_final)
Error: Creating the executable from LLVM IRs failed.

Error building kernel /root/JohnTheRipper/run/opencl/keepass_kernel.cl. DEVICE_INFO=522
0: OpenCL CL_BUILD_PROGRAM_FAILURE (-11) error in opencl_common.c:1376 - clBuildProgram
# rar passes now
Testing: pgpdisk-opencl [SHA1 AES/TwoFish/CAST OpenCL]... Options used: -I /root/git/JohnTheRipper/run/opencl -cl-mad-enable -D__GPU__ -DDEVICE_INFO=522 -D__SIZEOF_HOST_SIZE_T__=8 -DDEV_VER_MAJOR=2949 -DDEV_VER_MINOR=0 -D_OPENCL_COMPILER -DPLAINTEXT_LENGTH=124 -DBINARY_SIZE=16 /root/JohnTheRipper/run/opencl/pgpdisk_kernel.cl
Build log: : error: undefined hidden symbol: pgpdisk_kdf
>>> referenced by /tmp/t_24544_33-59313a.o:(pgpdisk_aes)
>>> referenced by /tmp/t_24544_33-59313a.o:(pgpdisk_aes)
>>> referenced by /tmp/t_24544_33-59313a.o:(pgpdisk_twofish)
>>> referenced by /tmp/t_24544_33-59313a.o:(pgpdisk_twofish)
>>> referenced by /tmp/t_24544_33-59313a.o:(pgpdisk_cast)
>>> referenced by /tmp/t_24544_33-59313a.o:(pgpdisk_cast)
Error: Creating the executable from LLVM IRs failed.

Error building kernel /root/JohnTheRipper/run/opencl/pgpdisk_kernel.cl. DEVICE_INFO=522
0: OpenCL CL_BUILD_PROGRAM_FAILURE (-11) error in opencl_common.c:1376 - clBuildProgram
7z passes now

I can't use john --test=0 --format=opencl anymore to test them all. The first failure stops the iteration.

NT-opencl still fails in production despite passing the test.

kochd commented 5 years ago

@solardiz do these tests include rocm as platform ?

solardiz commented 4 years ago

@kochd What do you refer to by "these tests"? Personally, I don't test on rocm.

kochd commented 4 years ago

Looks unrealistic we'll make relevant changes and properly re-test them on all platforms before the release.

solardiz commented 4 years ago

That comment applied to what we did before the 1.9.0-jumbo-1 release back in May, and it specifically excluded this issue from the pre-release testing. As in: we knew some formats fail tests on rocm, and were not going to fix that.

solardiz commented 4 years ago

@kochd Maybe you can contribute not only problem reports, but also fixes/workarounds? :-) You're on rocm, and you can try to troubleshoot and workaround the issues.

kochd commented 4 years ago

I would, but I have to admit that I lack the experience to help in that field. Also: don't get me wrong - I am not trying to generate any pressure here. My whole intent in posting updates is to give feedback instead of letting this issue become stale.

solardiz commented 4 years ago

@kochd We appreciate your feedback. It helps. Thanks!

magnumripper commented 4 years ago

Yes it does help, the error: undefined hidden symbol is interesting and something we can start from. I would think it's a bug in the rocm runtime and not our code but I'll look at those kernels just to confirm we don't have some macro check messed up.

Oh and thanks @kochd for that explanation (back then) of what rocm is.

BTW do you have an AMD driver's "CPU device" to try as well? Maybe confirm whether that one is working or not. I expect it to work.

kochd commented 4 years ago

When looking at the commits @magnumripper made since April to the rar and 7z kernels, I don't see anything obvious that would have led to rar and 7z passing now - which failed back in April. So rocm might have fixed something related to that issue. ROCm could have some compatibility issues with the failing kernels. However: I've used other software that makes use of OpenCL without encountering such problems.

Sadly I only have that GPU available:

 ./john --list=OpenCL-devices
Platform #0 name: AMD Accelerated Parallel Processing, version: OpenCL 2.1 AMD-APP (2949.0)
    Device #0 (1) name:     gfx803
    Board name:             Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]
    Device vendor:          Advanced Micro Devices, Inc.
    Device type:            GPU (LE)
    Device version:         OpenCL 1.2 
    Driver version:         2949.0 (HSA1.1,LC) - AMDGPU-Pro  
    Native vector widths:   char 4, short 2, int 1, long 1
    Preferred vector width: char 4, short 2, int 1, long 1
    Global Memory:          8 GB
    Global Memory Cache:    16 KB
    Local Memory:           64 KB (Local)
    Constant Buffer size:   6963 MB
    Max memory alloc. size: 6963 MB
    Max clock (MHz):        1580
    Profiling timer res.:   1 ns
    Max Work Group Size:    256
    Parallel compute cores: 36
    Stream processors:      2304  (36 x 64)
    Speed index:            3640320
    SIMD width:             16
    Wavefront width:        64
    PCI device topology:    0b:00.0

Why do you think a CPU would work? AFAIK any Ryzen CPU would make use of ROCm as well (https://rocm.github.io/hardware.html).

magnumripper commented 4 years ago

When looking at the commits @magnumripper made since April to the rar and 7z kernels, I don't see anything obvious that would have led to rar and 7z passing now - which failed back in April. So rocm might have fixed something related to that issue. ROCm could have some compatibility issues with the failing kernels.

I came to the same conclusion.

However: I've used other software that makes use of OpenCL without encountering such problems.

Other software may well be better for rocm, I think hashcat actively tests rocm and they work around problems seen (that is unfortunately a very time consuming task, often involving changing random stuff and see what happens).

Why do you think a CPU would work? AFAIK any Ryzen CPU would make use of ROCm as well (https://rocm.github.io/hardware.html).

The traditional AMD drivers include a CPU device and that one almost never has any problems. We routinely test(ed) on it on Travis/Circle CI when available. BTW when seeing AMD bugs, we commonly only see them with specific GPU series or runtime versions while others may be perfectly fine. Even worse, when we work around a problem on one device or runtime version, we often introduce a whole different problem on some other device or runtime version. It's not very rewarding unfortunately 😢

simon816 commented 4 years ago

Build log: : error: undefined hidden symbol: SHA256_Update

I have found that this is caused by the use of inline in the function declarations. ROCm uses clang to compile OpenCL code. According to clang's documentation: https://clang.llvm.org/compatibility.html#inline

By default, Clang builds C code in GNU C11 mode, so it uses standard C99 semantics for the inline keyword.

The above documentation explains the semantics of inline in C99. Basically, I was able to fix the issue by removing the inline keyword, or by replacing it with extern inline.
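A minimal sketch of the linkage difference described above, written in plain C rather than OpenCL C. The names rotl32, mix_init and mix_final are hypothetical and not taken from the JtR kernels. Only the static inline form is compiled; the form that can fail is shown in a comment.

```c
#include <stdint.h>

/*
 * Under C99 semantics, a plain
 *
 *     inline uint32_t rotl32(uint32_t x, unsigned n) { ... }
 *
 * is only an "inline definition": it does not provide an external
 * definition of rotl32. If the compiler decides not to inline a call,
 * it emits a reference to an external rotl32 that nothing defines, and
 * the link step reports an undefined symbol.
 *
 * `static inline` gives the function internal linkage, so a non-inlined
 * call resolves to a local copy in this translation unit.
 */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Two callers, standing in for the *_init and *_final kernels in the log. */
uint32_t mix_init(uint32_t seed)
{
    return rotl32(seed, 5) ^ 0x5a827999u;
}

uint32_t mix_final(uint32_t state)
{
    return rotl32(state, 30) + 1u;
}
```

Replacing the keyword with extern inline, the other fix mentioned above, works for a different reason: it makes this translation unit emit the external definition that the plain inline form leaves out.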

magnumripper commented 4 years ago

Thanks! That's good input.

GIJack commented 3 years ago

seems to be an issue again with ROCM 4.3.0

magnumripper commented 3 years ago

seems to be an issue again with ROCM 4.3.0

Several disparate problems were discussed above: What exact issue are you seeing?

magnumripper commented 3 years ago

I have found that this is caused by the use of inline in the function declarations.

Thanks! That's good input.

For the record, current code has this in opencl_misc.h:

#if __MESA__
#define inline  // empty!
#elif __POCL__
// Do nothing (POCL complains if we redefine)
#elif gpu_amd(DEVICE_INFO) // We really target ROCM here
#define inline  static inline
#else
// Do nothing
#endif

GIJack commented 3 years ago

Build log: : error: undefined hidden symbol: SHA256_Update

Just to clarify: this is the exact issue I am having, and the solution works.

Also clarifying it looks like you already got it. Thank you all!

edit: It might be worthwhile releasing a version 1.8.1 with bugfixes. Just a suggestion.

magnumripper commented 3 years ago

edit: It might be worthwhile releasing a version 1.8.1 with bugfixes. Just a suggestion.

Yes (although it'd be 1.9.1 or possibly bumpin' major), and we are... we're just really slow about it 😆

Oh, and thanks for following up

g00g1 commented 2 years ago

Hello! I had the same issue (I mean error: undefined hidden symbol), but after building from the master branch it is solved! However, there is another problem - some tests fail or even crash the john process.

$ uname -a

Linux localhost 5.10.70-1-lts #1 SMP Thu, 30 Sep 2021 09:43:10 +0000 x86_64 GNU/Linux

$ john --list=build-info

Version: 1.9.0-jumbo-1+bleeding-186c9ae1e 2021-10-05 14:51:31 +0200
Build: linux-gnu 64-bit x86_64 AVX AC MPI + OMP
SIMD: AVX, interleaving: MD4:3 MD5:3 SHA1:1 SHA256:1 SHA512:1
System-wide exec: /usr/bin
System-wide home: /usr/share/john
Private home: ~/.john
CPU tests: AVX
CPU fallback binary: john-non-avx
$JOHN is /usr/share/john/
Format interface version: 14
Max. number of reported tunable costs: 4
Rec file version: REC4
Charset file version: CHR3
CHARSET_MIN: 1 (0x01)
CHARSET_MAX: 255 (0xff)
CHARSET_LENGTH: 24
SALT_HASH_SIZE: 1048576
SINGLE_IDX_MAX: 2147483648
SINGLE_BUF_MAX: 4294967295
Effective limit: Number of salts vs. SingleMaxBufferSize
Max. Markov mode level: 400
Max. Markov mode password length: 30
gcc version: 11.1.0
GNU libc version: 2.33 (loaded: 2.33)
OpenCL headers version: 2.0
Crypto library: OpenSSL
OpenSSL library version: 0101010cf
OpenSSL 1.1.1l  24 Aug 2021
GMP library version: 6.2.1
File locking: fcntl()
fseek(): fseek
ftell(): ftell
fopen(): fopen
memmem(): System's
times(2) sysconf(_SC_CLK_TCK) is 100
Using times(2) for timers, resolution 10 ms
HR timer: clock_gettime(), latency 20 ns
Total physical host memory: 15370 MiB
Available physical host memory: 9638 MiB
Terminal locale string: en_GB.UTF-8
Parsed terminal locale: UTF-8

$ john --list=opencl-devices

Platform #0 name: AMD Accelerated Parallel Processing, version: OpenCL 2.0 AMD-APP.dbg (3305.0)
    Device #0 (1) name:     gfx902:xnack-
    Board name:             Renoir
    Device vendor:          Advanced Micro Devices, Inc.
    Device type:            GPU (LE)
    Device version:         OpenCL 2.0 
    Driver version:         3305.0 (HSA1.1,LC) - AMDGPU-Pro  
    Native vector widths:   char 4, short 2, int 1, long 1
    Preferred vector width: char 4, short 2, int 1, long 1
    Global Memory:          512 MiB
    Global Memory Cache:    16 KiB
    Local Memory:           64 KiB (Local)
    Constant Buffer size:   435 MiB
    Max memory alloc. size: 435 MiB
    Max clock (MHz):        1600
    Profiling timer res.:   1 ns
    Max Work Group Size:    256
    Parallel compute cores: 27
    Stream processors:      1728  (27 x 64)
    Speed index:            2764800
    SIMD width:             16
    Wavefront width:        64
    PCI device topology:    03:00.0

$ john --test=0 --format=opencl

Device 1@localhost: gfx902:xnack- [Renoir]
Warning: cryptosafe-opencl format should always be UTF-8. Use --target-encoding=utf8
Testing: cryptosafe-opencl [AES-256-CBC OpenCL]... Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

PASS
Testing: sha1crypt-opencl, (NetBSD) [PBKDF1-SHA1 OpenCL]... Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

PASS
Testing: KeePass-opencl [SHA256 AES/Twofish/ChaCha OpenCL]... Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

FAILED (get_key(0))
Testing: oldoffice-opencl, MS Office <= 2003 [MD5/SHA1 RC4 OpenCL]... Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

PASS
Testing: PBKDF2-HMAC-MD4-opencl [PBKDF2-MD4 OpenCL]... Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

PASS
Testing: PBKDF2-HMAC-MD5-opencl [PBKDF2-MD5 OpenCL]... Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

PASS
Testing: PBKDF2-HMAC-SHA1-opencl [PBKDF2-SHA1 OpenCL]... Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

PASS
Testing: rar-opencl, RAR3 (length 5) [SHA1 OpenCL AES]... (8xOMP) Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

PASS
Testing: RAR5-opencl [PBKDF2-SHA256 OpenCL]... Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

PASS
Testing: TrueCrypt-opencl [RIPEMD160 AES256_XTS OpenCL]... Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

PASS
Testing: lotus5-opencl, Lotus Notes/Domino 5 [OpenCL]... Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

PASS
Testing: AndroidBackup-opencl [PBKDF2-SHA1 AES OpenCL]... Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

PASS
Testing: agilekeychain-opencl, 1Password Agile Keychain [PBKDF2-SHA1 AES OpenCL]... Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

PASS
Testing: ansible-opencl, Ansible Vault [PBKDF2-SHA256 HMAC-SHA256 OpenCL]... Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

PASS
Testing: axcrypt-opencl [SHA1 AES OpenCL]... Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

PASS
Testing: axcrypt2-opencl, AxCrypt 2.x [PBKDF2-SHA512 AES OpenCL]... Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

PASS
Build log: /tmp/comgr-c5a8a0/input/CompileSource:117:17: warning: unknown attribute 'max_constant_size' ignored [-Wunknown-attributes]
        __attribute__((max_constant_size(16)))
                       ^~~~~~~~~~~~~~~~~~~~~
/tmp/comgr-c5a8a0/input/CompileSource:121:17: warning: unknown attribute 'max_constant_size' ignored [-Wunknown-attributes]
        __attribute__((max_constant_size(72)))
                       ^~~~~~~~~~~~~~~~~~~~~
/tmp/comgr-c5a8a0/input/CompileSource:129:17: warning: unknown attribute 'max_constant_size' ignored [-Wunknown-attributes]
        __attribute__((max_constant_size(4096)))
                       ^~~~~~~~~~~~~~~~~~~~~~~
3 warnings generated.
warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

LWS=8 GWS=1024
Testing: bcrypt-opencl ("$2a$05", 32 iterations) [Blowfish OpenCL]... FAILED (cmp_exact(1))
[localhost:365838] *** Process received signal ***
[localhost:365838] Signal: Bus error (7)
[localhost:365838] Signal code:  (128)
[localhost:365838] Failing at address: (nil)
[localhost:365838] [ 0] /usr/lib64/libpthread.so.0(+0x13870)[0x7f0b72cf3870]
[localhost:365838] [ 1] /usr/lib64/libc.so.6(+0x8827f)[0x7f0b72b9a27f]
[localhost:365838] [ 2] /usr/lib64/libc.so.6(cfree+0x68)[0x7f0b72b9d9e8]
[localhost:365838] [ 3] john(+0x3ecdc3)[0x55bf9d8c9dc3]
[localhost:365838] [ 4] john(+0x3ecc6e)[0x55bf9d8c9c6e]
[localhost:365838] [ 5] john(+0x345690)[0x55bf9d822690]
[localhost:365838] [ 6] john(+0x336c45)[0x55bf9d813c45]
[localhost:365838] [ 7] john(+0x350fba)[0x55bf9d82dfba]
[localhost:365838] [ 8] /usr/lib64/libc.so.6(__libc_start_main+0xd5)[0x7f0b72b39b25]
[localhost:365838] [ 9] john(+0xa008e)[0x55bf9d57d08e]
[localhost:365838] *** End of error message ***
[1]    365838 bus error (core dumped)  john --test=0 --format=opencl

magnumripper commented 2 years ago

Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument] 1 warning generated.

This warning simply can't be correct because if the compiler didn't obey it, it wouldn't be able to build the kernels.

Build log: /tmp/comgr-c5a8a0/input/CompileSource:117:17: warning: unknown attribute 'max_constant_size' ignored [-Wunknown-attributes] __attribute__((max_constant_size(16))) ^~~~~

Perhaps we should just drop all such __attribute__((max_constant_size(n))) lines now. I doubt they were ever needed on any gear, they were just a compiler hint, right?
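If dropping the lines outright feels too abrupt, one alternative is to hide the attribute behind a macro that expands to nothing when the compiler does not recognise it. The sketch below is plain C and only illustrates the preprocessor pattern. MAYBE_MAX_CONSTANT_SIZE and sum_keys are hypothetical names, and it assumes the attribute really is only a hint, as suggested above. It also assumes the compiler provides __has_attribute (clang and GCC 5+ do); on compilers without it, the fallback silently drops the attribute.

```c
/* __has_attribute is provided by clang and by GCC 5+; elsewhere, pretend
 * that no attribute is supported. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

#if __has_attribute(max_constant_size)
#define MAYBE_MAX_CONSTANT_SIZE(n) __attribute__((max_constant_size(n)))
#else
#define MAYBE_MAX_CONSTANT_SIZE(n) /* unsupported: expands to nothing */
#endif

/* Hypothetical function standing in for a kernel with an annotated
 * constant-buffer parameter. */
static unsigned sum_keys(const unsigned *keys MAYBE_MAX_CONSTANT_SIZE(4 * 4),
                         unsigned count)
{
    unsigned total = 0;

    for (unsigned i = 0; i < count; i++)
        total += keys[i];
    return total;
}
```

On a compiler that rejects the attribute, the parameter is declared without it and the -Wunknown-attributes warning goes away. Whether any runtime still benefits from the hint is the open question raised above.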

Other than that, unfortunately we've become used to AMD drivers failing with many formats, and working around their bugs is Sisyphean work. Also bcrypt-opencl currently lacks a maintainer.

Avamander commented 7 months ago

I had to delete inline from md5_digest (and also place the headers at C:/var/kernel) and then it built fine. I suspect there should be a better way than that.

solardiz commented 7 months ago

@Avamander Can you please describe the problem you were having, with what JtR format(s), what GPU and driver, and what version/revision of JtR you have. Also, explain why you're adding to this specific issue - what makes you think your issue is the same as anything that was said in here before. Thanks!

toppa102 commented 5 months ago

Running john --test=0 --format=opencl results in these errors on rocm opencl:

Testing: sha1crypt-opencl, (NetBSD) [PBKDF1-SHA1 OpenCL]... Options used: -I /usr/share/john/kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=522 -D__SIZEOF_HOST_SIZE_T__=8 -DDEV_VER_MAJOR=3602 -DDEV_VER_MINOR=0 -D_OPENCL_COMPILER -DHASH_LOOPS=1024 -DOUTLEN=20 -DPLAINTEXT_LENGTH=64 -DV_WIDTH=1 /usr/share/john/kernels/pbkdf1_hmac_sha1_kernel.cl
Build log: warning: argument unused during compilation: '-I /usr/share/john/kernels' [-Wunused-command-line-argument]
1 warning generated.
lld: error: undefined hidden symbol: preproc
>>> referenced by /tmp/comgr-d70274/input/linked.bc.o:(pbkdf1_init)
>>> referenced by /tmp/comgr-d70274/input/linked.bc.o:(pbkdf1_init)

lld: error: undefined hidden symbol: hmac_sha1
>>> referenced by /tmp/comgr-d70274/input/linked.bc.o:(pbkdf1_init)
>>> referenced by /tmp/comgr-d70274/input/linked.bc.o:(pbkdf1_init)
>>> referenced by /tmp/comgr-d70274/input/linked.bc.o:(pbkdf1_final)
>>> referenced 1 more times
Error: Creating the executable from LLVM IRs failed.

A little more about my setup:
GPU: AMD RX5700 XT
OS: Artix Linux
Kernel version: 6.8.1-artix1-1
AMD Driver: amdgpu
rocm-opencl-runtime: 6.0.2-1

These errors can be fixed by either removing the inline keyword from the affected functions or adding static in front of the inline.

It might be related to this rocm issue.

solardiz commented 5 months ago

@toppa102 It looks like you're using a version of JtR as packaged in a distro, so it's way out of date. The kernels directory name suggests it's code from before mid-2019 (before 66bd330041d5f81db23b42954aaea1cedecacc3e). Can you please retest using our current code from this repo?

That said, we do still have plenty of inline without static in our OpenCL code, so if your analysis is correct then the issue remains. sha1crypt-opencl won't be the first to be tested, so the first failure will likely look different.

solardiz commented 5 months ago

That said, we do still have plenty of inline without static in our OpenCL code, so if your analysis is correct then the issue remains.

Re-reading past comments, no, this doesn't appear to be the case. As @magnumripper wrote in here in 2021:

For the record, current code has this in opencl_misc.h:

#if __MESA__
#define inline  // empty!
#elif __POCL__
// Do nothing (POCL complains if we redefine)
#elif gpu_amd(DEVICE_INFO) // We really target ROCM here
#define inline  static inline
#else
// Do nothing
#endif

The #define inline static inline line was introduced in November 2019 via 32a642ce66a87d766672f10c6ae3082f26f7390b.

So I guess those of you reporting the same problem now are using code from prior to that commit, perhaps our 1.9.0-jumbo-1 release. We really ought to make a new release soon, but meanwhile please use our latest code from this repo. Thanks.

toppa102 commented 5 months ago

@toppa102 It looks like you're using a version of JtR as packaged in a distro, so it's way out of date. The kernels directory name suggests it's code from before mid-2019 (before 66bd330). Can you please retest using our current code from this repo?

That said, we do still have plenty of inline without static in our OpenCL code, so if your analysis is correct then the issue remains. sha1crypt-opencl won't be the first to be tested, so the first failure will likely look different.

Sorry I didn't realize the arch package was so out of date, I just cloned bleeding-jumbo and the tests passed.

solardiz commented 5 months ago

@toppa102 Thanks. Did literally all of the tests pass? That's very unusual for an AMD platform. (They do all pass on NVIDIA, but almost never on AMD.)

toppa102 commented 5 months ago

@solardiz Only crypt_all seems to fail:

PASS
Testing: krb5pa-md5-opencl, Kerberos 5 AS-REQ Pre-Auth etype 23 [MD4 HMAC-MD5 RC4 OpenCL]... Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

FAILED (crypt_all(1) zero return)
Testing: krb5pa-sha1-opencl, Kerberos 5 AS-REQ Pre-Auth etype 17/18 [PBKDF2-SHA1 OpenCL]... Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

Then it gets stuck here:

PASS
Testing: krb5tgs-opencl, Kerberos 5 TGS-REP etype 23 [MD4 HMAC-MD5 RC4 OpenCL]... Build log: warning: argument unused during compilation: '-I opencl' [-Wunused-command-line-argument]
1 warning generated.

/

But I don't know if this is the last test.

solardiz commented 5 months ago

@toppa102 Thanks. The mention of crypt_all is merely where exactly this particular format (krb5pa-md5-opencl) fails. The zero return failure mode is weird - looking at the code, I don't see how it can be, unless there's host stack memory corruption, so I assume it's that. No, krb5tgs-opencl where you say it gets stuck is not the last test - it's approximately half way through. So there are in fact at least two failures here (one explicit, and one lack of progress). Unfortunately, it's quite "normal" for a handful of our OpenCL formats to fail on AMD platforms, and which ones succeed vs. fail varies.

I'm going to close this issue now since it talks about several distinct issues people saw over the years, and at the same time about nothing in particular. The known issue with undefined hidden symbol that was mentioned in here a few times is already avoided in our current code.