openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

crypt format occasionally fails a `--test-full=0 --format=cpu` run on Ubuntu 22 powerpc64le #5491

Open claudioandre-br opened 6 months ago

claudioandre-br commented 6 months ago

This is Ubuntu 22, Canonical's hardware. Nothing related changed in john itself recently.

This is probably a hard-to-reproduce issue.

Version: 1.9.0-jumbo-1+bleeding-d384b5be9a 2024-05-30 20:33:48 +0200
Build: linux-gnu 64-bit powerpc64le Altivec AC OMP
SIMD: AltiVec, interleaving: MD4:1 MD5:1 SHA1:1 SHA256:1 SHA512:1
[...]
gcc version: 11.4.0
GNU libc version: 2.35 (loaded: 2.35)
Crypto library: OpenSSL
OpenSSL library version: 030000020
OpenSSL 3.0.2 15 Mar 2022
Will run 4 OpenMP threads
Testing: descrypt, traditional crypt(3) [DES 128/128 AltiVec]... (4xOMP) PASS
Testing: bsdicrypt, BSDI crypt(3) ("_J9..", 725 iterations) [DES 128/128 AltiVec]... (4xOMP) PASS
[...]
Testing: dummy [N/A]... PASS
Testing: crypt, generic crypt(3) [?/64]... (4xOMP) FAILED (cmp_all(96))
1 out of 410 tests have FAILED
 FAILED: -test-full=0 --format=cpu

In any case, I'm documenting it here. A close look at the format can always make it better.


As I recall, crypt is a format that I have seen fail in some context(s).

Full log at https://launchpadlibrarian.net/733306772/buildlog_snap_ubuntu_jammy_ppc64el_john-the-ripper_BUILDING.txt.gz

solardiz commented 6 months ago

Oh wow, could be a thread-safety bug in libxcrypt. What do you mean by "Canonical hardware"?

claudioandre-br commented 6 months ago

The hardware is owned by Canonical (the company behind Ubuntu). This might be the first time I've seen this failure, but it might not be.

solardiz commented 6 months ago

Does this occur during snap package build? Does the build fail when this happens? I suppose we have no easy way to try and trigger the bug on its own (e.g., running just this one test)? Maybe we should try on Compile Farm's hardware.

claudioandre-br commented 6 months ago

> Does this occur during snap package build?

Yes

> Does the build fail when this happens?

Only if I want it to fail.

> I suppose we have no easy way to try and trigger the bug on its own (e.g., running just this one test)?

It's probably random, so it will be extremely difficult to get more information about it. I have to join a queue to create each build; some things are possible, but it is not a good setup for debugging.

solardiz commented 6 months ago

Reviewed the code in libxcrypt and our c3_fmt.c, found no relevant issues, including none recently fixed in libxcrypt (as Ubuntu may not have the latest version). However, found and will fix various other minor issues in our code, which should make no difference with respect to this issue.

solardiz commented 6 months ago

> GNU libc version: 2.35 (loaded: 2.35)

It wouldn't be needed this time, but in general I wonder whether we want to, and easily can, add the libxcrypt version here, but only when we know we're linking against libxcrypt. That's tricky, since we don't do that explicitly: we just use -lcrypt, which can be provided by different libraries depending on the system.
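
Outside of john itself, the library behind -lcrypt can be identified on Debian-like systems by chaining `ldd`, `dpkg -S`, and `dpkg-query`. The following is a rough sketch only: the function names are made up for illustration, and the package lookup assumes dpkg records the library under the same path that `ldd` prints, which may not hold on every system.

```shell
# Print the resolved libcrypt path from `ldd` output read on stdin.
# Matches libcrypt.so.* only, not libcrypto.so.*.
libcrypt_path() {
    awk '$1 ~ /^libcrypt\.so/ && $2 == "=>" { print $3; exit }'
}

# Report the package, version, and source package that own the libcrypt
# a binary resolves to (Debian/Ubuntu only; needs ldd and dpkg).
libcrypt_package() {
    path=$(ldd "$1" | libcrypt_path)
    [ -n "$path" ] || { echo "no libcrypt dependency found" >&2; return 1; }
    pkg=$(dpkg -S "$path" | cut -d: -f1)
    dpkg-query -W -f='${Package} ${Version} (source: ${Source})\n' "$pkg"
}
```

Running `libcrypt_package ../run/john` would then report the owning package, and a source of libxcrypt indicates that libxcrypt rather than glibc's own crypt is in use.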

solardiz commented 6 months ago

I can't reproduce this on cfarm29 (Raptor Blackbird ppc64le POWER9 Debian 12.5 bookworm 6.1.0-21-powerpc64le) running:

while :; do OMP_NUM_THREADS=4 ../run/john --test-full=0 --format=crypt || break; done
solar@cfarm29:~/john/src$ ldd ../run/john
        linux-vdso64.so.1 (0x00007fff94c80000)
        libcrypto.so.3 => /lib/powerpc64le-linux-gnu/libcrypto.so.3 (0x00007fff92800000)
        libgmp.so.10 => /lib/powerpc64le-linux-gnu/libgmp.so.10 (0x00007fff94b80000)
        libm.so.6 => /lib/powerpc64le-linux-gnu/libm.so.6 (0x00007fff926d0000)
        libz.so.1 => /lib/powerpc64le-linux-gnu/libz.so.1 (0x00007fff94b40000)
        libcrypt.so.1 => /lib/powerpc64le-linux-gnu/libcrypt.so.1 (0x00007fff94ae0000)
        libbz2.so.1.0 => /lib/powerpc64le-linux-gnu/libbz2.so.1.0 (0x00007fff92dc0000)
        libgomp.so.1 => /lib/powerpc64le-linux-gnu/libgomp.so.1 (0x00007fff92650000)
        libc.so.6 => /lib/powerpc64le-linux-gnu/libc.so.6 (0x00007fff92200000)
        /lib64/ld64.so.2 (0x00007fff94c90000)
solar@cfarm29:~/john/src$ dpkg -S /lib/powerpc64le-linux-gnu/libcrypt.so.1
libcrypt1:ppc64el: /lib/powerpc64le-linux-gnu/libcrypt.so.1
solar@cfarm29:~/john/src$ dpkg -s libcrypt1
Package: libcrypt1
Protected: yes
Status: install ok installed
Priority: optional
Section: libs
Installed-Size: 289
Maintainer: Marco d'Itri <md@linux.it>
Architecture: ppc64el
Multi-Arch: same
Source: libxcrypt
Version: 1:4.4.33-2
Replaces: libc6 (<< 2.29-4)
Depends: libc6 (>= 2.36)
Conflicts: libpam0g (<< 1.4.0-10)
Description: libcrypt shared library
 libxcrypt is a modern library for one-way hashing of passwords.
 It supports DES, MD5, NTHASH, SUNMD5, SHA-2-256, SHA-2-512, and
 bcrypt-based password hashes
 It provides the traditional Unix 'crypt' and 'crypt_r' interfaces,
 as well as a set of extended interfaces like 'crypt_gensalt'.
Important: yes

solardiz commented 6 months ago

> I can't reproduce this on cfarm29 (Raptor Blackbird ppc64le POWER9 Debian 12.5 bookworm 6.1.0-21-powerpc64le)

On the same system, I also couldn't reproduce this with:

for n in `seq 0 99`; do time OMP_NUM_THREADS=4 ../run/john --test-full=0 -form=cpu || break; done

which took over 10 hours.
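
A bounded variant of the loops above that keeps the output of the failing run can be handy when a failure may take hours to appear. This is only a sketch: `repeat_until_fail` is a hypothetical helper name, not something shipped with john, and it assumes a POSIX shell with `mktemp` available.

```shell
# Run a command up to $1 times. Stop at the first non-zero exit status
# and keep that run's combined stdout/stderr in a temporary log file.
repeat_until_fail() {
    max=$1; shift
    log=$(mktemp) || return 2
    i=1
    while [ "$i" -le "$max" ]; do
        # The log is overwritten on every run, so only the last run survives.
        if ! "$@" >"$log" 2>&1; then
            echo "run $i failed, log kept in $log"
            return 1
        fi
        i=$((i + 1))
    done
    rm -f "$log"
    echo "no failure in $max runs"
    return 0
}
```

For example, `repeat_until_fail 100 env OMP_NUM_THREADS=4 ../run/john --test-full=0 --format=crypt` would mirror the first loop. The thread count is passed through `env` so that it reaches john regardless of how the shell treats assignments placed before a function call.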

solardiz commented 6 months ago

Still couldn't reproduce with 32, 33, 333 threads (this system has 32 hardware threads). However, after a while I got this:

Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) FAILED (cmp_all(64))

which could be a bug in our format, in OpenSSL, in the OpenMP implementation, in the kernel, or below.

solardiz commented 6 months ago

The gpg issue is quite reproducible:

solar@cfarm29:~/john/src$ for n in `seq 0 99`; do ../run/john --test-full=0 -form=gpg -v=5 || break; done
initUnicode(UNICODE, RAW/RAW)
RAW -> RAW -> RAW
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) Loaded 30 hashes with 30 different salts to test db from test vectors

gpg OMP autotune using test db with s2k-count of 11534336

OMP scale 1: 32 crypts (1x32) in 64.405 ms, 496 c/s +
OMP scale 2: 64 crypts (1x64) in 115.352 ms, 554 c/s +
Autotune found best speed at OMP scale of 2
PASS
initUnicode(UNICODE, RAW/RAW)
RAW -> RAW -> RAW
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) Loaded 30 hashes with 30 different salts to test db from test vectors

gpg OMP autotune using test db with s2k-count of 11534336

OMP scale 1: 32 crypts (1x32) in 62.556 ms, 511 c/s +
OMP scale 2: 64 crypts (1x64) in 114.906 ms, 556 c/s +
Autotune found best speed at OMP scale of 2
PASS
initUnicode(UNICODE, RAW/RAW)
RAW -> RAW -> RAW
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) Loaded 30 hashes with 30 different salts to test db from test vectors

gpg OMP autotune using test db with s2k-count of 11534336

OMP scale 1: 32 crypts (1x32) in 62.355 ms, 513 c/s +
OMP scale 2: 64 crypts (1x64) in 114.695 ms, 557 c/s +
Autotune found best speed at OMP scale of 2
PASS
initUnicode(UNICODE, RAW/RAW)
RAW -> RAW -> RAW
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) Loaded 30 hashes with 30 different salts to test db from test vectors

gpg OMP autotune using test db with s2k-count of 11534336

OMP scale 1: 32 crypts (1x32) in 65.688 ms, 487 c/s +
OMP scale 2: 64 crypts (1x64) in 115.867 ms, 552 c/s +
Autotune found best speed at OMP scale of 2
FAILED (cmp_all(64) $gpg$*1*650*2048*72624bb7243579c0c77cf1e64565251e0ac9d0dcb2f4b98fa54e1678ee4234409efe464a117b21aff978907cfbf19eb2547d44e3a2e6f7db5bfceb4af2391992f30ff55a292d0c011f05c3ab27a1a3fde1a9fd1fbf)

and on another occasion:

Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) Loaded 30 hashes with 30 different salts to test db from test vectors

gpg OMP autotune using test db with s2k-count of 11534336

OMP scale 1: 32 crypts (1x32) in 62.383 ms, 512 c/s +
OMP scale 2: 64 crypts (1x64) in 115.573 ms, 553 c/s +
Autotune found best speed at OMP scale of 2
FAILED (get_key(47) (case) 80808080�67890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901 openwall)

It usually fails on cmp_all(64), but once it was get_key(47), as above.
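
To see how the failures split between checks over many runs, the saved output can be tallied. A minimal sketch, assuming the logs contain lines shaped like the `FAILED (cmp_all(64) ...` ones shown above; `tally_failures` is a hypothetical name:

```shell
# Read john self-test output on stdin and count each distinct failing
# check, e.g. "cmp_all(64)" or "get_key(47)".
tally_failures() {
    sed -n 's/.*FAILED (\([a-z_]*([0-9]*)\).*/\1/p' |
        sort | uniq -c | awk '{ print $2 ": " $1 }'
}
```

Feeding it the concatenated logs, as in `cat run-*.log | tally_failures` (log file names are an assumption), prints one line per failing check with its count.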

On this system, it auto-tunes to OMP_SCALE 2, which isn't something we commonly test on x86_64 (where we have pre-tuned OMP_SCALE of 1 for this format).

solardiz commented 6 months ago

> On this system, it auto-tunes to OMP_SCALE 2, which isn't something we commonly test on x86_64 (where we have pre-tuned OMP_SCALE of 1 for this format).

Looks like a red herring. After a little while, I also got it to fail here with -tune=1:

Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) Loaded 30 hashes with 30 different salts to test db from test vectors
FAILED (cmp_all(32) $gpg$*1*650*2048*72624bb7243579c0c77cf1e64565251e0ac9d0dcb2f4b98fa54e1678ee4234409efe464a117b21aff978907cfbf19eb2547d44e3a2e6f7db5bfceb4af2391992f30ff55a292d0c011f05c3ab27a1a3fde1a9fd1fbf)

solardiz commented 6 months ago

I took further comments on the gpg issue to #3543, as we already had that issue open and its cause is quite likely separate from what Claudio observed with the crypt format.

solardiz commented 5 months ago

If anyone wants to proceed to debug this further, a next step could be to identify and take Ubuntu's exact libxcrypt version and binary package where the issue was triggered and experiment with that on cfarm29. I think the system is similar enough that the same libxcrypt binary could be loaded there via LD_LIBRARY_PATH. @claudioandre-br maybe you?
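
The LD_LIBRARY_PATH part of that step could look roughly like the sketch below. `run_with_libdir` is a hypothetical wrapper, and it assumes the Ubuntu libcrypt1 package has already been downloaded and unpacked somewhere (for instance with `dpkg-deb -x`), so that a directory containing its `libcrypt.so.1` exists.

```shell
# Run a command with an extra directory searched first for shared libraries.
# $1 is the directory; the remaining arguments are the command to run.
run_with_libdir() {
    libdir=$1; shift
    # Prepend the directory, keeping any LD_LIBRARY_PATH already set.
    env LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}
```

Before running any tests, `run_with_libdir <unpacked dir> ldd ../run/john` should show `libcrypt.so.1` resolving into the unpacked tree. If it does, the same wrapper can be put in front of the `--test-full=0 --format=crypt` loop used earlier.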