openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/
Other
10.31k stars 2.1k forks source link

GPG intermittent self-test failure #3543

Closed claudioandre-br closed 5 months ago

claudioandre-br commented 5 years ago

Intermitent

. Inside my Circle

Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (3xOMP) FAILED (cmp_all(192))

The following issue is not GPG related. It is related to batch mode.

. GPG random test . Or, you can also use ../run/john alltests.in --max-run=30 --format=sha512crypt

# wget http://openwall.info/wiki/_media/john/test.gpg.tar.gz
# ../run/gpg2john *.asc > ~/file4

# $JtR ~/file4
Using default input encoding: UTF-8
Loaded 4 password hashes with 4 different salts (gpg, OpenPGP / GnuPG Secret Key [32/64])
Cost 1 (s2k-count) is 65536 for all loaded hashes
Cost 2 (hash algorithm [1:MD5 2:SHA1 3:RIPEMD160 8:SHA256 9:SHA384 10:SHA512 11:SHA224]) is 2 for all loaded hashes
Cost 3 (cipher algorithm [1:IDEA 2:3DES 3:CAST5 4:Blowfish 7:AES128 8:AES192 9:AES256 10:Twofish 11:Camellia128 12:Camellia192 13:Camellia256]) is 3 for all loaded hashes
Will run 2 OpenMP threads
Proceeding with single, rules:Wordlist
recovery.c:424:28: runtime error: 1.63246e+18 is outside the range of representable values of type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior recovery.c:424:28 in 
Press 'q' or Ctrl-C to abort, almost any other key for status
qwerty2          (Brandon User)
alamakota        (Original)
normal           (Łąśóęśżźć Normal)
Warning: Only 4 candidates buffered for the current salt, minimum 8
needed for performance.
Almost done: Processing the remaining buffered candidate passwords, if any
Warning: Only 1 candidates buffered for the current salt, minimum 8
needed for performance.
Proceeding with wordlist:../run/password.lst, rules:Wordlist
qwerty           (Random User)
4g 0:00:00:07 DONE 2/3 (2018-12-20 11:31) 0.5012g/s 8251p/s 8252c/s 8252C/s 123456..john
Use the "--show" option to display all of the cracked passwords reliably
Session completed

=================================================================
==80==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1024 byte(s) in 1 object(s) allocated from:
    #0 0x4ed437 in __interceptor_malloc (/cwd/bin/bleeding/run/john+0x4ed437)
    #1 0xff5fa3 in mem_alloc /cwd/src/memory.c:76:14
    #2 0xf6fd3f in crk_init /cwd/src/cracker.c:218:27
    #3 0x102bb31 in single_init /cwd/src/single.c:407:2
    #4 0x102a9bc in do_single_crack /cwd/src/single.c:839:2
    #5 0xf46fdd in do_single_pass /cwd/src/batch.c:25:2
    #6 0xf46d38 in do_batch_crack /cwd/src/batch.c:51:3
    #7 0xfa11ef in john_run /cwd/src/john.c:1875:4
    #8 0xf9f270 in main /cwd/src/john.c:2102:2
    #9 0x7f6405c7409a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

SUMMARY: AddressSanitizer: 1024 byte(s) leaked in 1 allocation(s).
claudioandre-br commented 5 years ago

The ticket contains two unrelated issues.

If the leak is on purpose, just close the ticket.

magnumripper commented 5 years ago

Leak fixed now. What about the self-test fail?

claudioandre-br commented 5 years ago

What about the self-test fail?

It is intermittent. Happens once in a while.

magnumripper commented 5 years ago

Then let's check this out

kholia commented 5 years ago

These failures have been plaguing the GPG format for a long time it seems. I was never able to get to the bottom of the issue earlier.

Earlier, I was looking for the problem in the gpg-opencl format (when it had fall-back on CPU code functionality in it).

It now seems that the problem exists within the base CPU code itself? This makes sense to me and would explain those weird gpg-opencl failures earlier.

magnumripper commented 5 years ago

I bet something in OpenSSL is not thread safe. Have we ever seen problems with a non-omp build?

kholia commented 5 years ago

I cannot reproduce the GPG format failures with the following command,

$ while true; do OMP_NUM_THREADS=3 ../run/john --format=gpg --test; sleep 1; done

I am running Ubuntu 18.04.1 64-bit here.

magnumripper commented 5 years ago

@claudioandre-br what version of OpenSSL was that?

claudioandre-br commented 5 years ago
Crypto library: OpenSSL
OpenSSL library version: 01000208f
OpenSSL 1.0.2h  3 May 2016
kholia commented 5 years ago

@claudioandre-br How do I reproduce this problem locally? Are you able to reproduce the problem on Ubuntu 18.0.1 LTS or something similar?

magnumripper commented 5 years ago

Try --stress-test=0. How often does it fail? Once in 10? Once in 100? Using how many threads?

kholia commented 5 years ago

I am running ../run/john --format=gpg --stress-test=0 on Ubuntu 18.04.1 LTS which has OpenSSL 1.1.0g-2ubuntu4.3.

...
#529 Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (4xOMP) PASS
#530 Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (4xOMP) PASS
#531 Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (4xOMP) PASS

I see no failures at all. I tried OMP_NUM_THREADS=3 but no failures again.

@claudioandre-br Can you please give some pointers on how to reproduce this bug?

solardiz commented 5 months ago

Looks like this (or similar) is still an issue, gpg format often failing on Linux cfarm29 6.1.0-21-powerpc64le #1 SMP Debian 6.1.90-1 (2024-05-03) ppc64le GNU/Linux: https://github.com/openwall/john/issues/5491#issuecomment-2147418095

solardiz commented 5 months ago

On the above system, I am unable to trigger the issue with --stress-test=0, but I am with a shell while loop around a --test-full=0. This suggests that maybe the problem is exposed/hidden by variations in memory layout (ASLR).

The failure is usually on cmp_all of the full size, but on one occasion it was on get_key(47) (with mkpc 64), which suggests we have heap memory corruption.

solardiz commented 5 months ago

I could also get it to fail in an --enable-asan build, without triggering ASan. This suggests we may have memory corruption via a system library (a bug in there, or in our usage of the library). Or a race condition in our code, but the one failure on get_key suggests otherwise.

solar@cfarm29:~/john/run$ while :; do ../run/john-asan --test-full=0 --format=gpg -tune=2 || break; done
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) FAILED (cmp_all(64))
solardiz commented 5 months ago

This hack didn't catch anything:

+++ b/src/gpg_common_plug.c
@@ -1220,6 +1220,10 @@ int gpg_common_check(unsigned char *keydata, int ks)
        // other things (like mpz integer creation) are needed, we know that
        // our sizes will not overflow.
        out = mem_alloc((gpg_common_cur_salt->datalen) + 0x10000);
+       unsigned char *out_last = &out[gpg_common_cur_salt->datalen + 0x10000 - 1];
+       *out_last = 'c';
+#undef MEM_FREE
+#define MEM_FREE(out) { assert(*out_last == 'c'); free(out); out = NULL; }
        // Quick Hack
        if (!gpg_common_cur_salt->symmetric_mode)
                memcpy(ivec, gpg_common_cur_salt->iv, gpg_common_blockSize(gpg_common_cur_salt->cipher_algorithm));
solardiz commented 5 months ago

In fact, even this works:

+++ b/src/gpg_common_plug.c
@@ -1220,6 +1220,10 @@ int gpg_common_check(unsigned char *keydata, int ks)
        // other things (like mpz integer creation) are needed, we know that
        // our sizes will not overflow.
        out = mem_alloc((gpg_common_cur_salt->datalen) + 0x10000);
+       unsigned char *out_last = &out[gpg_common_cur_salt->datalen];
+       *out_last = 'c';
+#undef MEM_FREE
+#define MEM_FREE(out) { assert(*out_last == 'c'); free(out); out = NULL; }

So the extra 0x10000 looks unneeded (and in general it was a bad idea to be allocating/freeing memory during cracking).

solardiz commented 5 months ago

So the extra 0x10000 looks unneeded (and in general it was a bad idea to be allocating/freeing memory during cracking).

This and many other over-allocations were added by JimF in 714779815a9c79d348c28c4a6be86eabd6dfe694 in 2016, with commit title "gpg: bugs found in -test-full=0 using ASAN".

solardiz commented 5 months ago

Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) FAILED (cmp_all(64))

This triggers the same for -cost=0,-8,-7 vs. -cost=0,8,7, so the problem doesn't look specific to (usage of) any one S2K, hash, nor cipher.

Edit: perhaps I am wrong - I actually don't know if -cost affects our self-tests, or only later benchmark/usage. If so, that wasn't a valid test.

solardiz commented 5 months ago

perhaps I am wrong - I actually don't know if -cost affects our self-tests, or only later benchmark/usage. If so, that wasn't a valid test.

So I went to patch the test vectors in the source. With only the symmetric tests left (last 7 test vectors), the testing is really quick and it didn't fail for me. With the opposite change (removing these symmetric tests), after a while I got:

Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) FAILED (get_key(57))

which is the second time I see this format fail on get_key. The build was with ASan, which didn't trigger. So this confirms memory corruption via library.

solardiz commented 5 months ago

The two gpg --homedir . --s2k-cipher-algo 3des --simple-sk-checksum --gen-key test vectors appear to be both required and sufficient to reproduce the issue.

solardiz commented 5 months ago

So the extra 0x10000 looks unneeded (and in general it was a bad idea to be allocating/freeing memory during cracking).

This and many other over-allocations were added by JimF in 7147798 in 2016, with commit title "gpg: bugs found in -test-full=0 using ASAN".

OK, this over-allocation was in fact needed - we have major over-reads (being reads, they were not caught by my canary). About the smallest that works is 0x2700 extra bytes. I wondered if these over-read of uninitialized memory could result in the cmp_all failure mode, but initializing this memory to 0 didn't appear to make a difference.

solardiz commented 5 months ago

I didn't look into this further yet, but just to write down my understanding from earlier today: when trying usually wrong passwords, we end up decrypting what isn't a valid key. Yet it has embedded length fields in it. Those may end up indicating bignums of up to 64 kbit each. This is why the over-reads. Then we pass those unusually huge bignums into OpenSSL routines, thereby torturing them (and wasting lots of CPU cycles, too, which I guess we'd rather avoid by detecting the over-reads instead). We also ask BN_bin2bn to do its own memory allocation for the output bignum - well, I'm not sure it's able to calculate its size correctly for an unusual input like this - will need to review/test OpenSSL code for that. This could very well be the issue.

solardiz commented 5 months ago

we pass those unusually huge bignums into OpenSSL routines, thereby torturing them (and wasting lots of CPU cycles, too, which I guess we'd rather avoid by detecting the over-reads instead). We also ask BN_bin2bn to do its own memory allocation for the output bignum - well, I'm not sure it's able to calculate its size correctly for an unusual input like this - will need to review/test OpenSSL code for that.

I reviewed current OpenSSL code - looks correct even for such inputs.

Our issue appears to be different. We have larger p and q fields in the salt struct to account for the up to 64kbit values, but our d field is tiny, yet we also write an up to 64kbit bignum in there (which we actually just skip).

I wonder why ASan does not trigger. The struct ends in unsigned char data[1]; which is idiomatic for a dynamically-allocated struct size. Are these maybe excluded from ASan protection?

Anyway, patching the code to skip this field without writing it to d makes the issue go away in my testing.

solardiz commented 5 months ago

I wonder why ASan does not trigger. The struct ends in unsigned char data[1]; which is idiomatic for a dynamically-allocated struct size. Are these maybe excluded from ASan protection?

That's not how ASan works, so was probably a wrong guess. Another probably wrong guess is the overwrite is somehow contained to the struct (which is valid behavior in C), but results in us making incorrect calls into OpenSSL later (because of the overwritten salt data, which maybe wouldn't have passed our valid). I do not have a really good idea as to what was going on, but I think the issue is now fixed - at least I cannot reproduce it anymore after my fixes.