Closed claudioandre-br closed 5 months ago
The ticket contains two unrelated issues.
If the leak is on purpose, just close the ticket.
Leak fixed now. What about the self-test fail?
What about the self-test fail?
It is intermittent. Happens once in a while.
Then let's check this out
These failures have been plaguing the GPG format for a long time it seems. I was never able to get to the bottom of the issue earlier.
Earlier, I was looking for the problem in the gpg-opencl
format (when it had fall-back on CPU code functionality in it).
It now seems that the problem exists within the base CPU code itself? This makes sense to me and would explain those weird gpg-opencl
failures earlier.
I bet something in OpenSSL is not thread safe. Have we ever seen problems with a non-omp build?
I cannot reproduce the GPG format failures with the following command,
$ while true; do OMP_NUM_THREADS=3 ../run/john --format=gpg --test; sleep 1; done
I am running Ubuntu 18.04.1 64-bit here.
@claudioandre-br what version of OpenSSL was that?
Crypto library: OpenSSL
OpenSSL library version: 01000208f
OpenSSL 1.0.2h 3 May 2016
@claudioandre-br How do I reproduce this problem locally? Are you able to reproduce the problem on Ubuntu 18.0.1 LTS or something similar?
Try --stress-test=0
. How often does it fail? Once in 10? Once in 100? Using how many threads?
I am running ../run/john --format=gpg --stress-test=0
on Ubuntu 18.04.1 LTS which has OpenSSL 1.1.0g-2ubuntu4.3
.
...
#529 Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (4xOMP) PASS
#530 Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (4xOMP) PASS
#531 Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (4xOMP) PASS
I see no failures at all. I tried OMP_NUM_THREADS=3
but no failures again.
@claudioandre-br Can you please give some pointers on how to reproduce this bug?
Looks like this (or similar) is still an issue, gpg
format often failing on Linux cfarm29 6.1.0-21-powerpc64le #1 SMP Debian 6.1.90-1 (2024-05-03) ppc64le GNU/Linux
: https://github.com/openwall/john/issues/5491#issuecomment-2147418095
On the above system, I am unable to trigger the issue with --stress-test=0
, but I am with a shell while
loop around a --test-full=0
. This suggests that maybe the problem is exposed/hidden by variations in memory layout (ASLR).
The failure is usually on cmp_all
of the full size, but on one occasion it was on get_key(47)
(with mkpc 64), which suggests we have heap memory corruption.
I could also get it to fail in an --enable-asan
build, without triggering ASan. This suggests we may have memory corruption via a system library (a bug in there, or in our usage of the library). Or a race condition in our code, but the one failure on get_key
suggests otherwise.
solar@cfarm29:~/john/run$ while :; do ../run/john-asan --test-full=0 --format=gpg -tune=2 || break; done
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) FAILED (cmp_all(64))
This hack didn't catch anything:
+++ b/src/gpg_common_plug.c
@@ -1220,6 +1220,10 @@ int gpg_common_check(unsigned char *keydata, int ks)
// other things (like mpz integer creation) are needed, we know that
// our sizes will not overflow.
out = mem_alloc((gpg_common_cur_salt->datalen) + 0x10000);
+ unsigned char *out_last = &out[gpg_common_cur_salt->datalen + 0x10000 - 1];
+ *out_last = 'c';
+#undef MEM_FREE
+#define MEM_FREE(out) { assert(*out_last == 'c'); free(out); out = NULL; }
// Quick Hack
if (!gpg_common_cur_salt->symmetric_mode)
memcpy(ivec, gpg_common_cur_salt->iv, gpg_common_blockSize(gpg_common_cur_salt->cipher_algorithm));
In fact, even this works:
+++ b/src/gpg_common_plug.c
@@ -1220,6 +1220,10 @@ int gpg_common_check(unsigned char *keydata, int ks)
// other things (like mpz integer creation) are needed, we know that
// our sizes will not overflow.
out = mem_alloc((gpg_common_cur_salt->datalen) + 0x10000);
+ unsigned char *out_last = &out[gpg_common_cur_salt->datalen];
+ *out_last = 'c';
+#undef MEM_FREE
+#define MEM_FREE(out) { assert(*out_last == 'c'); free(out); out = NULL; }
So the extra 0x10000 looks unneeded (and in general it was a bad idea to be allocating/freeing memory during cracking).
So the extra 0x10000 looks unneeded (and in general it was a bad idea to be allocating/freeing memory during cracking).
This and many other over-allocations were added by JimF in 714779815a9c79d348c28c4a6be86eabd6dfe694 in 2016, with commit title "gpg: bugs found in -test-full=0 using ASAN".
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) FAILED (cmp_all(64))
This triggers the same for -cost=0,-8,-7
vs. -cost=0,8,7
, so the problem doesn't look specific to (usage of) any one S2K, hash, nor cipher.
Edit: perhaps I am wrong - I actually don't know if -cost
affects our self-tests, or only later benchmark/usage. If so, that wasn't a valid test.
perhaps I am wrong - I actually don't know if
-cost
affects our self-tests, or only later benchmark/usage. If so, that wasn't a valid test.
So I went to patch the test vectors in the source. With only the symmetric tests left (last 7 test vectors), the testing is really quick and it didn't fail for me. With the opposite change (removing these symmetric tests), after a while I got:
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) FAILED (get_key(57))
which is the second time I see this format fail on get_key
. The build was with ASan, which didn't trigger. So this confirms memory corruption via library.
The two gpg --homedir . --s2k-cipher-algo 3des --simple-sk-checksum --gen-key
test vectors appear to be both required and sufficient to reproduce the issue.
So the extra 0x10000 looks unneeded (and in general it was a bad idea to be allocating/freeing memory during cracking).
This and many other over-allocations were added by JimF in 7147798 in 2016, with commit title "gpg: bugs found in -test-full=0 using ASAN".
OK, this over-allocation was in fact needed - we have major over-reads (being reads, they were not caught by my canary). About the smallest that works is 0x2700 extra bytes. I wondered if these over-read of uninitialized memory could result in the cmp_all
failure mode, but initializing this memory to 0 didn't appear to make a difference.
I didn't look into this further yet, but just to write down my understanding from earlier today: when trying usually wrong passwords, we end up decrypting what isn't a valid key. Yet it has embedded length fields in it. Those may end up indicating bignums of up to 64 kbit each. This is why the over-reads. Then we pass those unusually huge bignums into OpenSSL routines, thereby torturing them (and wasting lots of CPU cycles, too, which I guess we'd rather avoid by detecting the over-reads instead). We also ask BN_bin2bn
to do its own memory allocation for the output bignum - well, I'm not sure it's able to calculate its size correctly for an unusual input like this - will need to review/test OpenSSL code for that. This could very well be the issue.
we pass those unusually huge bignums into OpenSSL routines, thereby torturing them (and wasting lots of CPU cycles, too, which I guess we'd rather avoid by detecting the over-reads instead). We also ask
BN_bin2bn
to do its own memory allocation for the output bignum - well, I'm not sure it's able to calculate its size correctly for an unusual input like this - will need to review/test OpenSSL code for that.
I reviewed current OpenSSL code - looks correct even for such inputs.
Our issue appears to be different. We have larger p
and q
fields in the salt struct to account for the up to 64kbit values, but our d
field is tiny, yet we also write an up to 64kbit bignum in there (which we actually just skip).
I wonder why ASan does not trigger. The struct ends in unsigned char data[1];
which is idiomatic for a dynamically-allocated struct size. Are these maybe excluded from ASan protection?
Anyway, patching the code to skip this field without writing it to d
makes the issue go away in my testing.
I wonder why ASan does not trigger. The struct ends in
unsigned char data[1];
which is idiomatic for a dynamically-allocated struct size. Are these maybe excluded from ASan protection?
That's not how ASan works, so was probably a wrong guess. Another probably wrong guess is the overwrite is somehow contained to the struct (which is valid behavior in C), but results in us making incorrect calls into OpenSSL later (because of the overwritten salt data, which maybe wouldn't have passed our valid
). I do not have a really good idea as to what was going on, but I think the issue is now fixed - at least I cannot reproduce it anymore after my fixes.
Intermitent
. Inside my Circle
The following issue is not GPG related. It is related to batch mode.
. GPG random test . Or, you can also use
../run/john alltests.in --max-run=30 --format=sha512crypt