openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/
Other
9.98k stars 2.06k forks source link

CL_OUT_OF_RESOURCES when copying results back #4343

Open solardiz opened 3 years ago

solardiz commented 3 years ago

As seen in https://github.com/openwall/john/issues/4295#issuecomment-694364334, lotus5-opencl and keepass-opencl were mysteriously failing on the ancient GTX 570. Edit: later the same was also observed for EncFS-opencl, gpg-opencl, keychain-opencl, pgpdisk-opencl, pgpsda-opencl, ZIP-opencl.

I seem to have figured this out for lotus5-opencl: we were sometimes invoking the kernel on more work items than we were copying inputs for, such as when called with *pcount 1, which got "upgraded" to 64 for the kernel invocation but not for the inputs copying. Apparently, when the kernel ran on some uninitialized inputs it sometimes failed (seemed to depend on prior content of GPU memory) and the failure was visible when trying to copy its results back (even the fewer results for which inputs were provided). The below patch seems to fix this for me:

+++ b/src/opencl_lotus5_fmt_plug.c
@@ -234,16 +234,16 @@ static int crypt_all(int *pcount, struct db_salt *salt)
        size_t mem_cpy_sz;
        size_t N, *M;

-       mem_cpy_sz = count * KEY_SIZE_IN_BYTES;
+       M = local_work_size ? &local_work_size : NULL;
+       N = GET_NEXT_MULTIPLE(count, local_work_size);
+
+       mem_cpy_sz = N * KEY_SIZE_IN_BYTES;
        BENCH_CLERROR(clEnqueueWriteBuffer(queue[gpu_id],
                                            cl_tx_keys, CL_FALSE, 0,
                                            mem_cpy_sz, saved_key,
                                            0, NULL, multi_profilingEvent[0]),
                                            "Failed to write buffer cl_tx_keys.");

-       M = local_work_size ? &local_work_size : NULL;
-       N = GET_NEXT_MULTIPLE(count, local_work_size);
-
        BENCH_CLERROR(clEnqueueNDRangeKernel(queue[gpu_id],
                                              crypt_kernel, 1,
                                              NULL, &N, M, 0, NULL, multi_profilingEvent[1]),

While the failure for keepass-opencl looks similar, the code is very different and I don't know whether the underlying issue is similar or also very different. I'd appreciate help on that.

magnumripper commented 3 years ago

The Keepass code always copies the full allocated buffer size. Also, it includes safety for any over-sized (eg. uninitialized) length field. BTW don't forget #4313

solardiz commented 3 years ago

I think #4313 is different (I wouldn't expect it to manifest itself when running with explicit LWS=64 GWS=64, where this one does), and is a bug we should fix globally - not just for keepass-opencl.

I think I'll get the lotus5-opencl fix in, and keep this issue open for now for trying to figure out keepass-opencl.

solardiz commented 3 years ago

I've just tested the lotus5-opencl fix on super's GPUs and pushed it right into bleeding-jumbo.

solardiz commented 3 years ago

Also tested the patched lotus5-opencl in an ASan build on CPU and MIC. No issues.

magnumripper commented 3 years ago

56e9bcdea has small changes, not sure if they matter

solardiz commented 3 years ago

FWIW, the keepass-opencl problem here still occurs just the same after 8bbd90f49ca6b18179112b3fa923192e198b4894 (and all prior commits).

Device 1: GeForce GTX 570
Benchmarking: sha1crypt-opencl, (NetBSD) [PBKDF1-SHA1 OpenCL]... LWS=32 GWS=3840 (120 blocks) DONE
Speed for cost 1 (iteration count) of 64000 and 40000
Raw:    5907 c/s real, 5885 c/s virtual

Benchmarking: KeePass-opencl [SHA256 AES/Twofish/ChaCha OpenCL]... 0: OpenCL CL_OUT_OF_RESOURCES (-5) error in opencl_keepass_fmt_plug.c:279 - Copy result back
FAILED (cmp_all(-1))
claudioandre-br commented 3 years ago

Can you enable magnun's debug nvidia-smi call?

solardiz commented 3 years ago

@claudioandre-br I'd need to run stress-test for that, and for now I am simply running a regular test across all formats. I check nvidia-smi manually once in a while, and there's no leak seen anymore.

New instances of similar misbehavior:

Benchmarking: EncFS-opencl [PBKDF2-SHA1 AES OpenCL]... 0: OpenCL CL_OUT_OF_RESOURCES (-5) error in opencl_encfs_fmt_plug.c:316 - Copy result back
FAILED (cmp_all(-1))
Benchmarking: gpg-opencl, OpenPGP / GnuPG Secret Key [SHA1/SHA2 OpenCL]... (8xOMP) 0: OpenCL CL_OUT_OF_RESOURCES (-5) error in opencl_gpg_fmt_plug.c:319 - Copy result back
FAILED (cmp_all(-1))
Benchmarking: keychain-opencl, Mac OS X Keychain [PBKDF2-SHA1 3DES OpenCL]... 0: OpenCL CL_OUT_OF_RESOURCES (-5) error in opencl_keychain_fmt_plug.c:252 - Copy result back
FAILED (cmp_all(-1))

Previously, my test simply didn't get this far because of the memory leaks.

solardiz commented 3 years ago

The four failing formats - keepass-opencl, EncFS-opencl, gpg-opencl, and keychain-opencl - also fail in the exact same way when tested individually and with explicit LWS=64 GWS=64.

solardiz commented 3 years ago

Here's an idea: these formats copy the entire input buffer to the device, and the entire output buffer from the device. This means they're trying to copy back results that were not produced by the device when the actual *pcount and GWS are lower than buffer allocation. Perhaps that's our bug? Maybe reading of uninitialized results may fail like this, rather than merely produce garbage.

claudioandre-br commented 3 years ago

[...]. Perhaps that's our bug?

Using LWS=64 GWS=64? Odd.

solardiz commented 3 years ago

Here's another guess: maybe the kernel itself occasionally fails to produce results (for inputs beyond *pcount but below GWS, or for all) when some inputs are invalid (copied from the host, but using uninitialized data on the host).

Edit: not confirmed that the data on host is uninitialized - there's a properly-looking memset() in clear_keys() in keepass-opencl. I also see no issue in the kernel itself handling a zero length password, as would result from that memset().

claudioandre-br commented 3 years ago

I like the uninitialized. Salt related stuff could be invalid, for example.

solardiz commented 3 years ago

I like the uninitialized. Salt related stuff could be invalid, for example.

I looked into that. keepass_get_salt() starts by memset'ing the whole thing. So it doesn't look like it can be partially uninitialized. Yet maybe there's something wrong in there causing the kernel to fail on this GPU but not on most others.

solardiz commented 3 years ago

3 more of these, for a total of 7 unfixed as of now (with lotus5-opencl having been fixed), on the same GTX 570 (seemingly/hopefully not reproducible anywhere else):

Benchmarking: pgpdisk-opencl [SHA1 AES/TwoFish/CAST OpenCL]... 0: OpenCL CL_OUT_OF_RESOURCES (-5) error in opencl_pgpdisk_fmt_plug.c:290 - Copy result back
FAILED (cmp_all(-1))
Benchmarking: pgpsda-opencl [SHA1 CAST OpenCL]... 0: OpenCL CL_OUT_OF_RESOURCES (-5) error in opencl_pgpsda_fmt_plug.c:244 - Copy result back
FAILED (cmp_all(-1))
Benchmarking: ZIP-opencl, WinZip [PBKDF2-SHA1 OpenCL]... 0: OpenCL CL_OUT_OF_RESOURCES (-5) error in opencl_zip_fmt_plug.c:378 - Copy result back
FAILED (cmp_all(-1))
magnumripper commented 3 years ago

@claudioandre-br I'd need to run stress-test for that, and for now I am simply running a regular test across all formats. I check nvidia-smi manually once in a while, and there's no leak seen anymore.

FWIW you can run --stress-test with many/all formats. It will only loop once it has tested all "selected" formats.

claudioandre-br commented 3 years ago

I run a stress test using well's Juniper GPU.

One format (at least) is not set for weak GPUs.

BTW: there is a john zombi process there right now. I can't kill it, but the machine is still ok.