openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

Huge ciphertext memory use #4221

Open magnumripper opened 4 years ago

magnumripper commented 4 years ago

FMT_HUGE_INPUT should be used together with a non-default fmt_source(), so that we don't end up keeping the full ciphertext in memory for no use. I can't remember whether we took care of that some other way, so this needs review.

solardiz commented 4 years ago

@magnumripper I don't fully understand FMT_HUGE_INPUT. I notice we have it set where it's probably not needed - e.g., on DiskCryptor formats, where the "non-hashes" are of a fixed size of a little over 4 KB. Not being familiar with this flag, I didn't remove it, but perhaps you should.

magnumripper commented 4 years ago

We're using FMT_HUGE_INPUT for any format that can have ciphertexts longer than LINE_BUFFER_SIZE. When we added the flag, we reduced that macro back to its original value of 0x400.

This issue is about formats like zip, rar and 7z, though. They can have truly huge ciphertexts, up to several gigabytes. We're never going to use the full db->(...)->source anyway (the pot entry will be truncated and a hash of the full ciphertext will then be appended, for a total way below LINE_BUFFER_SIZE), so it's very wasteful to store it in the db.

Actually, I can't recall all the details - maybe we already do things right.
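
For illustration only, the truncate-then-append-a-hash idea described above could be sketched roughly as follows. This is not John's loader code: `sketch_pot_source`, `SKETCH_LINE_MAX`, `SKETCH_KEEP` and the `$DIGEST$` tag are hypothetical names, the 200-character prefix is an arbitrary choice, and FNV-1a is only a stand-in for whatever digest the real code uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limits for this sketch; not John's actual macros. */
#define SKETCH_LINE_MAX 0x400       /* mirrors the 0x400 value from the thread */
#define SKETCH_KEEP     200         /* leading characters kept verbatim (arbitrary) */
#define SKETCH_TAG      "$DIGEST$"  /* made-up marker separating prefix and digest */

/* FNV-1a, 64-bit. A stand-in digest, chosen only to keep the sketch short. */
static uint64_t sketch_fnv1a64(const char *s, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/*
 * Write the form of `ciphertext` that would be stored into `out`.
 * Short inputs are copied unchanged. Long inputs become:
 *   first SKETCH_KEEP characters + SKETCH_TAG + 16 hex digits of the digest
 * Returns the number of characters written (excluding the terminator).
 */
static size_t sketch_pot_source(const char *ciphertext, char *out, size_t out_size)
{
    size_t len = strlen(ciphertext);

    assert(out_size > SKETCH_LINE_MAX);  /* room for a full-length line + NUL */

    if (len <= SKETCH_LINE_MAX) {
        memcpy(out, ciphertext, len + 1);
        return len;
    }

    /* The digest covers the full ciphertext, not just the kept prefix. */
    return (size_t)snprintf(out, out_size, "%.*s%s%016llx",
                            SKETCH_KEEP, ciphertext, SKETCH_TAG,
                            (unsigned long long)sketch_fnv1a64(ciphertext, len));
}
```

Under these assumptions, any input longer than 0x400 characters is stored as 200 + 8 + 16 = 224 characters regardless of its original size, which is the sense in which keeping a multi-gigabyte source string around in the db would be wasted memory.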

solardiz commented 4 years ago

Maybe we shouldn't make LINE_BUFFER_SIZE as low as 0x400, but can allow e.g. 10x more than that, so that formats like DiskCryptor wouldn't need FMT_HUGE_INPUT and would store the full "non-hashes" (in this case, a little over 4 KB, and there would be very few of those)? Would this perhaps be more convenient?

(In fact, DiskCryptor in particular currently decrypts only the first 96 bytes, so 192 hex characters. But the Python script extracts 2 KiB just in case, and hashcat now requires exactly that size, so we'd better not change this.)
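
The size arithmetic behind this suggestion can be checked with a small C sketch. The helper names and the `SKETCH_` limits are hypothetical, and the comparison deliberately ignores the format tag and any other fields on the line, so it only approximates when a real input line would exceed the limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for this sketch, taken from the numbers in the thread. */
#define SKETCH_LIMIT_CURRENT ((size_t)0x400)                /* 1024 */
#define SKETCH_LIMIT_BUMPED  (10 * SKETCH_LIMIT_CURRENT)    /* "10x more": 10240 */

/* Each binary byte is encoded as two hex characters. */
static size_t sketch_hex_chars(size_t bytes)
{
    assert(bytes <= SIZE_MAX / 2);  /* guard against overflow */
    return 2 * bytes;
}

/* Nonzero if a line of `line_chars` characters does not fit within `limit`. */
static int sketch_exceeds(size_t line_chars, size_t limit)
{
    return line_chars > limit;
}
```

With these helpers, the 96 bytes actually decrypted come to 192 hex characters, while the 2 KiB that the script extracts come to 4096 hex characters: over the current 0x400 limit, but within a limit ten times larger.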

magnumripper commented 4 years ago

We could set it to anything we want, but why bump it? The only magic in the format is the very FMT_HUGE_INPUT flag, the rest is core stuff. I hate it when I tail john.pot and get a wall of hex scrolling by...

solardiz commented 4 years ago

I was thinking that for only a few non-hashes of a few KB each, it's convenient to be able to match them against john.pot lines manually if someone wants to - but from what you say, this isn't a universally shared preference.

magnumripper commented 4 years ago

I agree with your view as well but I find the current limit of 1024 a pretty balanced compromise: Longer hashes can usually still be matched visually/manually, looking at the first couple of hundred characters or so that are still there before it's truncated and a hash is appended.

As soon as we bump it to 4K we'll end up with someone wanting this for some other non-hash needing 8K - and soon we're back to the crazy size we had before FMT_HUGE_INPUT.