kimci86 / bkcrack

Crack legacy zip encryption with Biham and Kocher's known plaintext attack.
zlib License

Use of -p without -P #41

Status: Open. magnumripper opened this issue 3 years ago.

magnumripper commented 3 years ago

What is expected in file when using -p file without -P plain.zip?

The examples show either -C encrypted.zip -c cipher -P plain.zip -p plain or -c cipherfile -p plainfile. I tried mixing them, as in -C encrypted.zip -c cipher -p plainfile, where (in my mind, but perhaps not in bkcrack's) cipher was a file within encrypted.zip and plainfile was a plain file in my current directory simply containing the plaintext as-is. The command was accepted, but it didn't work at all.

If that is supposed to work, it doesn't seem to, so it might be a bug.

If it's not supposed to work at all without -P, it should bail out with an informative error. Instead I got confusing errors such as "Data error: plaintext offset is too large." or "Data error: ciphertext is smaller than plaintext." With some combination of options (I think adding -t to the mix) I got it to run, but it could not find the keys (these were all test runs with staged data, so it should have found them).

Perhaps it is supposed to work, but only if plainfile is extracted (e.g. with dd) from an unencrypted archive? I did try adding my plainfile to a dummy, unencrypted zip file and then using -P dummy.zip -p plainfile, and that worked just fine. If this is how it works, maybe just document it better.

Example of what did not work:

$ echo "Test data alpha bravo charlie echo delta fox golf hotel" > test.txt
$ rm -f test.zip && zip -e test.zip test.txt
Enter password:   (I entered 'magnum' here)
Verify password:
  adding: test.txt (deflated 2%)
$ ./bkcrack -C test.zip -c test.txt -p test.txt
bkcrack 1.3.0 - 2021-08-16
Data error: plaintext offset is too large.

Here's what worked fine:

$ echo "Test data alpha bravo charlie echo delta fox golf hotel india juliet" > test.txt
$ rm -f test.zip && zip -e test.zip test.txt
Enter password:
Verify password:
  adding: test.txt (deflated 10%)
$ rm -f plain.zip && zip plain.zip test.txt
  adding: test.txt (deflated 10%)
$ ./bkcrack -C test.zip -c test.txt -P plain.zip -p test.txt
bkcrack 1.3.0 - 2021-08-16
[19:49:10] Z reduction using 54 bytes of known plaintext
100.0 % (54 / 54)
[19:49:10] Attack on 150507 Z values at index 7
Keys: a5025690 1257b418 cee8bad2
4.7 % (7030 / 150507)
[19:49:17] Keys
a5025690 1257b418 cee8bad2

I could then use those keys to crack the actual password, e.g. with hashcat.

magnumripper commented 3 years ago

I see now that my initial attempt does work, provided the file in the encrypted zip is stored, not deflated:

$ echo "Test data alpha bravo charlie echo delta fox golf hotel" > test.txt
$ rm -f test.zip && zip -0 -e test.zip test.txt
Enter password:
Verify password:
  adding: test.txt (deflated 2%)
$ ./bkcrack -C test.zip -c test.txt -p test.txt
bkcrack 1.3.0 - 2021-08-16
[20:26:39] Z reduction using 61 bytes of known plaintext
100.0 % (61 / 61)
[20:26:39] Attack on 134339 Z values at index 7
Keys: a5025690 1257b418 cee8bad2
30.3 % (40665 / 134339)
[20:27:19] Keys
a5025690 1257b418 cee8bad2

So I guess there's no bug, just a confused user. Maybe the documentation needs some clarification: apparently the "plaintext" must be exactly as it appears in the attacked archive, so we have to match deflated-or-not and so on.
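A quick way to see which case applies is to check the entry's compression method before preparing the plaintext. The ZIP central directory is not encrypted, so Python's standard zipfile module can read it without the password. A minimal sketch, assuming Python 3; `describe_entry` is a hypothetical helper, not part of bkcrack:

```python
import io
import zipfile

# ZIP compression method codes: 0 = stored, 8 = deflated.
METHOD_NAMES = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}


def describe_entry(zip_source, name):
    """Hypothetical helper: return (method, encrypted, compressed_size).

    Only the central directory is read, so this also works on an encrypted
    archive without knowing the password.
    """
    with zipfile.ZipFile(zip_source) as zf:
        info = zf.getinfo(name)
    method = METHOD_NAMES.get(info.compress_type, f"method {info.compress_type}")
    encrypted = bool(info.flag_bits & 0x1)  # bit 0 of the general purpose flag
    return method, encrypted, info.compress_size


if __name__ == "__main__":
    # zipfile cannot create encrypted entries, so this in-memory demo archive
    # is unencrypted; point describe_entry at a real encrypted.zip instead.
    data = b"Test data alpha bravo charlie echo delta fox golf hotel\n"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("stored.txt", data, compress_type=zipfile.ZIP_STORED)
        zf.writestr("deflated.txt", data, compress_type=zipfile.ZIP_DEFLATED)
    for entry in ("stored.txt", "deflated.txt"):
        print(entry, describe_entry(buf, entry))
```

If the method is "deflated", the -p file has to contain deflated bytes rather than the text itself. For an encrypted entry, the reported compressed size also includes the 12-byte encryption header.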

magnumripper commented 3 years ago

This, however, makes me wonder when the -x and -o options are usable at all... they're not of much use unless the attacked file is stored, right?

kimci86 commented 3 years ago

Hello, thank you for reporting this in such detail.

You understood correctly: the plaintext must be (a part of) the data exactly as it was just before encryption, which means it may have to be compressed. This often confuses people. I need to document it better and provide more explicit error messages.
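In other words, the plaintext is the entry's data as it is stored in the archive. The Python 3 sketch below (stdlib only; `raw_entry_data` is a hypothetical helper) pulls those bytes out of an unencrypted archive by following the entry's local file header, which is roughly what the dummy-archive workaround above relies on. It assumes a plain, non-Zip64 archive and is not how bkcrack itself reads archives:

```python
import io
import struct
import zipfile
import zlib

LOCAL_HEADER_SIG = b"PK\x03\x04"
LOCAL_HEADER_SIZE = 30  # fixed part of a ZIP local file header


def raw_entry_data(zip_bytes, name):
    """Hypothetical helper: return an entry's data exactly as stored.

    Stored entry -> the file content; deflated entry -> the raw deflate
    stream. Assumes an unencrypted, non-Zip64 archive.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        info = zf.getinfo(name)
    start = info.header_offset
    if zip_bytes[start:start + 4] != LOCAL_HEADER_SIG:
        raise ValueError("local file header not found")
    # File name and extra field lengths: little-endian uint16 at offsets 26, 28.
    name_len, extra_len = struct.unpack_from("<HH", zip_bytes, start + 26)
    data_start = start + LOCAL_HEADER_SIZE + name_len + extra_len
    # Use the central directory size: the local header may hold zeros
    # when a data descriptor follows the data.
    return zip_bytes[data_start:data_start + info.compress_size]


if __name__ == "__main__":
    text = b"Test data alpha bravo charlie echo delta fox golf hotel india juliet\n"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("stored.txt", text, compress_type=zipfile.ZIP_STORED)
        zf.writestr("deflated.txt", text, compress_type=zipfile.ZIP_DEFLATED)
    blob = buf.getvalue()
    # Stored: the raw bytes are the file content itself.
    print(raw_entry_data(blob, "stored.txt") == text)
    # Deflated: the raw bytes differ from the text but inflate back to it.
    raw = raw_entry_data(blob, "deflated.txt")
    print(raw == text, zlib.decompress(raw, -15) == text)
```

Writing such bytes to a file gives a usable -p input only if the dummy entry was compressed exactly like the target entry; that match is the assumption the whole trick depends on.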

Also, you are right: the -x and -o options are probably never useful when compression is used, because a large chunk of uncompressed data is needed to get the right compressed data. There could be a warning message when they are used on compressed data.

magnumripper commented 3 years ago

So I assume (only now) that this also applies to the -t size option: if used with compressed data, we're talking about the compressed size, right?

I think I understand it all now, but some documentation is needed to make it clear to a newbie.

kimci86 commented 3 years ago

Yes, when compression is used, -t refers to compressed data. This can be useful because, depending on the compression settings, compressed data can start out the same but diverge at some point. I should document this too.
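One way to pick a truncation point under that reading is to compress the known content with several settings and keep only the leading compressed bytes they all agree on. A Python sketch using zlib at levels 1 to 9; it assumes the target archive was produced by a zlib-compatible deflate, which is not guaranteed, since other deflate implementations can emit different but equally valid output. The function names are hypothetical:

```python
import zlib


def raw_deflate(data, level):
    """Raw deflate stream (no zlib header/trailer), the form used in ZIP entries."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def common_prefix_len(streams):
    """Length of the longest prefix shared by all the byte strings."""
    shortest = min(len(s) for s in streams)
    for i in range(shortest):
        if len({s[i] for s in streams}) > 1:
            return i
    return shortest


def stable_prefix(data, levels=range(1, 10)):
    """Leading compressed bytes on which every listed zlib level agrees."""
    streams = [raw_deflate(data, level) for level in levels]
    return streams[0][:common_prefix_len(streams)]


if __name__ == "__main__":
    known = b"Test data alpha bravo charlie echo delta fox golf hotel india juliet\n" * 20
    prefix = stable_prefix(known)
    print(f"{len(prefix)} compressed bytes are identical at levels 1-9")
```

The prefix itself can serve as the plaintext file, or its length as a -t value when the full compressed stream is used instead.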

magnumripper commented 3 years ago

> -t refers to compressed data. This can be useful for compressed data because, depending on the compression settings, compressed data can start the same but diverge at some point.

Oh, right, that's a good point.

This is all about seeing things in their proper "layers", just like with networking: bkcrack only attacks the archive data, so if the attacked file in the attacked archive is deflated with such-and-such parameters, everything is in terms of data deflated with those parameters. Not sure how to put that well in a usage blurb 😵

That being said, I guess code could theoretically be added to let the user say "the plaintext is literally alpha bravo charlie delta" (or my original attempt with -p plainfile) even when the attacked archive is deflated, at least as long as no offset is involved. We'd just have to automagically deflate the given plaintext (using the settings seen in -C encrypted.zip -c cipher) and then use that.

That would probably add even more confusion for a newbie though 😆 🤣
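For what it's worth, the automagic deflate idea could look roughly like this. It is a hypothetical Python 3 sketch, not a bkcrack feature: it reads the entry's method and the deflate option bits (bits 1 and 2 of the general purpose flag, per the ZIP APPNOTE) from the central directory and re-deflates the known plaintext with zlib. The mapping from those bits to zlib levels is an assumption, and a different deflate implementation may still produce different bytes:

```python
import io
import zipfile
import zlib

# Deflate option from general purpose flag bits 2-1 (ZIP APPNOTE):
# 0 normal, 1 maximum, 2 fast, 3 super fast.
# ASSUMPTION: this mapping to zlib levels is a guess, not taken from bkcrack.
ASSUMED_LEVELS = {0: 6, 1: 9, 2: 2, 3: 1}


def guess_level(flag_bits):
    """Guess a zlib level from the entry's deflate option bits."""
    return ASSUMED_LEVELS[(flag_bits >> 1) & 0b11]


def deflate_like_entry(zip_source, name, plaintext):
    """Hypothetical helper: deflate a known plaintext the way the entry
    appears to have been compressed, so the result could be fed to -p."""
    with zipfile.ZipFile(zip_source) as zf:
        info = zf.getinfo(name)
    if info.compress_type == zipfile.ZIP_STORED:
        return plaintext  # stored entries need no transformation
    if info.compress_type != zipfile.ZIP_DEFLATED:
        raise ValueError("only stored and deflated entries are handled")
    comp = zlib.compressobj(guess_level(info.flag_bits), zlib.DEFLATED, -15)
    return comp.compress(plaintext) + comp.flush()


if __name__ == "__main__":
    data = b"Test data alpha bravo charlie echo delta fox golf hotel india juliet\n"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("test.txt", data)
    guessed = deflate_like_entry(buf, "test.txt", data)
    print(len(guessed), "deflated bytes to use as plaintext")
```

As noted above, this only makes sense when the whole file from its start is known, since deflated output for a fragment in the middle depends on everything before it.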

jpatokal commented 3 years ago

This discussion is super informative, but I'm still confused about how this is supposed to work if the target file is deflated and the known plaintext is too short to compress.

I have an encrypted ZIP that contains deflated PDFs. It's likely that each PDF (when expanded) starts with the following 15 bytes of plaintext, which I've extracted into a file 15 bytes long:

$ xxd pdf-head.dat 
00000000: 2550 4446 2d31 2e37 0d0a 25b5 b5b5 b5    %PDF-1.7..%....
$ ls -l pdf-head.dat 
-rw-r--r--  1 jpatokal  staff  15 20 Aug 21:31 pdf-head.dat

Per the discussion above, I need to deflate this to get it to match the bytes in the target, but 15 bytes is too short for zip to bother deflating, so it stores the file instead:

$ zip plain.zip pdf-head.dat 
  adding: pdf-head.dat (stored 0%)

Is there a way around this? For example, passing in a longer deflated ZIP of a similar PDF and guesstimating how many bytes would be the same?

kimci86 commented 3 years ago

When deflate can actually compress the data, it usually uses Huffman coding (on top of the LZ77 algorithm), with a Huffman tree built from a large block of data. Such a compressed block starts with a compressed representation of the Huffman tree, followed by the compressed data. So it is hard to get the correct plaintext for compressed data when so few bytes are known.
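Whether a deflate stream starts with such a tree can be read from its first three bits: BFINAL, then a 2-bit BTYPE, where 2 means dynamic Huffman codes (tree included), 1 fixed codes (no tree) and 0 a stored block (RFC 1951, section 3.2.3). A small Python sketch; the data appended to the PDF header in the demo is a made-up stand-in, not real PDF content, and zlib may not compress the same way as the tool that built the target archive:

```python
import zlib

BLOCK_TYPES = {0: "stored", 1: "fixed Huffman", 2: "dynamic Huffman", 3: "reserved"}


def first_block_header(stream):
    """Decode the 3-bit header of the first deflate block (RFC 1951, 3.2.3).

    Deflate packs bits starting at the least significant bit of each byte,
    so BFINAL is bit 0 and BTYPE is bits 1-2 of the first byte.
    """
    if not stream:
        raise ValueError("empty stream")
    bfinal = stream[0] & 1
    btype = (stream[0] >> 1) & 0b11
    return bool(bfinal), BLOCK_TYPES[btype]


def raw_deflate(data, level=6):
    """Raw deflate stream (no zlib header/trailer), as stored in a ZIP entry."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


if __name__ == "__main__":
    pdf_head = bytes.fromhex("255044462d312e370d0a25b5b5b5b5")  # the 15 known bytes
    print("header alone:", first_block_header(raw_deflate(pdf_head)))
    # Hypothetical stand-in for the rest of a PDF; a real file would differ.
    bigger = pdf_head + b"1 0 obj << /Type /Catalog >> endobj\n" * 500
    print("header + more data:", first_block_header(raw_deflate(bigger)))
```

When the first block turns out to be dynamic, its leading bytes encode a tree that depends on the whole block, which is why 15 known uncompressed bytes do not translate into 15 known compressed bytes.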

You can try compressing similar PDF files and hope that the Huffman tree and the first few compressed bytes will be the same. As PDF files already contain compressed data, the entropy is high, and maybe the Huffman tree will be the same. I do not know how likely this is to work. Probably not very.
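That experiment can be scripted: compress each sample PDF the same way and keep the compressed prefix they all share. A hypothetical Python sketch, assuming zlib at level 6 roughly matches whatever created the encrypted archive; the script and output file names are arbitrary:

```python
import sys
import zlib
from pathlib import Path


def raw_deflate(data, level=6):
    """Raw deflate stream (no zlib header/trailer), as stored in a ZIP entry."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def shared_compressed_prefix(samples, level=6):
    """Compress each sample (at least one) and keep the leading bytes all share."""
    streams = [raw_deflate(s, level) for s in samples]
    prefix = streams[0]
    for s in streams[1:]:
        n = 0
        while n < min(len(prefix), len(s)) and prefix[n] == s[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


if __name__ == "__main__":
    # Usage (hypothetical): python shared_prefix.py a.pdf b.pdf c.pdf
    samples = [Path(p).read_bytes() for p in sys.argv[1:]]
    if len(samples) >= 2:
        prefix = shared_compressed_prefix(samples)
        print(f"{len(prefix)} shared compressed bytes")
        Path("candidate-plain.bin").write_bytes(prefix)
```

A short shared prefix is a sign that the samples' Huffman trees differ, in which case the candidate plaintext is unlikely to match the target either.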