Question: How to determine compression options used for ZipCrypto Deflate

fetzerms commented 3 years ago

Hi,

I have the following scenario:

One archive
Two encrypted .xml files, with Deflate
One non-encrypted file, with Deflate

This means:

All files are compressed, but not all files are encrypted.
Known partial plaintext could be used on the encrypted xml files

box# zipinfo test.zip
Archive:  test.zip
Zip file size: xxxxxxxxx bytes, number of entries: 3
-rwxa--     2.0 ntf xxxxxxxx b- defN xx-xxx-xx 12:00 some_unencrypted_file
-rw-a--     2.0 ntf xxxxxxxx B- defN xx-xxx-xx 12:00 encrypted_1.xml
-rw-a--     2.0 ntf      xxx T- defN xx-xxx-xx 12:00 encrypted_2.xml

My question is: As I don't know the exact compression options that were used on the archive, is it feasible to brute force the exact options from the non-encrypted file? Afterwards I could compress the (guessed) partial plaintext with the same options and execute the known-plaintext attack.

I tried to brute force the parameters using 7zip and then comparing resulting compressed sizes for some_unencrypted_file, but somehow this did not yield any results.

7z a -tzip result_zip.zip unencrypted_file -mpass=$pass -mfb=$mfb

Does anyone maybe have some pointers on how to properly brute force the parameters?

kimci86 commented 3 years ago

Hello,

I tried to brute force the parameters using 7zip and then comparing resulting compressed sizes for some_unencrypted_file, but somehow this did not yield any results.
7z a -tzip result_zip.zip unencrypted_file -mpass=$pass -mfb=$mfb

I would try various values for the level of compression: from -mx=1 to -mx=9. Also, you can try other programs than 7zip such as Info-Zip's zip or WinZip.

Afterwards I could compress the (guessed) partial plain text with the same options and execute the known plain text attack.

Because of how deflate compression works (Huffman coding), compressing a small piece of data will most probably produce different bytes than compressing the entire file. So I think this will not work.

However, I guess encrypted_2.xml is small, so maybe it is not easy to compress and the deflate compression used a non-compressed (stored) block, for which it is simple to guess the plaintext (offset the uncompressed plaintext by 5 bytes). Could you share the compressed and uncompressed sizes to confirm that?
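To illustrate where the 5 bytes come from, here is a minimal sketch using Python's zlib module. It assumes a hypothetical XML snippet as plaintext and forces stored blocks with compression level 0; the helper name `raw_deflate` is only for illustration. A stored block starts with one byte holding the block flags, followed by the length (LEN) and its one's complement (NLEN), each as 2 little-endian bytes, and then the data unchanged.

```python
import zlib

def raw_deflate(data: bytes, level: int) -> bytes:
    """Return a raw deflate stream (no zlib or gzip wrapper), as stored in a ZIP entry."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()

# Hypothetical plaintext, only for illustration.
plaintext = b'<?xml version="1.0" encoding="UTF-8"?><root/>'

# Level 0 asks zlib not to compress, which yields a stored block for this small input.
stored = raw_deflate(plaintext, 0)

# Stored block layout: 1 flag byte, LEN (2 bytes), NLEN (2 bytes), then the raw data.
header, body = stored[:5], stored[5:]
length = int.from_bytes(header[1:3], "little")
nlength = int.from_bytes(header[3:5], "little")

print("header:", header.hex())
print("LEN:", length, "NLEN:", nlength)
print("data follows header unchanged:", body == plaintext)
```

So if an entry were stored this way, the known plaintext would line up with the compressed data starting at offset 5.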

fetzerms commented 3 years ago

Hi, thanks for answering that fast.

I would try various values for the level of compression: from -mx=1 to -mx=9. Also, you can try other programs than 7zip such as Info-Zip's zip or WinZip.

Ah, yes. I forgot that. I'm currently looping through -mx0-9 in combination with the other two flags.

#!/bin/bash
# Compressed size of the reference entry in the original archive.
target=$(unzip -v target_zip.exe | grep unencrypted_file | awk '{ print $3 }')
echo "Searching for params to get: $target"
for x in $(seq 2 9); do
    for pass in $(seq 1 15); do
        for power in $(seq 3 258); do
            # start from a fresh archive, otherwise 7z updates the previous one.
            rm -f zipped.zip
            7z a -tzip zipped.zip unencrypted_file -mx=$x -mpass=$pass -mfb=$power > /dev/null
            current=$(unzip -v zipped.zip | grep unencrypted_file | awk '{ print $3 }')
            if [ "$current" = "$target" ]; then
                echo "Works! x: $x ; pass: $pass ; power: $power"
            else
                echo "Not working x: $x ; pass: $pass ; power: $power ; target: $target ; current: $current"
            fi
            # abort, if compression is already better than target.
            # (( )) compares numerically; [[ a < b ]] would compare as strings.
            if (( current < target )); then
                continue 2
            fi
        done
    done
done
rm -f zipped.zip

Because of how deflate compression works (Huffman encoding), compressing a small piece of data will most probably produce different bytes than compressing the entire file. So I think this will not work.

However, I guess encrypted_2.xml is small, so maybe it is not easy to compress and the deflate compression used a non-compressed (stored) block, for which it is simple to guess the plaintext (offset the uncompressed plaintext by 5 bytes). Could you share the compressed and uncompressed sizes to confirm that?

Can you define what a small piece would be? I think I can maybe predict ~130 out of 500 characters of the xml file. If that maybe helps with guessing the compression options.

Actually, the file seems to be compressed quite well. That's probably the nature of XML (opening and closing with similar tags). Details for the XML file that I am trying to predict the known plaintext for:


  offset of local header from start of archive:   108330310
                                                  (000000000674FD46h) bytes
  file system or operating system of origin:      NTFS
  version of encoding software:                   2.0
  minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
  minimum software version required to extract:   2.0
  compression method:                             deflated
  compression sub-type (deflation):               normal
  file security status:                           encrypted
  extended local header:                          no
  file last modified on (DOS date/time):          xxxx xxx x xx:xx:xx
  32-bit CRC value (hex):                         xxxxxxxx
  compressed size:                                356 bytes
  uncompressed size:                              956 bytes
  length of filename:                             19 characters
  length of extra field:                          0 bytes
  length of file comment:                         0 characters
  disk number on which file begins:               disk 1
  apparent file type:                             text
  non-MSDOS external file attributes:             000000 hex
  MS-DOS file attributes (20 hex):                arc

kimci86 commented 3 years ago

So encrypted_2.xml is indeed well compressed. It will not help.

Can you define what a small piece would be?

In another issue, I wrote a small script to illustrate how the compressed data starts differently when compressing prefixes of various sizes of the same string. See https://github.com/kimci86/bkcrack/issues/26#issuecomment-797099523. Running the script again, I see that compressing the 445-byte example string and compressing only its first 441 bytes give different starting bytes. So it is very chaotic.
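For readers who want to try this themselves, here is a minimal sketch in the same spirit. It is not the script from the linked comment: the sample XML string, the helper names and the chosen cut points are all made up for illustration, and it uses Python's zlib at its default level rather than whatever produced the archive in question. It compresses several prefixes of one string and reports how many leading bytes each result shares with the compressed full string.

```python
import zlib

def raw_deflate(data: bytes, level: int = 6) -> bytes:
    """Return a raw deflate stream (no zlib or gzip wrapper), as stored in a ZIP entry."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()

def common_prefix_len(a: bytes, b: bytes) -> int:
    """Count how many leading bytes two byte strings have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count

# Hypothetical XML document, only for illustration.
full = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n<config>\n'
    + b"".join(b'  <item id="%d">value %d</item>\n' % (i, i * 7) for i in range(12))
    + b"</config>\n"
)
reference = raw_deflate(full)

# Compress prefixes of different sizes and compare against the full result.
for cut in (len(full) // 4, len(full) // 2, len(full) - 4, len(full)):
    candidate = raw_deflate(full[:cut])
    shared = common_prefix_len(candidate, reference)
    print(f"prefix of {cut:3d} bytes: {shared:3d} leading bytes shared, starts with {candidate[:8].hex()}")
```

How many bytes are shared depends on the data and on the block type the compressor picks, so the numbers will differ from one input to another. The point is only that a partial plaintext does not reliably give a usable piece of the compressed stream.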

It might be too hard to guess how compression behaves here. Maybe running an attack on the password with hashcat or John the Ripper will be more successful.

fetzerms commented 3 years ago

Thanks for the explanation! In case I am able to get a plain copy of the encrypted xml file, I would still need to "guess" the compression options (I guess), right? But then I should be able to execute the known plaintext attack, or am I missing some piece?

Thanks for pointing out john and hashcat. I thought about using them, but as the technical stuff interests me more than the actual content of the file, I will try to proceed in the known-plaintext direction, if there is any hope :-)

kimci86 commented 3 years ago

In case I am able to get a plain copy of the encrypted xml file, I would still need to "guess" the compression options (I guess), right?

Yes, this is right. The only difference between the known plaintext and the ciphertext must be the encryption.

But then I should be able to execute the known plaintext attack

This is also right. :slightly_smiling_face:
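Since the compressed bytes have to match exactly, comparing raw compressed bytes of the unencrypted entry is a stricter check than comparing sizes. Below is a minimal sketch of that idea in Python. It builds its own small demo archive in memory so it can run standalone; the entry name is taken from this thread, while the sample content and the helper names `raw_entry_bytes` and `matching_levels` are hypothetical. It only tries zlib levels 1 to 9. If the real archive was written by a tool with a different deflate encoder (7-Zip, for example, has its own), no zlib level may match and the same comparison would have to be done against that tool's output instead.

```python
import io
import struct
import zipfile
import zlib

def raw_deflate(data: bytes, level: int) -> bytes:
    """Return a raw deflate stream (no zlib or gzip wrapper), as stored in a ZIP entry."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()

def raw_entry_bytes(zip_bytes: bytes, name: str) -> bytes:
    """Return the compressed bytes of an entry exactly as they sit in the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        info = zf.getinfo(name)
    # The local file header is 30 bytes, followed by the file name and extra field.
    local = zip_bytes[info.header_offset:info.header_offset + 30]
    name_len, extra_len = struct.unpack("<HH", local[26:30])
    start = info.header_offset + 30 + name_len + extra_len
    return zip_bytes[start:start + info.compress_size]

def matching_levels(plaintext: bytes, compressed: bytes) -> list:
    """List the zlib levels whose output is byte-for-byte equal to the entry."""
    return [level for level in range(1, 10) if raw_deflate(plaintext, level) == compressed]

# Demo archive built in memory (hypothetical content), standing in for test.zip.
plaintext = b"<root>" + b"<item>some repeated text</item>" * 40 + b"</root>"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("some_unencrypted_file", plaintext,
                compress_type=zipfile.ZIP_DEFLATED, compresslevel=9)
archive = buffer.getvalue()

compressed = raw_entry_bytes(archive, "some_unencrypted_file")
print("levels with identical output:", matching_levels(plaintext, compressed))
```

Several levels can produce identical output on small inputs, so more than one match is possible. Any of them would do for preparing the known plaintext, as long as the bytes are equal.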

kimci86 commented 2 years ago

I am closing this, as I understand you have no more questions. Feel free to reopen it or open a new issue otherwise.

kimci86 / bkcrack

Question: How to determine compression options used for ZipCrypto Deflate #43