openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

zip formats false positives #434

Closed magnumripper closed 10 years ago

magnumripper commented 10 years ago

Zip formats (both CPU and GPU) are moved to broken/ because they emit far too many false positives to be usable.

As a first quick'n'dirty step we could add an interim hack that calls system("unzip -P ... -t ...") for testing once you pass the early rejection - it would not slow things down much. At least it would work fine with the CPU format.

kholia commented 10 years ago

The standard "unzip" command doesn't handle ZIP files which use AES encryption IIRC.

magnumripper commented 10 years ago

The standard "unzip" command doesn't handle ZIP files which use AES encryption IIRC.

Maybe 7z supports it? We could have some define like HAVE_7Z in Makefile enabling this.

magnumripper commented 10 years ago

OTOH 7z is LGPL so we could just as well nick some code from it which would be a lot better.

kholia commented 10 years ago

Isn't most of the "interesting" 7z code written in C++?

magnumripper commented 10 years ago

Yeah I hate that. I haven't looked at it, maybe it's an easy port. Or maybe we should have a look at clamav again? They did a good job porting unrar.

magnumripper commented 10 years ago

Moved zip formats back from broken/ (a warning is now printed about the FP), but issue remains.

magnumripper commented 10 years ago

http://imgur.com/a/PbN8H#1

:+1: :laughing:

jfoug commented 10 years ago

Rough, but it does work. Not sure about -show, it may be broken. If the .zip file cannot be found, it still reverts back to NOT_EXACT mode. Also, it assumes that 7z is installed and works with the same command-line switches as mine ;) We would have to add that to the configure script, to set up the HAVE_7Z define. For this example, I simply force HAVE_7Z to be true.

diff --git a/src/zip_fmt_plug.c b/src/zip_fmt_plug.c
index 4be84d1..4f63684 100644
--- a/src/zip_fmt_plug.c
+++ b/src/zip_fmt_plug.c
@@ -33,6 +33,7 @@ john_register_one(&fmt_zip);
 #include "common.h"
 #include "formats.h"
 #include "johnswap.h"
+#include "memory.h"
 #include "pbkdf2_hmac_sha1.h"
 #ifdef _OPENMP
 #include <omp.h>
@@ -125,7 +126,7 @@ static int valid(char *ciphertext, struct fmt_main *self)
                goto error;
        if (!ishex(ptr))
                goto error;
-       if (!(ptr = strtok(NULL, "*")))
+       if (!(ptr = strtok(NULL, "*~")))
                goto error;
        if (!ishex(ptr))
                goto error;
@@ -136,6 +137,18 @@ error:
        return 0;
 }

+static char *prepare(char *split_flds[10], struct fmt_main *self) {
+       // see if field 6 is there, and if so, if we can find the file.  IF so, then append it to the end of the string and return.
+       char *ret;
+       if (!split_flds[6])
+               return split_flds[1];
+       if (strrchr(split_flds[1], '~'))
+               return split_flds[1];
+       ret = mem_alloc_tiny(strlen(split_flds[1])+strlen(split_flds[6])+2, 1);
+       sprintf(ret, "%s~%s", split_flds[1],split_flds[6]);
+       return ret;
+}
+
 static void *get_salt(char *ciphertext)
 {
        int i, strength, n;
@@ -261,9 +274,42 @@ static int cmp_one(void *binary, int index)
 {
        return cracked[index];
 }
+#define HAVE_7Z 1
+#if HAVE_7Z
+static int validate_7z(char *fname, int index) {
+       FILE *in = fopen(fname, "r");
+       char buf[1024];
+       int ret;
+
+       if (!in)
+               return 666;
+       fclose(in);
+       sprintf (buf, "7z t -p\"%s\" %s  > /dev/null 2>&1", saved_key[index], fname);
+       // perform the system call
+       ret = system(buf);
+       return ret;
+}
+#endif

 static int cmp_exact(char *source, int index)
 {
+#if HAVE_7Z
+       char *cp;
+       if (cracked[index]) {
+               cp = strrchr(source, '~');
+               if (cp) {
+                       int ret;
+                       ++cp;
+                       ret = validate_7z(cp, index);
+                       if (!ret)
+                               printf ("  !CORRECT password found!  ");
+                       else if (ret != 666) {
+                               printf ("False positive, return code %d, word [%s]\n", ret, saved_key[index]);
+                               return 0;
+                       }
+               }
+       }
+#endif
        return cracked[index];
 }

@@ -290,7 +336,7 @@ struct fmt_main fmt_zip = {
                init,
                fmt_default_done,
                fmt_default_reset,
-               fmt_default_prepare,
+               prepare,
                valid,
                fmt_default_split,
                fmt_default_binary,

magnumripper commented 10 years ago

Cool, I think we should do this.

What is this for? (edit: I see you are using it later)

-       if (!(ptr = strtok(NULL, "*")))
+       if (!(ptr = strtok(NULL, "*~")))

Also, does it work with tricky filenames, like ones with spaces, parens (if at all allowed) and so on?

magnumripper commented 10 years ago

Is there any better character to use than ~? It's sometimes used in files whose names were mangled coming from FAT32. We can't use : though, nor any slash. Maybe a " quote? Or maybe we can keep using * by doing it a little differently?

jfoug commented 10 years ago

We can easily change simple stuff like the ~. I think I need to quote the file name in the system call. I was quoting the password already, but missed that.

In the end, I really think we need to see just wtf is going on. It may be as simple as AES-decrypting the buffer and passing the result to zlib, just like we do for pkzip.

But this system("7z t -ppass file") method gets us at least to where we have the plumbing to do a better test.

jfoug commented 10 years ago

We probably should use something longer than a single char to append the filename. That way, a file that has a ~ in it, or a $, or whatever single char we might choose, is not a problem. Alternatively, we could choose 'any' character not in the file name, use it as the separator, and append that same character as the last character of the string, telling us what we used (we probably could not use any hex char).
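For illustration, a minimal sketch of that separator-picking idea (hypothetical helper, not code from the tree; the candidate set is arbitrary):

#include <string.h>
#include <ctype.h>

/* Sketch: pick a single-byte separator that does not occur in fname and is
 * not a hex digit, so it cannot collide with the hex fields.  The caller
 * would append it after the filename AND as the last character of the line,
 * so the loader knows which character was used.  Returns 0 if nothing
 * usable was found. */
static char pick_separator(const char *fname)
{
        const char *candidates = "~!@#%^&_=+";  /* arbitrary candidate set */
        const char *p;

        for (p = candidates; *p; p++)
                if (!strchr(fname, *p) && !isxdigit((unsigned char)*p))
                        return *p;
        return 0;
}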

The biggest thing I have concerns about is that I did this in prepare(), and we do not call that on .pot loads. We need to make SURE that things like dupe elimination work (yes, there is no dupe elimination now, but if we get exact, there WILL be), and also that --show, etc. still work.

jfoug commented 10 years ago

In the end, we should change zip2john so that a proper hash line with the filename embedded is produced. It may be that something closer to the older pkzip hash signature will be the way to go, possibly embedding a small compressed blob right in the signature.

jfoug commented 10 years ago

http://imgur.com/a/PbN8H#1

This is the 'real' information about .zip. I think WinZip (and Info-ZIP) may also have something like this, but I still think the PKWARE APPNOTE is the de facto standard.

http://www.pkware.com/documents/casestudies/APPNOTE.TXT

jfoug commented 10 years ago

7.2.5.2 A pseudo-code representation of the encryption process is as follows:

            Password = GetUserPassword()
            MasterSessionKey = DeriveKey(SHA1(Password)) 
            RD = CryptographicStrengthRandomData() 
            For Each File
               IV = CryptographicStrengthRandomData() 
               VData = CryptographicStrengthRandomData()
               VCRC32 = CRC32(VData)
               FileSessionKey = DeriveKey(SHA1(IV + RD))
               ErdData = Encrypt(RD,MasterSessionKey,IV) 
               Encrypt(VData + VCRC32 + FileData, FileSessionKey,IV)
            Done
4.3.6 Overall .ZIP file format:

      [local file header 1]
      [encryption header 1]
      [file data 1]
      [data descriptor 1]
      . 
      .
   4.3.7  Local file header:

      local file header signature     4 bytes  (0x04034b50)
      version needed to extract       2 bytes
      general purpose bit flag        2 bytes
      compression method              2 bytes
      last mod file time              2 bytes
      last mod file date              2 bytes
      crc-32                          4 bytes
      compressed size                 4 bytes
      uncompressed size               4 bytes
      file name length                2 bytes
      extra field length              2 bytes

      file name (variable size)
      extra field (variable size)
4.5.12 -Strong Encryption Header (0x0017):

        Value     Size     Description
        -----     ----     -----------
        0x0017    2 bytes  Tag for this "extra" block type
        TSize     2 bytes  Size of data that follows
        Format    2 bytes  Format definition for this record
        AlgID     2 bytes  Encryption algorithm identifier
        Bitlen    2 bytes  Bit length of encryption key
        Flags     2 bytes  Processing flags
        CertData  TSize-8  Certificate decryption extra field data
                           (refer to the explanation for CertData
                            in the section describing the 
                            Certificate Processing Method under 
                            the Strong Encryption Specification)


jfoug commented 10 years ago

Ok, we may be able to 100% validate these WITHOUT having to even do an AES or any unzipping.

This is from gladman's code, the decrypt function:

void fcrypt_decrypt(unsigned char data[], unsigned int data_len, fcrypt_ctx cx[1])
{
    hmac_sha1_data(data, data_len, cx->auth_ctx);
    encr_data(data, data_len, cx);
}

The data is the data read from the stream. The hmac_sha1_data() call computes the 10-byte authentication code (which is stored as 'plain text' in the .zip file after the zip blob of data). From this code, it appears that the hmac is computed over the data after encryption. So we should simply be able to run the hmac over the zip blob (keyed from our starting pbkdf2-sha1 results), and then compare to the 10 bytes stored in the file. This would be a 100% match, IF it works. I will keep looking, but in the 30 minutes I have dug into gladman's code, this appears to be HOW it works. The verifier is still 'most' of the work (at least on small files), but now we should have a way to really get a 100% valid result (I HOPE).
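To make that concrete, a minimal sketch of the check (hypothetical function name; it assumes the derived HMAC key and the stored 10-byte authentication code are already at hand, and an hmac_sha1(key, key_len, data, data_len, out, out_len) helper like the one used in the diff later in this thread):

#include <string.h>

/* Sketch, not the actual format code: authenticate the still-encrypted blob
 * with the HMAC-SHA1 key that falls out of the PBKDF2 output, then compare
 * against the 10-byte authentication code stored after the blob in the .zip
 * file.  WinZip AES keeps only the first 10 bytes of the HMAC. */
static int auth_code_matches(const unsigned char *hmac_key, int key_len,
                             const unsigned char *encrypted_blob, int blob_len,
                             const unsigned char *stored_auth /* 10 bytes */)
{
        unsigned char computed[20];     /* room for a full SHA-1 output */

        hmac_sha1(hmac_key, key_len, encrypted_blob, blob_len, computed, 10);
        return !memcmp(computed, stored_auth, 10);
}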

jfoug commented 10 years ago

I tested by commenting out the encr_data(data, data_len, cx); part of the fcrypt_decrypt function, and tested the computed authentication code vs the stored one, and they match. So all we need to do is to figure out exactly what needs to be done with this hmac_sha1 logic, store information on the file blob (like we do in pkzip), and we can have a 100% no false positive zip format!

The good thing about this is that it requires NO additional library. The older pkzip format did NOT have any authentication method. We had to decrypt, inflate, crc32, and compare the crc32 computed to the crc32 stored. Here we do not have to decrypt or inflate or run a crc. We do have to run the hmac, but we only have to do that on the compressed/encrypted blob of data.

I should be able to nail this one shortly (depends on how much time I get to work on it).

I PLAN on a new format for this. We should NOT (IMHO) try to shoehorn this into the existing $zip$ format. That format should be 100% deprecated and moved to unused. If we do not, then we will not be able to do this and turn off the FMT_NOT_EXACT bit.

jfoug commented 10 years ago

NOTE, we could even do the zip 'magic' if we AES-decrypt and inflate. So if we have a really HUGE file, say a several-MB PDF file, we can store the first bytes of the encrypted/compressed blob, decrypt/inflate, and look for the %PDF- signature.

In the end, I may jam this code INTO the pkzip_fmt_plug.c, or make a pkzip_plug.c that has all the file/param handling crap, since this is looking more and more like the pkzip format, just with pbkdf2-sha1 and hmac-sha1 instead of the built-in pkzip mangler code.

jfoug commented 10 years ago

Ok, I now know how to get the AES decryption key, AND the hmac-sha1 authentication key (and still get the 2 byte verifier key). It takes only a few extra CPU cycles (simple mem move and 32 bytes of xor).

I just change this code (only sse2 shown)

--    pbkdf2_sha1_sse((const unsigned char **)pin, lens, saved_salt, SALT_LENGTH(mode), KEYING_ITERATIONS, pout, 2, 2 * KEY_LENGTH(mode));
++    pbkdf2_sha1_sse((const unsigned char **)pin, lens, saved_salt, SALT_LENGTH(mode), KEYING_ITERATIONS, pout, 2+2*KEY_LENGTH(mode), 0);

I then make the output buffer larger (it was only 2 bytes per candidate, now I need 66, which is large enough for aes-256). Then the AES key is bytes 0 to KEY_LENGTH(mode), the hmac-sha1 key is bytes KEY_LENGTH(mode) to 2*KEY_LENGTH(mode), and the 2 verifier bytes are the 2 bytes after that.

Now that I know how to get the keys, I should be able to get this working without the FMT_NOT_EXACT flag. I will initially get a version where I replace my call to 7z and have everything INSIDE the hashes (i.e. tiny files). So the hash will have the salt, the file data and the 10-byte authentication block, and it will simply use the 2-byte verifier like it does today, but once that matches, do the hmac of the file data (keyed with the now-produced hmac key) and compare to the authentication code. I hope to have that working in a day or 2.
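As a rough illustration of that layout (a sketch based only on the description above, not the committed code; the macro and struct names here are made up):

#include <string.h>

/* Sketch of the PBKDF2 output layout described above.  For mode 1/2/3
 * (AES-128/192/256) the key length is 16/24/32 bytes, so the largest
 * output needed is 2*32 + 2 = 66 bytes. */
#define ZIP_KEY_LEN(mode)  (8 * ((mode) & 3) + 8)   /* 16, 24 or 32 */

typedef struct {
        unsigned char aes_key[32];   /* bytes 0 .. KEY_LEN-1          */
        unsigned char hmac_key[32];  /* bytes KEY_LEN .. 2*KEY_LEN-1  */
        unsigned char verifier[2];   /* the 2 quick-check bytes       */
} zip_derived_keys;

/* split a raw PBKDF2 output buffer 'pout' for the given mode */
static void split_derived(const unsigned char *pout, int mode,
                          zip_derived_keys *out)
{
        int kl = ZIP_KEY_LEN(mode);

        memcpy(out->aes_key,  pout,          kl);
        memcpy(out->hmac_key, pout + kl,     kl);
        memcpy(out->verifier, pout + 2 * kl, 2);
}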

(and I will try to cut down on so many JohnRipper .git messages)

magnumripper commented 10 years ago

Cool. Remember there's samples at http://openwall.info/wiki/john/sample-non-hashes

magnumripper commented 10 years ago

and I will try to cut down on so many JohnRipper .git messages

No reason. There's always some degree of peer review here, even when you are faced with silence! I'm currently on a roadtrip but I check in a couple of times a day.

kholia commented 10 years ago

I love all these discussions, please keep them coming!

jfoug commented 10 years ago

I have a 'first version' change to zip2john. It will print out all the aes encrypted hashes right now. I think I will run through the file twice, first looking for 'best'. Best would be a file of a couple hundred bytes or less (smallest file, but at least 10-20 bytes larger than the encryption additions). If I do not find anything small enough, then find a 'magic' file. If I find nothing small enough and no 'magic' files, then I will have to take the smallest, and store offsets and the .zip name (like I do for pkzip).

But I REALLY like this format much more than the pkzip old crypt. Here, we have a simple hash that we run over the data, ending up with a 10 byte checksum, and we also have a 2 byte quick check after deriving the key (which is currently the 'only' check we use). In pkzip old, there was encryption, but I also had to inflate the data fully (fortunately zlib does that somewhat easily). But here, unless I do the file header 'magic' detection, I do not need to inflate at all.

I am going to put a quick POC format together that does only a full file decrypt with the file inline. The new tag will be $zip2$. I think $zip$ should be deprecated. It is pretty much unusable, since it has such a high tendency of false positives (1 out of 2^16 is pretty lousy). The new method should be 1 out of 2^96 false positives, which actually is pretty much foolproof. I guess we 'could' also deflate, and if it decrypts, deflates and the crc/size matches, along with the 96 bits of authentication, then we KNOW the password is 100% correct.

I have a little coding time before Deanna comes to bed (watching Law and Order). So, I hope to hammer out a little POC code. I will email the 2 of you the zip_fmt file and zip2john when I get something sort of working. I do not want to put it into git until the format hash is finalized. Dropping a format and converting it into something else should not be done often, and I certainly do not want to do it multiple times while developing this.

Has anyone looked at other compressors (ace, 7z, etc.)? I would imagine 7z is probably very close to zip-aes. About ace I have no idea; it may be a 100% closed file format anyway that we would not be able to get any information on.

kholia commented 10 years ago

On Thu, 3 Jul 2014, JimF wrote:

Has anyone looked at other compressors (ace, 7z, etc.)? I would imagine 7z is probably very close to zip-aes. About ace I have no idea; it may be a 100% closed file format anyway that we would not be able to get any information on.

We have a cracker for 7z files but it is terribly broken. Take a guess at who wrote it ;(

Dhiru

magnumripper commented 10 years ago

IMO using file magic is a bad idea, very prone to false negatives. It's good when you have no alternatives but in this case we only have a "possible positive" once in 64K so a full check over a large file is not a problem at all.

jfoug commented 10 years ago

But the problem with no magic is when someone has an encrypted .zip file that contains only a 200MB PDF, zipped and compressed. Even on his own machine, that is a huge memory hit. Now, if he wants help, then everyone has to have the exact same named .zip file, located in the exact same place, so that JtR can find it to extract the file data. OR we build a 200MB input hash.

But if we use magic, we put out 100 bytes or so of hash, decrypt (without worrying about computing the authentication hmac-sha1 in this case), inflate that buffer of data, and look for something that has a 2 byte validator match, whose decrypted data properly inflates, and whose inflated part starts with %PDF.
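As a sketch of that last step only (assuming the candidate's AES decryption of the first chunk has already been done into a small buffer; zip entries are raw deflate streams, hence windowBits = -15):

#include <string.h>
#include <zlib.h>

/* Sketch: partially inflate the first decrypted bytes of the entry and see
 * whether the output begins with a known magic such as "%PDF".  A stored
 * (uncompressed) entry would need a separate path; error handling is kept
 * minimal. */
static int starts_with_magic(const unsigned char *decrypted, unsigned int len,
                             const char *magic)
{
        unsigned char out[64];
        z_stream strm;
        int match = 0;

        memset(&strm, 0, sizeof(strm));
        if (inflateInit2(&strm, -15) != Z_OK)
                return 0;
        strm.next_in = (unsigned char *)decrypted;
        strm.avail_in = len;
        strm.next_out = out;
        strm.avail_out = sizeof(out);
        inflate(&strm, Z_NO_FLUSH);     /* a partial inflate is enough here */
        if (sizeof(out) - strm.avail_out >= strlen(magic))
                match = !memcmp(out, magic, strlen(magic));
        inflateEnd(&strm);
        return match;
}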

Yes, we can get false negatives. Someone can call a .iso file a .pdf, encrypt, and have that file be the point of attack. In that case, nothing we do (if trying to use 'magic') will work, since our 'known' plaintext is not really known.

I am currently writing for full buffer authentication; magic may get added. For authentication, there will be 2 'types'. The first type is buffer inline (i.e. the deflated and encrypted blob). The 2nd type will be some form of how to get it from the file (name, offset, crc, size, etc.), and will work similarly to the pkzip_fmt code.

The nice thing about this format is we now have a binary (for real). The 10 byte authenticator code is our binary. So the format can work more like a normal format. It will just have a fast(er) check, and then a more thorough check, which will end up with us producing a proper 'binary' upon success.

One hard thing to deal with is that file blob. It 'needs' to be in the salt, but that is HARD to do, as variable as the size can be, while still getting salt dupe removal. The only way around this is to use some form of pointer to the data, BUT then we lose our ability to be identical WHEN we are a duplicate. Dyna has this type of problem. Dyna worked around it by doing ALL of the dupe removal inside dyna, and returning the pointer to the same (internal to dyna) held buffer when dyna detects a dupe.

I think what I may do here is to load the buffer temporarily, compute its MD5, then keep a list of buffers (multiple), so that I can walk the list, looking for the proper MD5 value, to know whether I have this buffer already stored. If I find the buffer stored, I do not allocate again. Then in the salt structure, I put a fixed 64 bits of that MD5 hash. That will be the lookup key for finding the data when we hit a 'good' validator.

Whether we search in set_salt or in crypt_all when we have a validator match remains to be seen. It 'may' have to be done in set_salt no matter what for OMP, since it probably should be some local static (file global) pointer and should only be searched for one time.

I have fun coding to do ;)

magnumripper commented 10 years ago

FWIW, the RAR format's get_salt() allocates a memory buffer for the "salt" (file data) but only returns a pointer to it, so SALT_SIZE is just sizeof(pointer). We never get dupe salt removal and that memory is never freed.

jfoug commented 10 years ago

Ok, I have a POC 'working'. The single core (32-bit) SSE build only dropped about 60 c/s (2800 to 2740):

$ ../run/john -test=5 -format=zip
Benchmarking: ZIP, WinZip [PBKDF2-SHA1 8x SSE2]... DONE
Raw:    2805 c/s real, 2805 c/s virtual

$ ../run/john -test=5 -format=zip
Benchmarking: ZIP, WinZip [PBKDF2-SHA1 8x SSE2]... DONE
Raw:    2740 c/s real, 2749 c/s virtual

I put in some printouts, and during self-test the new version is being charged EVERY time for producing an HMAC-SHA1, so the reduction in time shown here is at the FAR end of worst case. So I would assume that for normal runtime we would not see ANY loss of speed, and the new format will have NO false positives.

Now my current salt is this (and this will NOT work long term):

typedef struct my_salt_t {
        struct {
            uint16_t type     : 4;
            uint16_t mode : 4;
        } v;
        uint32_t comp_len;
        unsigned char passverify[2];
        unsigned char salt[SALT_LENGTH(3)];
        //uint64_t data_key; // MSB of md5(data blob).  We lookup using this.
        char datablob[1024];  // for now.
} my_salt;

The datablob is a 1024 byte array inline with the salt. That means the salt 'would' work and dupe-remove through JtR, BUT we can not do any file larger than 1k.

What I plan on doing is storing all of the file data blob inside the zip format, loaded at get_salt. The code will load the data blob and compute the MD5 of the data, keeping only the top 64 bits. Then those top 64 bits will be added to the salt that is being built. The zip format will then check to see if that same data blob has already been stored, and if so, it will not allocate any memory. If it has not been stored, it will allocate memory, save off the buffer, and then create a record within its own data that associates that allocated memory buffer with the top 64 bits of the MD5. Then later, we can find this memory buffer using only the MD5 hash. Then, when a salt is set, the format will have that MD5 half value, and again be able to look up the proper memory buffer. It may be that the MD5 is not needed in the salt at all, and the actual pointer value can be put there instead, so that all the MD5 lookup stuff only impacts load time. I have not worked out the details, but I will see what I can put together.
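A rough sketch of that lookup-table idea (hypothetical names, not the committed code; it assumes the usual MD5_Init/Update/Final API and omits error checking):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include "md5.h"        /* assuming the usual MD5_Init/Update/Final API */

/* Sketch: store each distinct data blob once, keyed by the top 64 bits of
 * its MD5.  get_salt() would call blob_register() and put only the returned
 * 64-bit key (or the looked-up pointer) into the salt. */
struct blob_rec {
        uint64_t key;
        unsigned char *data;
        size_t len;
        struct blob_rec *next;
};
static struct blob_rec *blob_list;

static uint64_t blob_register(const unsigned char *data, size_t len)
{
        MD5_CTX ctx;
        unsigned char dgst[16];
        uint64_t key;
        struct blob_rec *p;

        MD5_Init(&ctx);
        MD5_Update(&ctx, data, len);
        MD5_Final(dgst, &ctx);
        memcpy(&key, dgst, 8);          /* top 64 bits of the MD5 */

        for (p = blob_list; p; p = p->next)
                if (p->key == key && p->len == len)
                        return key;     /* already stored, nothing to allocate */

        p = malloc(sizeof(*p));
        p->key = key;
        p->len = len;
        p->data = malloc(len);
        memcpy(p->data, data, len);
        p->next = blob_list;
        blob_list = p;
        return key;
}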

I may make a 'common' method for doing this, for formats that have HUGELY variable sized salt needs. I know I made issue #692 recently and I think we will be seeing some formats that need some of this type functionality.

jfoug commented 10 years ago

FWIW, the RAR format's get_salt() allocates a memory buffer for the "salt" (file data) but only returns a pointer to it, so SALT_SIZE is just sizeof(pointer). We never get dupe salt removal and that memory is never freed.

I just tested the rar format (against bug #692) and it appears to be getting 'native' JtR salt dupe removal. Yes, I see that it simply stores the pointer. NOTE, the pointer is pretty deep into the structure. Possibly the dupe removal logic only checks so much of the salt to see if it is a dupe? That in itself 'could' be a bug. Yes, it allows this format to work, but what about another format that actually encodes the data INTO the salt structure, and has something different deep in the salt data that makes the salts unique? Might JtR miss that?

jfoug commented 10 years ago

I added a dump_stuff ...... Continued this reply on #692 since we are getting off topic.

jfoug commented 10 years ago

For now, I am just going to allocate a buffer and load it. I will circle back later and make sure only 1 salt gets loaded, if multiples appear.

jfoug commented 10 years ago

Here is the change I made (with some extra code scrubbed out):

-               unsigned int pwd_ver;
+               unsigned char pwd_ver[(2+64)*MAX_KEYS_PER_CRYPT];
                pbkdf2_sha1((unsigned char *)saved_key[index],
-                      strlen(saved_key[index]), &saved_salt[3], SALT_LENGTH(mode),
+                      strlen(saved_key[index]), saved_salt->salt, SALT_LENGTH(saved_salt->v.mode),
-                      KEYING_ITERATIONS, (unsigned char*)&pwd_ver, 4, 2 * KEY_LENGTH(mode));
+                      KEYING_ITERATIONS, pwd_ver, 2+2*KEY_LENGTH(saved_salt->v.mode), 0);
-               cracked[index] = !memcmp(&pwd_ver, passverify, 2);
+               if (!memcmp(&(pwd_ver[KEY_LENGTH(saved_salt->v.mode)<<1]), saved_salt->passverify, 2))
+               {
+                       // yes, I know gladman's code but for now that is what I am using.  Later we will improve.
+                       hmac_sha1(&(pwd_ver[KEY_LENGTH(saved_salt->v.mode)]), KEY_LENGTH(saved_salt->v.mode),
+                       (const unsigned char*)saved_salt->datablob, saved_salt->comp_len,
+                       crypt_key[index], BINARY_SIZE);
+               }
+               else
+                       memset(crypt_key[index], 0, BINARY_SIZE);

Basically, it is identical code up to the end of the pbkdf2. The new code however gets the decryption key AND the authenticator key (these are the bytes that come out of the pbkdf2 directly before the 2 byte checksum), so there is no extra noticeable work. I then simply call hmac-sha1 using those results, and store the result in the crypt_key buffer for this candidate. If the 2-byte checksum fails (it will, for all but 1 out of 2^16 candidates), then I simply memset the crypt_key to null. This means we can not crack a zip where the hmac-sha1 is an all-zero value. I doubt we will ever see that zip file in the wild, or even be able to fabricate it.

Now all I need to do is add the ability to not store the data blob inline, update zip2john to do that, update zip_fmt to handle that, and start testing. Also, zip2john will need to double-run a zip. The first run will find the smallest aes encrypted item, then the next pass will output that one. Or I could just compute to a string, and replace the 'best' with the currently built one, until done with the file. It may be that we want to add switches to allow the user to specify a specific file, since a zip MAY have multiple passwords. They may crack one, but it does not crack the entire set of files, and they have to try the next. The problem with this is that you then have users making hashes for EVERY file in a .zip, thinking that is the way to go, when almost all of them are the same password, so the user is just doing multiple identical tests, all to fail, or all to work (once he finds the password). BUT we certainly need to allow that ability.

jfoug commented 10 years ago

Question for the group following this. Should this new $zip2$ format be a totally new format, and we keep the existing $zip$ format? The ONLY reason I could see to keep the existing one is for things in the .pot file. The non-exact method of this is certainly not the way to go, BUT what to do with the existing hashes? They are pretty much worthless for processing with $zip2$ logic. Without the authenticator or the blob of data (or a way to get it from the file), we are back to pretty much only relying on the 2 byte quick checksum method.

magnumripper commented 10 years ago

I'd say ditch the $zip$ or even just reuse it without regard for the old format. First, it was useless (tens of false positives per minute per CPU core). Second, it never worked(!). Third, any existing cracks would still be reusable with --loopback mode, which will quickly run them through and make the correct crack out of them.

magnumripper commented 10 years ago

Second, it never worked

Actually I think due to the bug you recently fixed, ALL the false positives were indeed false - none of them were ever correct. LOL, this format reliably produced tens of false positives per minute but never a correct one.

jfoug commented 10 years ago

So is the plan to drop the older sig, keep the same format, adopting it into the new format?

magnumripper commented 10 years ago

You decide, I was just thinking out loud. There is no legacy to protect because the format has never ever worked. I think I would have re-used the tag but rejected all old-style hashes in valid(). But that's just me. And if that rejection gets tricky, of course go with $zip2$ instead and just don't support $zip$ anymore at all.

jfoug commented 10 years ago

Here is my planned structure for the zip2 hash line:

//    filename:$zip2$*Ty*Mo*Ma*Sa*Va*Le*DF*Au*$/zip2$
//    Ty = type (0) and ignored.
//    Mo = mode (1, 2 or 3 for 128/192/256 bit)
//    Ma = magic (file magic).  This is reserved for now.  See pkzip_fmt_plug.c or zip2john.c for information.
//         For now, this must be a '0'
//    Sa = salt(hex).   8, 12 or 16 bytes of salt (depends on mode)
//    Va = Verification bytes(hex) (2 byte quick checker)
//    Le = real compr len (hex): length of compressed/encrypted data (field DF)
//    DF = compressed data. DF can be Le*2 hex bytes, and if so, then it is the ENTIRE file blob written 'inline'.
//         However, if the data blob is too long, then a .zip ZIPDATA_FILE_PTR_RECORD structure will be the 'contents' of DF
//    Au = Authentication code (hex), a 10 byte hex value that is the hmac-sha1 of data over DF. This is the binary() value

//  ZIPDATA_FILE_PTR_RECORD  (this can be the 'DF' of the above hash line)
//      *ZFILE*ZN*ZOH*ZOB*  (Note, the leading and trailing * are the * that 'wrap' the DF object.)
//  ZFILE This is the literal string ZFILE
//  ZN    This is the name of the .zip file.  NOTE the user will need to keep the .zip file in proper locations (same as
//        was seen when running zip2john. If the file is removed, this hash line will no longer be valid.
//  ZOH   Offset to the zip central header record for this blob.
//  ZOB   Offset to the start of the blob data
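
For illustration only, a quick sketch of checking that overall shape (hypothetical helper, not the actual valid() code):

#include <string.h>

/* Sketch: count the '*' separators between "$zip2$*" and "$/zip2$" for the
 * layout above (Ty Mo Ma Sa Va Le DF Au => 9 separators).  Real validation
 * would of course also check each field's contents. */
static int looks_like_zip2(const char *line)
{
        const char *p, *end;
        int stars = 0;

        if (strncmp(line, "$zip2$*", 7))
                return 0;
        end = strstr(line, "$/zip2$");
        if (!end)
                return 0;
        for (p = line + 6; p && p < end; p = strchr(p + 1, '*'))
                stars++;
        return stars == 9;
}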

jfoug commented 10 years ago

What should the 'max' size file blob be for inline stored data? I have coded for 1024 bytes (2048 hex). Is that too much, too little or just right ;)

jfoug commented 10 years ago

Checking things in. Going with 1k max size of 'inline' unless someone wants it different.

magnumripper commented 10 years ago

Dhiru used to do LINE_BUFFER_SIZE and I was fine with that until he also bumped it. It's now 0x30000 and I think that is pushing it. But disregarding the actual value, you could indeed use LINE_BUFFER_SIZE.

magnumripper commented 10 years ago

BTW IMHO you should encode it in Base64 to save some space. Or rather, to fit more into whatever size you have.

jfoug commented 10 years ago

I am not fighting with base-64 on variable sized data. Base-16 is trivial to validate (and that is done). Yes, it would be smaller (from 2048 down to about 1365 for a 1k data blob).

Be my guest ;) It could be changed later, without a format change (and it could work for both base-16 and base-64, if you put some wart on the front end of the base-64 (like x64number)).

I'm in the process of checking it in now.

jfoug commented 10 years ago

There is nothing in the format that says we can not bump up the size from 1k to 100k. The buffer is size-prefixed: there is a size value, and then the buffer. So bumping it up or down is trivial.

jfoug commented 10 years ago

Fixed/closed in 528e6bc. The format has been re-written. NOTE, older hashes WILL NOT work with the new version. You will have to run zip2john again to create new hashes.

magnumripper commented 10 years ago

Awesome! I will test this.

frank-dittrich commented 10 years ago

On 07/07/2014 05:00 AM, JimF wrote:

I am not fighting with base-64 on variable sized data. Base-16 is trivial to validate (and that is done). Yes, it would be smaller (from 2048 down to about 1365 for a 1k data blob).

By now, we really should have (and use) some helper functions for base64 validation.

magnumripper commented 10 years ago

By now, we really should have (and use) some helper functions for base64 validation.

We have MIME Base64 in base64.[hc]. We might want to add some helpers if the current ones are too raw.
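
For example, a minimal sketch of such a helper (hypothetical, not what base64.[hc] currently provides):

#include <string.h>

/* Sketch of a MIME Base64 validator: returns 1 if the string contains only
 * the Base64 alphabet, with up to two '=' padding characters allowed at the
 * very end. */
static int valid_base64(const char *s)
{
        static const char b64[] =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        size_t len = strspn(s, b64);
        size_t pad = strspn(s + len, "=");

        return len > 0 && pad <= 2 && s[len + pad] == '\0';
}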