openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

Combine CRC32 code #1379

Closed jfoug closed 9 years ago

jfoug commented 9 years ago

We have at least 3 instances of CRC32 code at this time. They all compute the same CRC values. We should unify this code.

magnumripper commented 9 years ago

An undocumented (or at best vaguely documented) caveat with Solar's crc32.c is that you need to call CRC32_Init() from a single-threaded region the first time (e.g. in the format's init()). After that, it's thread-safe. Just putting it out here: we had an intermittent bug in RAR for months that I just could not nail (Solar eventually did) because of it.
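
To illustrate the caveat, here is a minimal sketch, not the actual crc32.c code. The names crc_table, crc_table_init, crc32_bytes and hash_all are made up for this example. The point is only that the lazily built table is filled once before any parallel region; after that, threads only read it.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define POLY 0xEDB88320u  /* reflected CRC-32 polynomial, as in crc32.c */

static uint32_t crc_table[256];
static int table_ready;

/* Not thread-safe by itself: call once from a single-threaded region. */
static void crc_table_init(void)
{
    uint32_t index, entry;
    int bit;

    if (table_ready)
        return;
    for (index = 0; index < 256; index++) {
        entry = index;
        for (bit = 0; bit < 8; bit++)
            entry = (entry & 1) ? (entry >> 1) ^ POLY : entry >> 1;
        crc_table[index] = entry;
    }
    table_ready = 1;  /* set only after the table is complete */
}

/* Read-only use of the table; safe to call from many threads. */
static uint32_t crc32_bytes(const void *data, size_t size)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (size--)
        crc = crc_table[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
    return ~crc;
}

/* Hypothetical stand-in for a format's init() followed by crypt_all(). */
static void hash_all(const char *keys[], uint32_t out[], int count)
{
    int i;

    crc_table_init();  /* single-threaded: done before the parallel loop */
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (i = 0; i < count; i++)
        out[i] = crc32_bytes(keys[i], strlen(keys[i]));
}
```

If the first call happened inside the parallel loop instead, one thread could see the "ready" state while another was still filling the table, which is the kind of intermittent failure described above.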

magnumripper commented 9 years ago

@jfoug are you familiar with CRC-32? I tried implementing SSE4.2 crc32 intrinsics in our crc32 format, and it was totally trivial, but unfortunately it turned out to be some other flavor of CRC-32 (namely CRC-32C, as in Castagnoli). It was damn fast when run with --skip-self-test, though! I still wonder if it can be used anyway, using some other initialization or a workaround of some sort. Maybe it's totally different and can't be used.

We could of course implement a crc32c format too. Not that I see any use for it, but it would be cool for benchmarking. According to Wikipedia, it's used in e.g. iSCSI, SCTP, G.hn payload, Btrfs, ext4 and Ceph.
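
For reference, the two flavors differ in the generator polynomial itself, so a different initial value alone is not expected to make them agree. A minimal bitwise sketch (the function name crc_reflected is made up for this example) computes both from the reflected constants 0xEDB88320 and 0x82F63B78. For the standard check string "123456789" it gives 0xCBF43926 for CRC-32 and 0xE3069283 for CRC-32C.

```c
#include <stdint.h>
#include <stddef.h>

#define POLY_CRC32  0xEDB88320u  /* reflected form of 0x04C11DB7 */
#define POLY_CRC32C 0x82F63B78u  /* reflected form of 0x1EDC6F41 (Castagnoli) */

/* Bit-at-a-time reflected CRC with initial value and final XOR of 0xFFFFFFFF. */
static uint32_t crc_reflected(uint32_t poly, const void *data, size_t size)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    int bit;

    while (size--) {
        crc ^= *p++;
        for (bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ poly : crc >> 1;
    }
    return ~crc;
}
```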

magnumripper commented 9 years ago

The pkzip version seems to be 5% faster than Solar's. Not sure why though, they do exactly the same thing as far as I can tell (I inlined Solar's version in crc32 format for comparison).

jfoug commented 9 years ago

Defines are almost always faster, because they allow the compiler to optimize better. Even an inlined function has certain rules that must be honored, i.e. all ops must be done within certain sequence points within loops, etc., even if the compiler can figure out how to delay things for better optimization. The #defined versions are simply inlined code, so the compiler may be able to intermix instructions better. One other thing is that I use an array, while Solar's code uses a pointer to an allocated array. There should be a dereference there, while in mine, using a static array, it is just an indexed offset. I bet if you put the static array into Solar's code, it would be the same speed (or close).
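
To make the comparison concrete, here is a sketch of the two access patterns being discussed. All names (static_tab, heap_tab, CRC_STEP, crc_step_ptr, tables_init) are made up for this example. Whether this difference accounts for the 5% is not tested here, and an optimizing compiler may well emit the same code for both.

```c
#include <stdint.h>
#include <stdlib.h>

static uint32_t static_tab[256];  /* table at a fixed address */
static uint32_t *heap_tab;        /* table reached through a pointer */

/* Macro form: expands in place and indexes the static array directly. */
#define CRC_STEP(crc, byte) \
    (static_tab[(unsigned char)((crc) ^ (byte))] ^ ((crc) >> 8))

/* Function form: loads the heap_tab pointer, then indexes through it. */
static uint32_t crc_step_ptr(uint32_t crc, unsigned char byte)
{
    return heap_tab[(crc ^ byte) & 0xFF] ^ (crc >> 8);
}

/* Fill both tables with the standard CRC-32 entries. Returns 0 on success. */
static int tables_init(void)
{
    uint32_t index, entry;
    int bit;

    if (!heap_tab)
        heap_tab = malloc(256 * sizeof(*heap_tab));
    if (!heap_tab)
        return -1;
    for (index = 0; index < 256; index++) {
        entry = index;
        for (bit = 0; bit < 8; bit++)
            entry = (entry & 1) ? (entry >> 1) ^ 0xEDB88320u : entry >> 1;
        static_tab[index] = entry;
        heap_tab[index] = entry;
    }
    return 0;
}
```

Both forms compute the same per-byte step; only the way the table is addressed differs.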

magnumripper commented 9 years ago

I added a CRC-32C format just for fun. 99% based on the normal crc32 format. It reads a byte at a time, using _mm_crc32_u8(). It would probably be a lot faster reading 32 bits at a time with _mm_crc32_u32() if it wasn't for the fact a strlen() would totally ruin performance :)

magnumripper commented 9 years ago

Come to think of it, I should have added the new format in the same source file.

magnumripper commented 9 years ago

Here's an idea. We could not only use a single source file, but actually a single format. In set_salt() we could set the function pointer to crypt_all(). For $crc32$ it would be the normal function, and for $crc32c$ it would be the CRC-32C function. No overhead at all!
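
A rough sketch of that idea, outside of the real format struct: select_variant() is a hypothetical stand-in for set_salt(), and active_crc stands in for the crypt_all() pointer. The bitwise CRC is only there to keep the example self-contained.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef uint32_t (*crc_fn)(const void *data, size_t size);

/* Bitwise reflected CRC, parameterized by polynomial. */
static uint32_t crc_generic(uint32_t poly, const void *data, size_t size)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    int bit;

    while (size--) {
        crc ^= *p++;
        for (bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ poly : crc >> 1;
    }
    return ~crc;
}

static uint32_t crc32_std(const void *data, size_t size)
{
    return crc_generic(0xEDB88320u, data, size);  /* normal CRC-32 */
}

static uint32_t crc32_c(const void *data, size_t size)
{
    return crc_generic(0x82F63B78u, data, size);  /* CRC-32C */
}

/* The routine currently in effect; chosen once per salt, not per key. */
static crc_fn active_crc = crc32_std;

/* Hypothetical analogue of set_salt(): pick the routine from the hash tag. */
static void select_variant(const char *hash)
{
    if (!strncmp(hash, "$crc32c$", 8))
        active_crc = crc32_c;
    else
        active_crc = crc32_std;
}
```

The choice costs one comparison per salt, and the hot loop then runs whichever routine was selected without testing the tag again.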

jfoug commented 9 years ago

> It would probably be a lot faster reading 32 bits at a time with _mm_crc32_u32() if it wasn't for the fact a strlen() would totally ruin performance :)

No strlen() would be needed at all. Just compute the length in set_key() after the key is copied (of course, we would need an array of ints, or of unsigned chars to keep the working set smaller).
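
Something along these lines, as a sketch only. The PLAINTEXT_LENGTH value and the MAX_KEYS batch size are assumptions for this example, and the arrays are static here rather than allocated in init().

```c
#include <stddef.h>

#define PLAINTEXT_LENGTH 31  /* assumed value for this sketch */
#define MAX_KEYS 4           /* assumed batch size for this sketch */

static char saved_key[MAX_KEYS][PLAINTEXT_LENGTH + 1];
static unsigned char saved_len[MAX_KEYS];  /* one byte per key keeps the working set small */

/*
 * Copy the key and record its length in the same pass, so the hot loop
 * never needs strlen(). Keys longer than PLAINTEXT_LENGTH are truncated.
 */
static void set_key(const char *key, int index)
{
    char *dst = saved_key[index];
    size_t len = 0;

    while (len < PLAINTEXT_LENGTH && key[len]) {
        dst[len] = key[len];
        len++;
    }
    dst[len] = '\0';
    saved_len[index] = (unsigned char)len;
}
```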

jfoug commented 9 years ago

I tried using _mm_crc32_u32() and it did not make that much difference:

$ ../run/john -test=3 -form=crc32c
Will run 8 OpenMP threads
Benchmarking: crc32c [CRC-32C SSE4.2]... (8xOMP) DONE
Many salts:     353160K c/s real, 101648K c/s virtual
Only one salt:  78945K c/s real, 53255K c/s virtual
$ ../run/john -test=3 -form=crc32c
Will run 8 OpenMP threads
Benchmarking: crc32c [CRC-32C SSE4.2]... (8xOMP) DONE
Many salts:     364659K c/s real, 108062K c/s virtual
Only one salt:  81137K c/s real, 53081K c/s virtual

Here are the code changes (not including the setup of the saved_len array):

#if __SSE4_2__
        unsigned len = saved_len[i];
        unsigned *p4 = (unsigned*)p;
        while (len > 4) {
            crc = _mm_crc32_u32(crc, *p4++);
            len -= 4;
        }
        p = (unsigned char*)p4;
        while (*p)
            crc = _mm_crc32_u8(crc, *p++);
#else

magnumripper commented 9 years ago

LOL, that is damn near EXACTLY the code I tried. And I did use a ptr diff to get length in set_key. But even that has too much overhead in OMP (we're really hitting bottlenecks at these speeds). Just using u8 was a tad faster on this machine.
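
For anyone who wants to experiment, here is a self-contained variant of that loop, as a sketch rather than the code either of us benchmarked. The function name crc32c_update is made up for this example. It differs from the snippet above in three ways: it uses memcpy() for the 32-bit loads instead of a pointer cast, it consumes a final full word (len >= 4), and it handles the tail by remaining length instead of looking for the NUL. Without SSE4.2 it falls back to a plain bitwise CRC-32C.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#ifdef __SSE4_2__
#include <nmmintrin.h>
#endif

/* Raw CRC-32C update over len bytes (no initial or final XOR applied here). */
static uint32_t crc32c_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

#ifdef __SSE4_2__
    /* Four bytes per instruction while at least a full word remains. */
    while (len >= 4) {
        uint32_t word;

        memcpy(&word, p, 4);  /* avoids alignment and aliasing problems */
        crc = _mm_crc32_u32(crc, word);
        p += 4;
        len -= 4;
    }
    /* Remaining 0 to 3 bytes, one at a time. */
    while (len--)
        crc = _mm_crc32_u8(crc, *p++);
#else
    /* Portable fallback: bitwise CRC-32C with the reflected polynomial. */
    while (len--) {
        int bit;

        crc ^= *p++;
        for (bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
    }
#endif
    return crc;
}
```

Starting from 0xFFFFFFFF and inverting the result gives the usual CRC-32C value on either path.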

jfoug commented 9 years ago

Ok, I have fully integrated all crc (crc32 and crc-32C) into crc32.[ch], and changed all usages to properly use only this data (pkzip, dynamic, truecrypt, etc). However, this adds code to a pair of core files. I am waiting on @magnumripper to list how he wishes to proceed here.

This is a diff of the changes:

diff --git a/src/crc32.c b/src/crc32.c
index e262e67..bf777ed 100644
--- a/src/crc32.c
+++ b/src/crc32.c
@@ -23,20 +23,22 @@
 #include "crc32.h"
 #include "memdbg.h"

-#define POLY 0xEDB88320
-#define ALL1 0xFFFFFFFF
+#define POLY  0xEDB88320
+#define POLYC 0x82F63B78 // CRC-32C
+#define ALL1  0xFFFFFFFF

-static CRC32_t table[256];
+CRC32_t JTR_CRC32_table[256];
+CRC32_t JTR_CRC32_tableC[256];
 static int bInit=0;
-void CRC32_Init(CRC32_t *value)
+
+void CRC32_Init_tab()
 {
    unsigned int index, bit;
    CRC32_t entry;

-   *value = ALL1;
-
    if (bInit) return;
    bInit = 1;
+
    for (index = 0; index < 0x100; index++) {
        entry = index;

@@ -47,23 +49,35 @@ void CRC32_Init(CRC32_t *value)
        } else
            entry >>= 1;

-       table[index] = entry;
+       JTR_CRC32_table[index] = entry;
+   }
+   for (index = 0; index < 0x100; index++) {
+       entry = index;
+
+       for (bit = 0; bit < 8; bit++)
+       if (entry & 1) {
+           entry >>= 1;
+           entry ^= POLYC;
+       } else
+           entry >>= 1;
+
+       JTR_CRC32_tableC[index] = entry;
    }
 }

-void CRC32_Update(CRC32_t *value, void *data, unsigned int size)
+void CRC32_Init(CRC32_t *value)
 {
-   unsigned char *ptr;
-   unsigned int count;
-   CRC32_t result;
+   *value = ALL1;
+}

-   result = *value;
-   ptr = data;
-   count = size;
+void CRC32_Update(CRC32_t *value, void *data, unsigned int count)
+{
+   unsigned char *ptr = (unsigned char*)data;
+   CRC32_t result = *value;

    if (count)
    do {
-       result = (result >> 8) ^ table[(result ^ *ptr++) & 0xFF];
+       result = JTR_CRC32_table[(result ^ *ptr++) & 0xFF] ^ (result >> 8);
    } while (--count);

    *value = result;
@@ -77,3 +91,16 @@ void CRC32_Final(unsigned char *out, CRC32_t value)
    out[2] = value >> 16;
    out[3] = value >> 24;
 }
+
+void CRC32_UpdateC(CRC32_t *value, void *data, unsigned int count)
+{
+   unsigned char *ptr = (unsigned char*)data;
+   CRC32_t result = *value;
+
+   if (count)
+   do {
+       result = JTR_CRC32_tableC[(result ^ *ptr++) & 0xFF] ^ (result >> 8);
+   } while (--count);
+
+   *value = result;
+}
diff --git a/src/crc32.h b/src/crc32.h
index 9dd9751..e8b6d03 100644
--- a/src/crc32.h
+++ b/src/crc32.h
@@ -39,4 +39,30 @@ extern void CRC32_Update(CRC32_t *value, void *data, unsigned int size);
  */
 extern void CRC32_Final(unsigned char *out, CRC32_t value);

+/*
+ * Initialize the CRC tables.  (Jumbo function)
+ */
+
+void CRC32_Init_tab();
+
+/*
+ * This is the data, so our macro can access it also. (jumbo only)
+ */
+extern CRC32_t JTR_CRC32_table[256];
+extern CRC32_t JTR_CRC32_tableC[256];
+
+/*
+ * Macro for the standard CRC-32 polynomial: one table-driven byte step. (jumbo only)
+ */
+#define jtr_crc32(crc,byte) (JTR_CRC32_table[(unsigned char)((crc)^(byte))] ^ ((crc) >> 8))
+
+/*
+ * Function and macro for CRC-32C polynomial. (jumbo only)
+ * If using the functions, call CRC32_Init() and CRC32_Final() as usual, but use
+ * CRC32_UpdateC() (or the jtr_crc32c() macro) in place of CRC32_Update().
+ */
+extern void CRC32_UpdateC(CRC32_t *value, void *data, unsigned int size);
+#define jtr_crc32c(crc,byte) (JTR_CRC32_tableC[(unsigned char)((crc)^(byte))] ^ ((crc) >> 8))
+
+
 #endif
diff --git a/src/crc32_fmt_plug.c b/src/crc32_fmt_plug.c
index d14bf5c..426153a 100644
--- a/src/crc32_fmt_plug.c
+++ b/src/crc32_fmt_plug.c
@@ -37,7 +37,7 @@ john_register_one(&fmt_crc32);

 #include "common.h"
 #include "formats.h"
-#include "pkzip.h"  // includes the 'inline' crc table.
+#include "crc32.h"
 #include "loader.h"

 #ifdef _OPENMP
@@ -88,36 +88,8 @@ static struct fmt_tests tests[] = {

 static struct fmt_main *pFmt;
 static char (*saved_key)[PLAINTEXT_LENGTH + 1];
-static ARCH_WORD_32 (*crcs);
-static ARCH_WORD_32 crcsalt;
-
-/* Copied from Solar's crc32.[hc] that does standard CRC-32 */
-typedef ARCH_WORD_32 CRC32C_t;
-
-#define POLY 0x82F63B78 // CRC-32C
-//#define POLY 0xEDB88320 // normal CRC-32
-#define ALL1 0xFFFFFFFF
-
-static CRC32C_t table[256];
-
-static void CRC32C_tab_Init()
-{
-   unsigned int index, bit;
-   CRC32C_t entry;
-
-   for (index = 0; index < 0x100; index++) {
-       entry = index;
-
-       for (bit = 0; bit < 8; bit++)
-       if (entry & 1) {
-           entry >>= 1;
-           entry ^= POLY;
-       } else
-           entry >>= 1;
-
-       table[index] = entry;
-   }
-}
+static CRC32_t (*crcs);
+static CRC32_t crcsalt;

 static void init(struct fmt_main *self)
 {
@@ -135,7 +107,7 @@ static void init(struct fmt_main *self)
    crcs      = mem_calloc(self->params.max_keys_per_crypt,
                           sizeof(*crcs));

-   CRC32C_tab_Init();
+   CRC32_Init_tab();
    pFmt = self;
 }

@@ -245,10 +217,10 @@ static int crypt_all(int *pcount, struct db_salt *salt)
 #pragma omp parallel for private(i)
 #endif
    for (i = 0; i < count; ++i) {
-       ARCH_WORD_32 crc = crcsalt;
+       CRC32_t crc = crcsalt;
        unsigned char *p = (unsigned char*)saved_key[i];
        while (*p)
-           crc = pkzip_crc32(crc, *p++);
+           crc = jtr_crc32(crc, *p++);
        //crcs[i] = ~crc;
        crcs[i] = crc;
    }
@@ -263,14 +235,14 @@ static int crypt_allc(int *pcount, struct db_salt *salt)
 #pragma omp parallel for private(i)
 #endif
    for (i = 0; i < count; ++i) {
-       CRC32C_t crc = (CRC32C_t)crcsalt;
+       CRC32_t crc = crcsalt;
        unsigned char *p = (unsigned char*)saved_key[i];
 #if __SSE4_2__
        while (*p)
            crc = _mm_crc32_u8(crc, *p++);
 #else
        while (*p)
-           crc = table[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
+           crc = jtr_crc32c(crc, *p++);
 #endif
        crcs[i] = crc;
        //printf("In: '%s' Out: %08x\n", saved_key[i], ~crc);
diff --git a/src/crc32_plug.c b/src/crc32_plug.c
deleted file mode 100644
index 38c1238..0000000
--- a/src/crc32_plug.c
+++ /dev/null
@@ -1,108 +0,0 @@
-/*-
- *  COPYRIGHT (C) 1986 Gary S. Brown.  You may use this program, or
- *  code or tables extracted from it, as desired without restriction.
- *
- *  First, the polynomial itself and its table of feedback terms.  The
- *  polynomial is
- *  X^32+X^26+X^23+X^22+X^16+X^12+X^11+X^10+X^8+X^7+X^5+X^4+X^2+X^1+X^0
- *
- *  Note that we take it "backwards" and put the highest-order term in
- *  the lowest-order bit.  The X^32 term is "implied"; the LSB is the
- *  X^31 term, etc.  The X^0 term (usually shown as "+1") results in
- *  the MSB being 1
- *
- *  Note that the usual hardware shift register implementation, which
- *  is what we're using (we're merely optimizing it by doing eight-bit
- *  chunks at a time) shifts bits into the lowest-order term.  In our
- *  implementation, that means shifting towards the right.  Why do we
- *  do it this way?  Because the calculated CRC must be transmitted in
- *  order from highest-order term to lowest-order term.  UARTs transmit
- *  characters in order from LSB to MSB.  By storing the CRC this way
- *  we hand it to the UART in the order low-byte to high-byte; the UART
- *  sends each low-bit to hight-bit; and the result is transmission bit
- *  by bit from highest- to lowest-order term without requiring any bit
- *  shuffling on our part.  Reception works similarly
- *
- *  The feedback terms table consists of 256, 32-bit entries.  Notes
- *
- *      The table can be generated at runtime if desired; code to do so
- *      is shown later.  It might not be obvious, but the feedback
- *      terms simply represent the results of eight shift/xor opera
- *      tions for all combinations of data and CRC register values
- *
- *      The values must be right-shifted by eight bits by the "updcrc
- *      logic; the shift must be unsigned (bring in zeroes).  On some
- *      hardware you could probably optimize the shift in assembler by
- *      using byte-swap instructions
- *      polynomial $edb88320
- */
-
-#include <stdlib.h>
-#include <stdint.h>
-
-uint32_t crc32_tab[] = {
-   0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419, 0x706af48f,
-   0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988,
-   0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91, 0x1db71064, 0x6ab020f2,
-   0xf3b97148, 0x84be41de, 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7,
-   0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, 0x14015c4f, 0x63066cd9,
-   0xfa0f3d63, 0x8d080df5, 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172,
-   0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b, 0x35b5a8fa, 0x42b2986c,
-   0xdbbbc9d6, 0xacbcf940, 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59,
-   0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, 0x21b4f4b5, 0x56b3c423,
-   0xcfba9599, 0xb8bda50f, 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924,
-   0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d, 0x76dc4190, 0x01db7106,
-   0x98d220bc, 0xefd5102a, 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433,
-   0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, 0x7f6a0dbb, 0x086d3d2d,
-   0x91646c97, 0xe6635c01, 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e,
-   0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457, 0x65b0d9c6, 0x12b7e950,
-   0x8bbeb8ea, 0xfcb9887c, 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,
-   0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, 0x4adfa541, 0x3dd895d7,
-   0xa4d1c46d, 0xd3d6f4fb, 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0,
-   0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9, 0x5005713c, 0x270241aa,
-   0xbe0b1010, 0xc90c2086, 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,
-   0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, 0x59b33d17, 0x2eb40d81,
-   0xb7bd5c3b, 0xc0ba6cad, 0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a,
-   0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683, 0xe3630b12, 0x94643b84,
-   0x0d6d6a3e, 0x7a6a5aa8, 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1,
-   0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, 0xf762575d, 0x806567cb,
-   0x196c3671, 0x6e6b06e7, 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc,
-   0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5, 0xd6d6a3e8, 0xa1d1937e,
-   0x38d8c2c4, 0x4fdff252, 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b,
-   0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, 0xdf60efc3, 0xa867df55,
-   0x316e8eef, 0x4669be79, 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236,
-   0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f, 0xc5ba3bbe, 0xb2bd0b28,
-   0x2bb45a92, 0x5cb36a04, 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d,
-   0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a, 0x9c0906a9, 0xeb0e363f,
-   0x72076785, 0x05005713, 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38,
-   0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21, 0x86d3d2d4, 0xf1d4e242,
-   0x68ddb3f8, 0x1fda836e, 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777,
-   0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, 0x8f659eff, 0xf862ae69,
-   0x616bffd3, 0x166ccf45, 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2,
-   0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db, 0xaed16a4a, 0xd9d65adc,
-   0x40df0b66, 0x37d83bf0, 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9,
-   0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, 0xbad03605, 0xcdd70693,
-   0x54de5729, 0x23d967bf, 0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94,
-   0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d
-};
-
-uint32_t
-crc32(const void *buf, size_t size)
-{
-   const uint8_t *p;
-   uint32_t crc;
-
-   p = buf;
-   crc = ~0U;
-
-   while (size--)
-       crc = crc32_tab[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
-
-   return crc ^ ~0U;
-}
-
-uint32_t
-crc32_intermediate(uint32_t crc, uint8_t d)
-{
-   return crc32_tab[(crc ^ d) & 0xFF] ^ (crc >> 8);
-}
diff --git a/src/dynamic_fmt.c b/src/dynamic_fmt.c
index c772bec..4d6cb7b 100644
--- a/src/dynamic_fmt.c
+++ b/src/dynamic_fmt.c
@@ -87,7 +87,7 @@ static DYNAMIC_primitive_funcp _Funcs_1[] =
 #include "memory.h"
 #include "unicode.h"
 #include "johnswap.h"
-#include "pkzip.h"
+#include "crc32.h"
 #include "aligned.h"
 #include "fake_salts.h"
 #include "base64_convert.h"
@@ -1873,7 +1873,7 @@ static unsigned char *AddSaltHash(unsigned char *salt, unsigned int len, unsigne
    return pRet;
 }

-static unsigned char *FindSaltHash(unsigned char *salt, unsigned int len, u32 crc)
+static unsigned char *FindSaltHash(unsigned char *salt, unsigned int len, CRC32_t crc)
 {
    unsigned int idx = crc & DYNA_SALT_HASH_MOD;
    dyna_salt_list_entry *p;
@@ -1896,12 +1896,12 @@ static unsigned char *FindSaltHash(unsigned char *salt, unsigned int len, u32 cr

 static unsigned char *HashSalt(unsigned char *salt, unsigned int len)
 {
-   u32 crc = 0xffffffff, i;
+   CRC32_t crc = 0xffffffff, i;
    unsigned char *ret_hash;

    // compute the hash.
    for (i = 0; i < len; ++i)
-       crc = pkzip_crc32(crc,salt[i]);
+       crc = jtr_crc32(crc,salt[i]);
    crc = ~crc;

    ret_hash = FindSaltHash(salt, len, crc);
diff --git a/src/john.c b/src/john.c
index b504575..c49336d 100644
--- a/src/john.c
+++ b/src/john.c
@@ -94,6 +94,7 @@ static int john_omp_threads_new;
 #include "dynamic.h"
 #include "fake_salts.h"
 #include "listconf.h"
+#include "crc32.h"
 #if HAVE_MPI
 #include "john-mpi.h"
 #endif
@@ -1211,7 +1212,6 @@ static void john_init(char *name, int argc, char **argv)
        argv[1] = "--test=0";

    CPU_detect_or_fallback(argv, make_check);
-
 #ifdef _OPENMP
    john_omp_init();
 #endif
@@ -1617,6 +1617,9 @@ int main(int argc, char **argv)
    }
 #endif

+   /* put the crc table init here, so that tables are fully setup for any ancillary program */
+   CRC32_Init_tab();
+
    if (!strcmp(name, "unshadow")) {
        CPU_detect_or_fallback(argv, 0);
        return unshadow(argc, argv);
diff --git a/src/pkzip.c b/src/pkzip.c
index 75e1799..04f6505 100644
--- a/src/pkzip.c
+++ b/src/pkzip.c
@@ -8,42 +8,6 @@
 #include "pkzip.h"
 #include "memdbg.h"

-const u32 pkz_crc_32_tab[] =
-{
-   0x00000000UL, 0x77073096UL, 0xee0e612cUL, 0x990951baUL, 0x076dc419UL, 0x706af48fUL, 0xe963a535UL, 0x9e6495a3UL,
-   0x0edb8832UL, 0x79dcb8a4UL, 0xe0d5e91eUL, 0x97d2d988UL, 0x09b64c2bUL, 0x7eb17cbdUL, 0xe7b82d07UL, 0x90bf1d91UL,
-   0x1db71064UL, 0x6ab020f2UL, 0xf3b97148UL, 0x84be41deUL, 0x1adad47dUL, 0x6ddde4ebUL, 0xf4d4b551UL, 0x83d385c7UL,
-   0x136c9856UL, 0x646ba8c0UL, 0xfd62f97aUL, 0x8a65c9ecUL, 0x14015c4fUL, 0x63066cd9UL, 0xfa0f3d63UL, 0x8d080df5UL,
-   0x3b6e20c8UL, 0x4c69105eUL, 0xd56041e4UL, 0xa2677172UL, 0x3c03e4d1UL, 0x4b04d447UL, 0xd20d85fdUL, 0xa50ab56bUL,
-   0x35b5a8faUL, 0x42b2986cUL, 0xdbbbc9d6UL, 0xacbcf940UL, 0x32d86ce3UL, 0x45df5c75UL, 0xdcd60dcfUL, 0xabd13d59UL,
-   0x26d930acUL, 0x51de003aUL, 0xc8d75180UL, 0xbfd06116UL, 0x21b4f4b5UL, 0x56b3c423UL, 0xcfba9599UL, 0xb8bda50fUL,
-   0x2802b89eUL, 0x5f058808UL, 0xc60cd9b2UL, 0xb10be924UL, 0x2f6f7c87UL, 0x58684c11UL, 0xc1611dabUL, 0xb6662d3dUL,
-   0x76dc4190UL, 0x01db7106UL, 0x98d220bcUL, 0xefd5102aUL, 0x71b18589UL, 0x06b6b51fUL, 0x9fbfe4a5UL, 0xe8b8d433UL,
-   0x7807c9a2UL, 0x0f00f934UL, 0x9609a88eUL, 0xe10e9818UL, 0x7f6a0dbbUL, 0x086d3d2dUL, 0x91646c97UL, 0xe6635c01UL,
-   0x6b6b51f4UL, 0x1c6c6162UL, 0x856530d8UL, 0xf262004eUL, 0x6c0695edUL, 0x1b01a57bUL, 0x8208f4c1UL, 0xf50fc457UL,
-   0x65b0d9c6UL, 0x12b7e950UL, 0x8bbeb8eaUL, 0xfcb9887cUL, 0x62dd1ddfUL, 0x15da2d49UL, 0x8cd37cf3UL, 0xfbd44c65UL,
-   0x4db26158UL, 0x3ab551ceUL, 0xa3bc0074UL, 0xd4bb30e2UL, 0x4adfa541UL, 0x3dd895d7UL, 0xa4d1c46dUL, 0xd3d6f4fbUL,
-   0x4369e96aUL, 0x346ed9fcUL, 0xad678846UL, 0xda60b8d0UL, 0x44042d73UL, 0x33031de5UL, 0xaa0a4c5fUL, 0xdd0d7cc9UL,
-   0x5005713cUL, 0x270241aaUL, 0xbe0b1010UL, 0xc90c2086UL, 0x5768b525UL, 0x206f85b3UL, 0xb966d409UL, 0xce61e49fUL,
-   0x5edef90eUL, 0x29d9c998UL, 0xb0d09822UL, 0xc7d7a8b4UL, 0x59b33d17UL, 0x2eb40d81UL, 0xb7bd5c3bUL, 0xc0ba6cadUL,
-   0xedb88320UL, 0x9abfb3b6UL, 0x03b6e20cUL, 0x74b1d29aUL, 0xead54739UL, 0x9dd277afUL, 0x04db2615UL, 0x73dc1683UL,
-   0xe3630b12UL, 0x94643b84UL, 0x0d6d6a3eUL, 0x7a6a5aa8UL, 0xe40ecf0bUL, 0x9309ff9dUL, 0x0a00ae27UL, 0x7d079eb1UL,
-   0xf00f9344UL, 0x8708a3d2UL, 0x1e01f268UL, 0x6906c2feUL, 0xf762575dUL, 0x806567cbUL, 0x196c3671UL, 0x6e6b06e7UL,
-   0xfed41b76UL, 0x89d32be0UL, 0x10da7a5aUL, 0x67dd4accUL, 0xf9b9df6fUL, 0x8ebeeff9UL, 0x17b7be43UL, 0x60b08ed5UL,
-   0xd6d6a3e8UL, 0xa1d1937eUL, 0x38d8c2c4UL, 0x4fdff252UL, 0xd1bb67f1UL, 0xa6bc5767UL, 0x3fb506ddUL, 0x48b2364bUL,
-   0xd80d2bdaUL, 0xaf0a1b4cUL, 0x36034af6UL, 0x41047a60UL, 0xdf60efc3UL, 0xa867df55UL, 0x316e8eefUL, 0x4669be79UL,
-   0xcb61b38cUL, 0xbc66831aUL, 0x256fd2a0UL, 0x5268e236UL, 0xcc0c7795UL, 0xbb0b4703UL, 0x220216b9UL, 0x5505262fUL,
-   0xc5ba3bbeUL, 0xb2bd0b28UL, 0x2bb45a92UL, 0x5cb36a04UL, 0xc2d7ffa7UL, 0xb5d0cf31UL, 0x2cd99e8bUL, 0x5bdeae1dUL,
-   0x9b64c2b0UL, 0xec63f226UL, 0x756aa39cUL, 0x026d930aUL, 0x9c0906a9UL, 0xeb0e363fUL, 0x72076785UL, 0x05005713UL,
-   0x95bf4a82UL, 0xe2b87a14UL, 0x7bb12baeUL, 0x0cb61b38UL, 0x92d28e9bUL, 0xe5d5be0dUL, 0x7cdcefb7UL, 0x0bdbdf21UL,
-   0x86d3d2d4UL, 0xf1d4e242UL, 0x68ddb3f8UL, 0x1fda836eUL, 0x81be16cdUL, 0xf6b9265bUL, 0x6fb077e1UL, 0x18b74777UL,
-   0x88085ae6UL, 0xff0f6a70UL, 0x66063bcaUL, 0x11010b5cUL, 0x8f659effUL, 0xf862ae69UL, 0x616bffd3UL, 0x166ccf45UL,
-   0xa00ae278UL, 0xd70dd2eeUL, 0x4e048354UL, 0x3903b3c2UL, 0xa7672661UL, 0xd06016f7UL, 0x4969474dUL, 0x3e6e77dbUL,
-   0xaed16a4aUL, 0xd9d65adcUL, 0x40df0b66UL, 0x37d83bf0UL, 0xa9bcae53UL, 0xdebb9ec5UL, 0x47b2cf7fUL, 0x30b5ffe9UL,
-   0xbdbdf21cUL, 0xcabac28aUL, 0x53b39330UL, 0x24b4a3a6UL, 0xbad03605UL, 0xcdd70693UL, 0x54de5729UL, 0x23d967bfUL,
-   0xb3667a2eUL, 0xc4614ab8UL, 0x5d681b02UL, 0x2a6f2b94UL, 0xb40bbe37UL, 0xc30c8ea1UL, 0x5a05df1bUL, 0x2d02ef8dUL
-};
-
 /* helper functions for reading binary data of known little endian */
 /* format from a file. Works whether BE or LE system.              */
 u32 fget32LE(FILE * fp)
diff --git a/src/pkzip.h b/src/pkzip.h
index c4094c6..0628c35 100644
--- a/src/pkzip.h
+++ b/src/pkzip.h
@@ -8,11 +8,7 @@ typedef unsigned char u8;
 typedef          char c8;
 typedef ARCH_WORD_32 u32;

-/* crc32 0xdebb20e3 table and supplementary functions.  */
-
-extern const u32 pkz_crc_32_tab[256];
-
-#define pkzip_crc32(crc,byte) (pkz_crc_32_tab[(u8)((crc)^(byte))] ^ ((crc) >> 8))
+#include "crc32.h"

 u32 fget32LE(FILE * fp);
 u16 fget16LE(FILE * fp);
diff --git a/src/pkzip_fmt_plug.c b/src/pkzip_fmt_plug.c
index 6d5b342..d316481 100644
--- a/src/pkzip_fmt_plug.c
+++ b/src/pkzip_fmt_plug.c
@@ -733,9 +733,9 @@ static int get_next_decrypted_block(u8 *in, int sizeof_n, FILE *fp, u32 *inp_use
    /* decrypt the data bytes (in place, in same buffer). Easy to do, only requires 1 temp character variable.  */
    for (k = 0; k < new_bytes; ++k) {
        C = PKZ_MULT(in[k],(*pkey2));
-       pkey0->u = pkzip_crc32 (pkey0->u, C);
+       pkey0->u = jtr_crc32 (pkey0->u, C);
        pkey1->u = (pkey1->u + pkey0->c[KB1]) * 134775813 + 1;
-       pkey2->u = pkzip_crc32 (pkey2->u, pkey1->c[KB2]);
+       pkey2->u = jtr_crc32 (pkey2->u, pkey1->c[KB2]);
        in[k] = C;
    }
    /* return the number of bytes we read from the file on this read */
@@ -792,9 +792,9 @@ static int cmp_exact_loadfile(int index)
    b = salt->H[salt->full_zip_idx].h;
    do {
        C = PKZ_MULT(*b++,key2);
-       key0.u = pkzip_crc32 (key0.u, C);
+       key0.u = jtr_crc32 (key0.u, C);
        key1.u = (key1.u + key0.c[KB1]) * 134775813 + 1;
-       key2.u = pkzip_crc32 (key2.u, key1.c[KB2]);
+       key2.u = jtr_crc32 (key2.u, key1.c[KB2]);
    }
    while(--k);

@@ -810,7 +810,7 @@ static int cmp_exact_loadfile(int index)
         avail_in = get_next_decrypted_block(in, CHUNK, fp, &inp_used, &key0, &key1, &key2);
        while (avail_in) {
            for (k = 0; k < avail_in; ++k)
-               crc = pkzip_crc32(crc,in[k]);
+               crc = jtr_crc32(crc,in[k]);
            avail_in = get_next_decrypted_block(in, CHUNK, fp, &inp_used, &key0, &key1, &key2);
        }
        fclose(fp);
@@ -856,7 +856,7 @@ static int cmp_exact_loadfile(int index)
             have = CHUNK - strm.avail_out;
            /* now update our crc value */
            for (k = 0; k < have; ++k)
-               crc = pkzip_crc32(crc,out[k]);
+               crc = jtr_crc32(crc,out[k]);
            decomp_len += have;
         } while (strm.avail_out == 0);

@@ -901,26 +901,26 @@ static int cmp_exact(char *source, int index)
        k=12;
        do {
            C = PKZ_MULT(*b++,key2);
-           key0.u = pkzip_crc32 (key0.u, C);
+           key0.u = jtr_crc32 (key0.u, C);
            key1.u = (key1.u + key0.c[KB1]) * 134775813 + 1;
-           key2.u = pkzip_crc32 (key2.u, key1.c[KB2]);
+           key2.u = jtr_crc32 (key2.u, key1.c[KB2]);
        }
        while(--k);
        B = decrBuf;
        k = salt->compLen-12;
        do {
            C = PKZ_MULT(*b++,key2);
-           key0.u = pkzip_crc32 (key0.u, C);
+           key0.u = jtr_crc32 (key0.u, C);
            *B++ = C;
            key1.u = (key1.u + key0.c[KB1]) * 134775813 + 1;
-           key2.u = pkzip_crc32 (key2.u, key1.c[KB2]);
+           key2.u = jtr_crc32 (key2.u, key1.c[KB2]);
        } while (--k);

        if (salt->H[salt->full_zip_idx].compType == 0) {
            // handle a stored blob (we do not have to decrypt it.
            crc = 0xFFFFFFFF;
            for (k = 0; k < salt->compLen-12; ++k)
-               crc = pkzip_crc32(crc,decrBuf[k]);
+               crc = jtr_crc32(crc,decrBuf[k]);
            MEM_FREE(decrBuf);
            return ~crc == salt->crc32;
        }
@@ -948,7 +948,7 @@ static int cmp_exact(char *source, int index)

        crc = 0xFFFFFFFF;
        for (k = 0; k < strm.total_out; ++k)
-           crc = pkzip_crc32(crc,decompBuf[k]);
+           crc = jtr_crc32(crc,decompBuf[k]);
        MEM_FREE(decompBuf);
        MEM_FREE(decrBuf);
        return ~crc == salt->crc32;
@@ -1362,9 +1362,9 @@ static int crypt_all(int *pcount, struct db_salt *_salt)
            /* load the 'pwkey' one time, put it into the K12 array */
            key0.u = 0x12345678UL; key1.u = 0x23456789UL; key2.u = 0x34567890UL;
            do {
-               key0.u = pkzip_crc32 (key0.u, *p++);
+               key0.u = jtr_crc32 (key0.u, *p++);
                key1.u = (key1.u + key0.c[KB1]) * 134775813 + 1;
-               key2.u = pkzip_crc32 (key2.u, key1.c[KB2]);
+               key2.u = jtr_crc32 (key2.u, key1.c[KB2]);
            } while (*p);
            K12[idx*3] = key0.u, K12[idx*3+1] = key1.u, K12[idx*3+2] = key2.u;
            goto SkipKeyLoadInit;
@@ -1386,9 +1386,9 @@ SkipKeyLoadInit:;
            do
            {
                C = PKZ_MULT(*b++,key2);
-               key0.u = pkzip_crc32 (key0.u, C);
+               key0.u = jtr_crc32 (key0.u, C);
                key1.u = (key1.u + key0.c[KB1]) * 134775813 + 1;
-               key2.u = pkzip_crc32 (key2.u, key1.c[KB2]);
+               key2.u = jtr_crc32 (key2.u, key1.c[KB2]);
            }
            while(--k);

@@ -1406,9 +1406,9 @@ SkipKeyLoadInit:;
 #endif

            // Now, update the key data (with that last byte.
-           key0.u = pkzip_crc32 (key0.u, C);
+           key0.u = jtr_crc32 (key0.u, C);
            key1.u = (key1.u + key0.c[KB1]) * 134775813 + 1;
-           key2.u = pkzip_crc32 (key2.u, key1.c[KB2]);
+           key2.u = jtr_crc32 (key2.u, key1.c[KB2]);

            // Ok, we now have validated this checksum.  We need to 'do some' extra pkzip validation work.
            // What we do here, is to decrypt a little data (possibly only 1 byte), and perform a single
@@ -1436,9 +1436,9 @@ SkipKeyLoadInit:;
                    SigChecked = 1;
                    curDecryBuf[0] = C;
                    for (; e < len;) {
-                       key0.u = pkzip_crc32 (key0.u, curDecryBuf[e]);
+                       key0.u = jtr_crc32 (key0.u, curDecryBuf[e]);
                        key1.u = (key1.u + key0.c[KB1]) * 134775813 + 1;
-                       key2.u = pkzip_crc32 (key2.u, key1.c[KB2]);
+                       key2.u = jtr_crc32 (key2.u, key1.c[KB2]);
                        curDecryBuf[++e] = PKZ_MULT(*b++,key2);
                    }

@@ -1469,9 +1469,9 @@ SkipKeyLoadInit:;
                // correct data is u16_1 == (u16_2^0xFFFF)
                curDecryBuf[0] = C;
                for (e = 0; e <= 4; ) {
-                   key0.u = pkzip_crc32 (key0.u, curDecryBuf[e]);
+                   key0.u = jtr_crc32 (key0.u, curDecryBuf[e]);
                    key1.u = (key1.u + key0.c[KB1]) * 134775813 + 1;
-                   key2.u = pkzip_crc32 (key2.u, key1.c[KB2]);
+                   key2.u = jtr_crc32 (key2.u, key1.c[KB2]);
                    curDecryBuf[++e] = PKZ_MULT(*b++,key2);
                }
                v1 = curDecryBuf[1] | (((u16)curDecryBuf[2])<<8);
@@ -1486,9 +1486,9 @@ SkipKeyLoadInit:;
                        len = salt->H[cur_hash_idx].datlen-12;
                    SigChecked = 1;
                    for (; e < len;) {
-                       key0.u = pkzip_crc32 (key0.u, curDecryBuf[e]);
+                       key0.u = jtr_crc32 (key0.u, curDecryBuf[e]);
                        key1.u = (key1.u + key0.c[KB1]) * 134775813 + 1;
-                       key2.u = pkzip_crc32 (key2.u, key1.c[KB2]);
+                       key2.u = jtr_crc32 (key2.u, key1.c[KB2]);
                        curDecryBuf[++e] = PKZ_MULT(*b++,key2);
                    }

@@ -1515,9 +1515,9 @@ SkipKeyLoadInit:;
 #endif
                    // we need 4 bytes, + 2, + 4 at most.
                    for (; e < 10;) {
-                       key0.u = pkzip_crc32 (key0.u, curDecryBuf[e]);
+                       key0.u = jtr_crc32 (key0.u, curDecryBuf[e]);
                        key1.u = (key1.u + key0.c[KB1]) * 134775813 + 1;
-                       key2.u = pkzip_crc32 (key2.u, key1.c[KB2]);
+                       key2.u = jtr_crc32 (key2.u, key1.c[KB2]);
                        curDecryBuf[++e] = PKZ_MULT(*b++,key2);
                    }
                    if (!check_inflate_CODE2(curDecryBuf))
@@ -1536,9 +1536,9 @@ SkipKeyLoadInit:;
                    if (salt->H[cur_hash_idx].datlen-12 < til)
                        til = salt->H[cur_hash_idx].datlen-12;
                    for (; e < til;) {
-                       key0.u = pkzip_crc32 (key0.u, curDecryBuf[e]);
+                       key0.u = jtr_crc32 (key0.u, curDecryBuf[e]);
                        key1.u = (key1.u + key0.c[KB1]) * 134775813 + 1;
-                       key2.u = pkzip_crc32 (key2.u, key1.c[KB2]);
+                       key2.u = jtr_crc32 (key2.u, key1.c[KB2]);
                        curDecryBuf[++e] = PKZ_MULT(*b++,key2);
                    }
                    if (!check_inflate_CODE1(curDecryBuf, til))
@@ -1555,9 +1555,9 @@ SkipKeyLoadInit:;
                if (salt->H[cur_hash_idx].datlen-12 < til)
                    til = salt->H[cur_hash_idx].datlen-12;
                for (; e < til;) {
-                   key0.u = pkzip_crc32 (key0.u, curDecryBuf[e]);
+                   key0.u = jtr_crc32 (key0.u, curDecryBuf[e]);
                    key1.u = (key1.u + key0.c[KB1]) * 134775813 + 1;
-                   key2.u = pkzip_crc32 (key2.u, key1.c[KB2]);
+                   key2.u = jtr_crc32 (key2.u, key1.c[KB2]);
                    curDecryBuf[++e] = PKZ_MULT(*b++,key2);
                }
                strm.zalloc = Z_NULL; strm.zfree = Z_NULL; strm.opaque = Z_NULL; strm.next_in = Z_NULL;
@@ -1602,9 +1602,9 @@ SkipKeyLoadInit:;
                u8 inflateBufTmp[1024];
                if (salt->compLen > 240 && salt->H[cur_hash_idx].datlen >= 200) {
                    for (;e < 200;) {
-                       key0.u = pkzip_crc32 (key0.u, curDecryBuf[e]);
+                       key0.u = jtr_crc32 (key0.u, curDecryBuf[e]);
                        key1.u = (key1.u + key0.c[KB1]) * 134775813 + 1;
-                       key2.u = pkzip_crc32 (key2.u, key1.c[KB2]);
+                       key2.u = jtr_crc32 (key2.u, key1.c[KB2]);
                        curDecryBuf[++e] = PKZ_MULT(*b++,key2);
                    }
                    strm.zalloc = Z_NULL; strm.zfree = Z_NULL; strm.opaque = Z_NULL; strm.next_in = Z_NULL;
diff --git a/src/pst_fmt_plug.c b/src/pst_fmt_plug.c
index 66bf5d1..cd37cdb 100644
--- a/src/pst_fmt_plug.c
+++ b/src/pst_fmt_plug.c
@@ -21,7 +21,7 @@ john_register_one(&fmt_pst);
 #include "misc.h"
 #include "common.h"
 #include "formats.h"
-#include "pkzip.h"  // includes the 'inline' crc table.
+#include "crc32.h"
 #ifdef _OPENMP
 #include <omp.h>
 #ifdef __MIC__
@@ -129,10 +129,10 @@ static int crypt_all(int *pcount, struct db_salt *salt)
 #pragma omp parallel for private(i)
 #endif
    for (i = 0; i < count; ++i) {
-       ARCH_WORD_32 crc = 0;
+       CRC32_t crc = 0;
        unsigned char *p = (unsigned char*)saved_key[i];
        while (*p)
-           crc = pkzip_crc32(crc, *p++);
+           crc = jtr_crc32(crc, *p++);
        crypt_out[i] = crc;
    }
    return count;
diff --git a/src/truecrypt_fmt_plug.c b/src/truecrypt_fmt_plug.c
index 80177d7..378d719 100644
--- a/src/truecrypt_fmt_plug.c
+++ b/src/truecrypt_fmt_plug.c
@@ -371,10 +371,6 @@ static void AES_256_XTS_first_sector(const unsigned char *double_key,
    }
 }

-// borrowed from https://github.com/bwalex/tc-play
-uint32_t crc32(const void *buf, size_t size);
-uint32_t crc32_intermediate(uint32_t crc, uint8_t d);
-
 int apply_keyfiles(unsigned char *pass, size_t pass_memsz, int nkeyfiles)
 {
    int pl, k;
@@ -402,7 +398,7 @@ int apply_keyfiles(unsigned char *pass, size_t pass_memsz, int nkeyfiles)
        crc = ~0U;

        for (i = 0; i < kdata_sz; i++) {
-           crc = crc32_intermediate(crc, kdata[i]);
+           crc = jtr_crc32(crc, kdata[i]);
            kpool[kpool_idx++] += (unsigned char)(crc >> 24);
            kpool[kpool_idx++] += (unsigned char)(crc >> 16);
            kpool[kpool_idx++] += (unsigned char)(crc >> 8);

jfoug commented 9 years ago

> - if (bInit) return;

NOTE: that line had to be retained in the table init function (CRC32_Init_tab()).

jfoug commented 9 years ago

We should just call CRC32_Init_tab() within john_init code. It runs in milliseconds, so no one will ever know. Then the tables will simply be 'ready'.

I have updated the patch with these changes: retaining the "if (bInit) return;" check, adding the init call to john.c, and removing the table init from CRC32_Init().

magnumripper commented 9 years ago

So perhaps the __SSE4_2__ logic should be added to crc32.[hc] too, so a format doesn't need to care about it.
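
One possible shape for that, as a sketch only: the header picks the per-byte step at compile time, and callers use a single macro either way. The names crc32c_step, crc32c_table and crc32c_table_init are hypothetical here (the patch above calls its macro jtr_crc32c and its table JTR_CRC32_tableC).

```c
#include <stdint.h>
#ifdef __SSE4_2__
#include <nmmintrin.h>
#endif

static uint32_t crc32c_table[256];

/* Build the CRC-32C table; needed only by the table-driven fallback. */
static void crc32c_table_init(void)
{
    uint32_t index, entry;
    int bit;

    for (index = 0; index < 256; index++) {
        entry = index;
        for (bit = 0; bit < 8; bit++)
            entry = (entry & 1) ? (entry >> 1) ^ 0x82F63B78u : entry >> 1;
        crc32c_table[index] = entry;
    }
}

/* Callers use crc32c_step() and never test __SSE4_2__ themselves. */
#ifdef __SSE4_2__
#define crc32c_step(crc, byte) _mm_crc32_u8((crc), (unsigned char)(byte))
#else
#define crc32c_step(crc, byte) \
    (crc32c_table[(unsigned char)((crc) ^ (byte))] ^ ((crc) >> 8))
#endif
```

A format's inner loop would then read the same with or without SSE4.2, e.g. crc = crc32c_step(crc, *p++).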