MSCHAP-v2 is broken (slow)

jfoug commented 10 years ago

The naive version is MUCH faster. I think something is horribly wrong with the salt running of the supposed fast version.

Here are some timings of REAL WORK (not -test speeds). Yes, I know that the chapv2 has a startup hit. But it was only about 20-30s. I can easily ignore that. The actual runtime is VERY slow.

Here is a run against the test suite file. The file was a 'new' one I have not yet released, but the any of the other 1500 hash test files for this format should be fine.

Here are timings against a 14 million candidate dictionary made from rocku

mschapv2  (Note, takes about 20-30s or so loading time)
157g 0:00:03:08 3.97% (ETA: 16:41:41) 0.8337g/s 3482p/s 4925Kc/s 4925KC/s googler..girlies2

mschapv2-naive
1099g 0:00:01:27 DONE (2014-07-01 15:27) 12.53g/s 163583p/s 120379Kc/s 120379KC/s ...20458135.....*7¡Vamos!
Use the "--show" option to display all of the cracked passwords reliably
Session completed

This was not an overlapped run. I ran fresh both times. The naive version is about 50x faster. Something is ghastly wrong.

jfoug commented 10 years ago

I tried testing with far fewer hashes (against same dictionary). I let things run to completion this time:

$ ../john-1.7.9/bleeding/run/john -ses=./tst -pot=./tst.pot --wordlist=/c/projects/dictionaries/rockyou.txt xxx.in -form=mschapv2-naive -enc=utf8
Loaded 9 password hashes with 9 different salts (mschapv2-naive, MSCHAPv2 C/R [MD4 DES DES 256/256 AVX naive])
Will run 8 OpenMP threads
Press 'q' or Ctrl-C to abort, almost any other key for status
NIÑALINDA        (fR14characters)
SOÑADORA         (u3)
schütze          (teN__chars)
pw!nc£$$         (3)
4g 0:00:00:03 DONE (2014-07-01 15:41) 1.253g/s 4495Kp/s 26131Kc/s 26131KC/s ...20458135.....*7¡Vamos!
Use the "--show" option to display all of the cracked passwords reliably
Session completed

$ rm tst.pot

$ ../john-1.7.9/bleeding/run/john -ses=./tst -pot=./tst.pot --wordlist=/c/projects/dictionaries/rockyou.txt xxx.in -form=mschapv2 -enc=utf8
Loaded 9 password hashes with 9 different salts (MSCHAPv2, C/R [MD4 DES (ESS MD5) 128/128 AVX 12x])
Warning: no OpenMP support for this hash type, consider --fork=8
Press 'q' or Ctrl-C to abort, almost any other key for status
NIÑALINDA        (fR14characters)
SOÑADORA         (u3)
schütze          (teN__chars)
pw!nc£$$         (3)
4g 0:00:04:00 DONE (2014-07-01 15:46) 0.01665g/s 59716p/s 346738c/s 346738C/s !!high2way3..*7¡Vamos!
Use the "--show" option to display all of the cracked passwords reliably
Session completed

This again shows 70x or so slow down.

magnumripper commented 10 years ago

I can't reproduce.

$ rm -f ../run/john.pot &&  ../run/john -form:mschapv2-naive ../test/mschapv2_tst.in -wo -ru:single
Loaded 1500 password hashes with 1500 different salts (mschapv2-naive, MSCHAPv2 C/R [MD4 DES DES 128/128 AVX-16 naive])
Press 'q' or Ctrl-C to abort, almost any other key for status
hello            (u-12-mschapv2)
...
qwert12345       (u-27-mschapv2)
26g 0:00:00:53 DONE (2014-07-02 01:20) 0.4852g/s 55125p/s 82293Kc/s 82293KC/s nite1900..sss1900
Use the "--show" option to display all of the cracked passwords reliably
Session completed

$ rm -f ../run/john.pot &&  ../run/john -form:mschapv2 ../test/mschapv2_tst.in -wo -ru:single
MSCHAPv2: Note: slow loading. For short runs, try --format=MSCHAPv2-naive
instead. That version loads faster but runs slower.
Loaded 1500 password hashes with 1500 different salts (MSCHAPv2, C/R [MD4 DES (ESS MD5) 128/128 AVX 12x])
Press 'q' or Ctrl-C to abort, almost any other key for status
test             (u-10-mschapv2)
...
qwert12345       (u-27-mschapv2)
26g 0:00:00:00 DONE (2014-07-02 01:21) 38.80g/s 4408Kp/s 4361Mc/s 4361MC/s baraka1900..sss1900
Use the "--show" option to display all of the cracked passwords reliably
Session completed

The trick format is 80x faster than the naïve version. The figure shown already compensates for the slower load time - it's not included in the figures.

magnumripper commented 10 years ago

Further tests with encodings and OpenMP are consistent with my result above. Using 8xOMP the naïve version is still 51x slower than the trick version running on just one core.

jfoug commented 10 years ago

Ok, here is what I get on my 64 bit ubuntu VM, different dictionary, about 1/2 sized (7.4 million words)

mschapv2-naive (note MUCH slower than on my 32 bit build, about 20x slower) 33g 0:00:06:40 37.19% (ETA: 06:58:28) 0.08232g/s 30717p/s 45315Kc/s 45315KC/s mathi07..likely 33g 0:00:07:03 0.07785g/s 30827p/s 45469Kc/s 45469KC/s alphaxi417..rock you Use the "--show" option to display all of the cracked passwords reliably Session aborted

mschapv2 completed. Speed is much better 35g 0:00:00:23 DONE (2014-07-02 06:50) 1.471g/s 1368Kp/s 2079Mc/s 2079MC/s eneasitaukie..401155 Use the "--show" option to display all of the cracked passwords reliably Session completed

So now I will search for WHY the mschapv2 is SOOOOO much slower on my 32 bit cygwin builds, AND why mschapv2-naive is SOOOO much slower on 64 bit linux builds ;)

magnumripper commented 10 years ago

The speeds you got for -naive on 32-bit are totally impossible. They indicate you were actually running the trick format.

magnumripper commented 10 years ago

I'm inclined to close this as PEBCAK. Can you reproduce it?

magnumripper commented 10 years ago

After loading hashes etc, Jumbo resets the clock after line 1600 of john.c. Could there be some problem with this under Cygwin? That would mean the very slow loading of the trick format is counted, so the measured speed looks very slow (well it is slow if you do count it in a short run - that's why we kept the naive version as an alternative).

frank-dittrich commented 10 years ago

Then you should see the difference when comparing --test=1 and --test=100 output.

jfoug commented 10 years ago

$ ./john -w=pw.dic -form=mschapv2-naive mschapv2_tst.in -pot=naive.pot > /dev/null
Press 'q' or Ctrl-C to abort, almost any other key for status
1500g 0:00:00:00 DONE (2014-09-16 06:14) 6410g/s 7658p/s 5870Kc/s 5870KC/s X-files__17..AB109
Use the "--show" option to display all of the cracked passwords reliably
Session completed

$ ./john -w=pw.dic -form=mschapv2 mschapv2_tst.in -pot=fast.pot > /dev/null
MSCHAPv2: Note: slow loading. For short runs, try --format=MSCHAPv2-naive
instead. That version loads faster but runs slower.
Press 'q' or Ctrl-C to abort, almost any other key for status
1500g 0:00:00:25 DONE (2014-09-16 06:15) 59.87g/s 81.42p/s 82608c/s 82608C/s dffffffffffff__11..AB357
Use the "--show" option to display all of the cracked passwords reliably
Session completed

These are 2 runs against the TS file. In the 2nd run, it took 20s to get to the "Press 'q' or Ctrl-C....." line. Then 25s to complete. For the naive run, it completed almost instantly. I just did the same test on my Unbuntu 64 machine, and same results. Naive was instant, 'fast' format drags along, and completes in 25s (after about a 20s load time).

ghost@ghostvb:/JtR/bleed/run$ ./john -w=/c/phpbb/johnripper/bleeding/run/pw.dic -form=mschapv2 /c/phpbb/johnripper/bleeding/run/mschapv2_tst.in -pot=fast.pot > /dev/null
MSCHAPv2: Note: slow loading. For short runs, try --format=MSCHAPv2-naive
instead. That version loads faster but runs slower.
Warning: no OpenMP support for this hash type, consider --fork=2
Press 'q' or Ctrl-C to abort, almost any other key for status
1500g 0:00:00:25 DONE (2014-09-16 06:43) 59.28g/s 80.63p/s 81801c/s 81801C/s dffffffffffff__11..AB357
Use the "--show" option to display all of the cracked passwords reliably
Session completed

ghost@ghostvb:/JtR/bleed/run$ ./john -w=/c/phpbb/johnripper/bleeding/run/pw.dic -form=mschapv2-naive /c/phpbb/johnripper/bleeding/run/mschapv2_tst.in -pot=naive.pot > /dev/null
Will run 2 OpenMP threads
Press 'q' or Ctrl-C to abort, almost any other key for status
1500g 0:00:00:00 DONE (2014-09-16 06:44) 2830g/s 10977p/s 16466Kc/s 16466KC/s Skippin� an�*..ERS__17
Use the "--show" option to display all of the cracked passwords reliably
Session completed

If I run these 2 formats without the redirection, the naive simply blinks the screen (it seems), and is done. The 'fast' version slowly scrolls through and finds the data.

Can you not replicate this behavior??

(Note, it looks like my ubunut was built with OMP, but the cygwin was not. The speed difference was MUCH more than 2 threads of OMP would account for)

jfoug commented 10 years ago

Frank, when it gets right down to it, the -test speeds do not mean dick. The only speed that really matters is running against data. Yes, the -naive format is slower (from 4x to 8x slower). But

kholia commented 10 years ago

"the -test speeds do not mean dick"

:+1:

For some reason, I have noticed that the "-test" results are overly optimistic for many of my formats.

magnumripper commented 10 years ago

Well the speed of a TS run does not mean dick either. We have many formats that test 100x slower when using OMP, whereas the real speed has improved with 8x.

In real life you would not have a short wordlist that cracked all hashes. To come anywhere near real life use, you could for example use the TS input file but exhaust a given keyspace. Here's is a very limited such test:

$ ../run/john ../test/mschapv2_tst.in -form:mschapv2-naive -inc:digits -max-len=7
Loaded 1500 password hashes with 1500 different salts (mschapv2-naive, MSCHAPv2 C/R [MD4 DES DES 128/128 AVX-16 naive])
Will run 8 OpenMP threads
Press 'q' or Ctrl-C to abort, almost any other key for status
1                (u-8-mschapv2)
12345            (u-28-mschapv2)
2g 0:00:01:31 DONE (2014-09-16 19:39) 0.02189g/s 121658p/s 182906Kc/s 182906KC/s 0766269..0769743
Use the "--show" option to display all of the cracked passwords reliably
Session completed

$ rm ../run/john.pot 
$ time ../run/john ../test/mschapv2_tst.in -form:mschapv2 -inc:digits -max-len=7
MSCHAPv2: Note: slow loading. For short runs, try --format=MSCHAPv2-naive
instead. That version loads faster but runs slower.
Loaded 1500 password hashes with 1500 different salts (MSCHAPv2, C/R [MD4 DES (ESS MD5) 128/128 AVX 12x])
Warning: no OpenMP support for this hash type, consider --fork=8
Press 'q' or Ctrl-C to abort, almost any other key for status
12345            (u-28-mschapv2)
1                (u-8-mschapv2)
2g 0:00:00:01 DONE (2014-09-16 19:39) 1.886g/s 10482Kp/s 16644Mc/s 16644MC/s 0769825..0769743
Use the "--show" option to display all of the cracked passwords reliably
Session completed

real    0m13.836s
user    0m13.788s
sys 0m0.035s

Wall-clock time for the last run was 13.8 seconds, but when disregarding load time it was just one second. Speed improvement was 86x disregarding load time. Even when counting load-time, it's about 6x. For longer runs, the load time quickly becomes negligible so we'd approach 86x.

magnumripper commented 10 years ago

BTW, that is ONE thread, 86x faster than EIGHT threads running naive...

jfoug commented 10 years ago

Then what is killing the TS speeds? Is there some cmp_exact that is slower than slow ?

magnumripper commented 10 years ago

Yeah cmp_exact() will do the whole job from scratch, apart from the NT hash.

claudioandre-br commented 10 years ago

anywhere near real life use, you could for example use the TS input file but exhaust a given keyspace

Is there a minimum length/time to use to get the 'real' speed? BTW: why there is no DONE for OpenCL formats? What is "N/A"?

-max-len=4

56g 0:00:05:00 DONE (2014-09-16 22:21) 0.1862g/s 36.95p/s 2069c/s 53375C/s 941..97
56g 0:00:00:31  1.753g/s 347.8p/s 19479c/s 521759C/s 123..97
56g 0:00:00:16  3.437g/s 682.0p/s 38192c/s 1023KC/s 123..97

-max-len=5

[...]
56g 0:00:04:41 N/A 0.1987g/s 394.2p/s 22077c/s 582311C/s 71203..97
56g 0:00:01:49  0.5136g/s 1019p/s 57068c/s 1528KC/s 12345..97

../run/john ../tests/SHA256crypt_tst.in -form:sha256crypt-opencl -inc:digits -max-len=X -dev=Y

magnumripper commented 10 years ago

Is there a minimum length/time to use to get the 'real' speed?

There's no one-size-fits-all. Basically I would say the real speed is the speed you get while not cracking anything and disregarding initialization.

BTW: why there is no DONE for OpenCL formats? What is "N/A"?

I believe both of those issues are specific to running Incremental under --fork (or MPI). It's because Incremental mode does not split in equal slices.

jfoug commented 10 years ago

Found the problem, and it is an easy fix.

The problem is the last instruction in cmp_all. The call to binary(). This 'may' have been needed in the past, but since I have normalized the source, it is not needed. I have a change that I will post here, to let people see. Now the TS snaps complete just like it does in the naive format. Also, the -inc:digits -max-len=7 still functions in a second. This simply keeps us from doing the super heavy lifting each time a password is cracked.

$ git diff ntlmv1_mschapv2_fmt_plug.c
diff --git a/src/ntlmv1_mschapv2_fmt_plug.c b/src/ntlmv1_mschapv2_fmt_plug.c
index f7fc598..0d33fad 100644
--- a/src/ntlmv1_mschapv2_fmt_plug.c
+++ b/src/ntlmv1_mschapv2_fmt_plug.c
@@ -1271,8 +1271,26 @@ static int cmp_exact(char *source, int index)
        DES_ecb_encrypt((DES_cblock*)challenge, (DES_cblock*)&binary[16],
                        &ks, DES_ENCRYPT);

-       return !memcmp(binary, ((char*)my->methods.binary(source)) + 2,
-                      FULL_BINARY_SIZE - 2);
+       {
+               // Since we have normalized source, there is NO reason to call
+               // binary() which is a VERY VERY VERY slow function. We simply
+               // compare what we have computed with the hex binary in the
+               // normalized source string.  This speeds up found factors
+               // quite a bit. It REALLY shows when running in the TS, where
+               // we have a large set of input and passwords to crack them all.
+               char *cp = source + 27; // skip:  $MSCHAPv2$hhhhhhhhhhhhhhhh$
+               for (i = 0; i < 24; ++i) {
+                       unsigned char c = (atoi16[ARCH_INDEX(*cp)] << 4) + atoi16[ARCH_INDEX(*(cp+1))];
+                       if (c != binary[i]) {
+                               return 0;
+                       }
+                       cp += 2;
+               }
+       }
+       return 1;
+
+       //return !memcmp(binary, ((char*)my->methods.binary(source)) + 2,
+       //               FULL_BINARY_SIZE - 2);
 }

 static int salt_hash(void *salt) { return *(ARCH_WORD_32*)salt & (SALT_HASH_SIZE - 1); }

Now a TS run look like:

$ ../run/john ../../../jtr-ts/mschapv2_tst.in -form=mschapv2 -w=../../../jtr-ts/pw.dic
MSCHAPv2: Note: slow loading. For short runs, try --format=MSCHAPv2-naive
instead. That version loads faster but runs slower.
Loaded 1500 password hashes with 1500 different salts (MSCHAPv2, C/R [MD4 DES (ESS MD5) 128/128 AVX 12x])
Press 'q' or Ctrl-C to abort, almost any other key for status
bert*ernie       (u-4-mschapv2)
johnny▒_3        (u-739-mschapv2)
. . . . . .
pentium__15      (u-1338-mschapv2)
start12__16      (u-1402-mschapv2)
1500g 0:00:00:00 DONE (2014-09-16 14:38) 60000g/s 81600p/s 82783Kc/s 82783KC/s dffffffffffff__11..AB357
Use the "--show" option to display all of the cracked passwords reliably
Session completed

which it should.

jfoug commented 10 years ago

That change took the TS run from 60g/s to 60000g/s

In real world tests, it likely will not make a 'huge' difference, other than possibly starting out, when LOTS of passwords are cracked on a huge DB. But the change does reduce about .015s per any crack (.015s seems to be about the cost of binary() overhead).

We could have also moved that logic from binary() to salt() (I think), and this would not have been a problem.

magnumripper commented 10 years ago

Good stuff. I believe NTLMv1 uses the very same function, so please verify the fix applies for it too. Especially the source + 27 might need to be tweaked.

jfoug commented 10 years ago

Only works (easily), if source is cannonical (it now is in mschapv2)

jfoug commented 10 years ago

Fixed in 916cb6b I will look into NTLMv1

jfoug commented 10 years ago

Needs more work. I now see what you meant by ntlmv1 using the same function, lol. I will get it working shortly.

magnumripper commented 10 years ago

I think ntlmv1 is canonical already. But it's source + 26.

jfoug commented 10 years ago

4bcb4dc fixes both.

jfoug commented 10 years ago

No, ntlmv1 was source + variable.

I simply get source+11 then find the '$' then inc pointer 1 more byte. Then the length of that first hex string does not matter.

magnumripper commented 10 years ago

Excellent, so we can close this now?

openwall / john

MSCHAP-v2 is broken (slow) #686