openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/
Other
10.35k stars 2.11k forks source link

"Tunable cost", allow --test with --cost #816

Closed magnumripper closed 9 years ago

magnumripper commented 10 years ago

I'd really like this, for example benchmarking only the WPA test vector from the pbkdf2-hmac-sha1 format with --test --cost=4096. I had a look at bench.c (lines 182 and on) and saw now obvious way to implement it without rewriting more than I'm happy to. So I'll just let it mature in my head.

frank-dittrich commented 10 years ago

Yes, I also think this could be useful. --cost should be allowed for --test only together with --format= (but not with "generic" formats like dynamic or cpu), because not even all salted formats do have tunable costs, and different formats can report different tunable cost parameters.

I didn't look at the code yet, but couldn't you just replace the tests which don't fit the requested --cost settings with tests that do (provided that such tests exsist), similar to we do for some formats with --encoding, just not format specific, and exit with an error code if no such test exist...

magnumripper commented 10 years ago

That might be a good idea... So instead of hacking that loop, just add a section before it that shuffles the test vectors if applicable and possible. I like this because it separates the Jumbo hacks from the core code.

magnumripper commented 10 years ago

We should also list min and max costs used for benchmarking, if --cost was used or verbosity was bumped. This would make it easy to see why a format like Office performs better or worse than expected.

magnumripper commented 10 years ago

I had a look three times now but I give up :) @frank-dittrich I leave it to you, whenever you get the time and inspiration.

frank-dittrich commented 10 years ago

@magnumripper, with increased verbosity we might even want to report tunable costs in regular cracking sessions if all salts have the same tunable cost values.

magnumripper commented 10 years ago

Definitely. I'd say at verbosity>3 already.

frank-dittrich commented 10 years ago

On 10/25/2014 06:09 PM, magnum wrote:

I had a look three times now but I give up :) @frank-dittrich https://github.com/frank-dittrich I leave it to you, whenever you get the time and inspiration.

Why would you give up? Is the code in question too complicated / confusing? Should it be restructured in some way, or some helpful comments be added?

magnumripper commented 10 years ago

Nothing wrong with the code, but it's always hard to "digest" other people's structs. I can do it for sure, but it will need to be on The Right Day. My IQ seem to fluctuate wildly over time :-)

A hack like the "pot sync" took me many many tries before I got something to even keep and improve on. Sometimes I see old commits of mine and have absolutely no idea how I could grasp it at the time.

magnumripper commented 10 years ago

Also, in bench.c nothing is "normal". We don't have a database so everything needs to be done bare handed.

magnumripper commented 9 years ago

This is implemented but I might clean up the code a bit. Now, for example, we can see the individual speeds for different test vectors in PBKDF2-HMAC-SHA1:

$ ../run/john -test -form:pbkdf2-hmac-sha1 -cost=1000:1000
Will run 8 OpenMP threads
Benchmarking: PBKDF2-HMAC-SHA1 [PBKDF2-SHA1 8x SSE2]... (8xOMP) DONE
Raw:    43323 c/s real, 5653 c/s virtual

$ ../run/john -test -form:pbkdf2-hmac-sha1 -cost=10000:10000
Will run 8 OpenMP threads
Benchmarking: PBKDF2-HMAC-SHA1 [PBKDF2-SHA1 8x SSE2]... (8xOMP) DONE
Raw:    4380 c/s real, 565 c/s virtual

$ ../run/john -test -form:pbkdf2-hmac-sha1 -cost=4096:4096
Will run 8 OpenMP threads
Benchmarking: PBKDF2-HMAC-SHA1 [PBKDF2-SHA1 8x SSE2]... (8xOMP) DONE
Raw:    10593 c/s real, 1385 c/s virtual

That last one using 4096 (a WPA-PSK key from a config file) should be more than double the speed of sniffed WPA-PSK.

$ ../run/john -test -form:wpapsk           
Will run 8 OpenMP threads
Benchmarking: wpapsk, WPA/WPA2 PSK [PBKDF2-SHA1 128/128 AVX 8x]... (8xOMP) DONE
Raw:    4864 c/s real, 758 c/s virtual

It's more than 2x due to half the number of iterations (we can early reject after creating half of the key) and less post processing.

magnumripper commented 9 years ago

I never reflected over this until now but IMHO just saying -cost=4096 should infer "4096:4096" so a range would always need to be specified. For --salts it's another thing but for iterations I think that would have been better. Probably too late to change though.

magnumripper commented 9 years ago

I would like to print what counts are actually used (like in normal cracking) but that would come between the "(8xOMP)" and "DONE" if I just printed it in that code block. Maybe it should be like these mockups:

$ ../run/john -test -form:pbkdf2-hmac-sha1
Will run 8 OpenMP threads
Benchmarking: PBKDF2-HMAC-SHA1 [PBKDF2-SHA1 8x SSE2]... (8xOMP) DONE
Testing only cost 1 (iteration count) of 1000
Raw:    43323 c/s real, 5653 c/s virtual

$ ../run/john -test -form:pbkdf2-hmac-sha1 -cost=1001     
Will run 8 OpenMP threads
Benchmarking: PBKDF2-HMAC-SHA1 [PBKDF2-SHA1 8x SSE2]... (8xOMP) DONE
Testing only cost 1 (iteration count) of 4096 and 10000
Raw:    6159 c/s real, 807 c/s virtual

$ ../run/john -test -form:pbkdf2-hmac-sha1 -cost=4096:4096
Will run 8 OpenMP threads
Benchmarking: PBKDF2-HMAC-SHA1 [PBKDF2-SHA1 8x SSE2]... (8xOMP) DONE
Testing only cost 1 (iteration count) of 4096
Raw:    10593 c/s real, 1385 c/s virtual

$ ../run/john -test -form:pbkdf2-hmac-sha1 -cost=5000
Will run 8 OpenMP threads
Benchmarking: PBKDF2-HMAC-SHA1 [PBKDF2-SHA1 8x SSE2]... (8xOMP) DONE
Testing only cost 1 (iteration count) of 10000
Raw:    4380 c/s real, 565 c/s virtual

For --test we should print this regardless of verbosity (in real crack runs, it's only printed if --cost was specified or verbosity was bumped from default).

frank-dittrich commented 9 years ago

https://github.com/magnumripper/JohnTheRipper/issues/816#issuecomment-69296175

I never reflected over this until now but IMHO just saying -cost=4096 should infer "4096:4096" so a range would always need to be specified. For --salts it's another thing but for iterations I think that would have been better. Probably too late to change though.

Solar wanted me to implement it exactly like for --salts=, meaning: a single value means >=. The MIN:MAX is something I borrowed from --markov.

magnumripper commented 9 years ago

Here's the output now:

$ ../run/john -test -form:office 
Will run 8 OpenMP threads
Benchmarking: Office, 2007/2010 (SHA-1) / 2013 (SHA-512), with AES [32/64 OpenSSL]... (8xOMP) DONE
Tested only cost 1 (MS Office version) of 2007
Tested only cost 2 (iteration count) of 50000
Raw:    544 c/s real, 74.8 c/s virtual

$ ../run/john -test -form:office -cost=2010:2010
Will run 8 OpenMP threads
Benchmarking: Office, 2007/2010 (SHA-1) / 2013 (SHA-512), with AES [32/64 OpenSSL]... (8xOMP) DONE
Tested only cost 1 (MS Office version) of 2010
Tested only cost 2 (iteration count) of 100000
Raw:    271 c/s real, 37.8 c/s virtual

$ ../run/john -test -form:office -cost=2013:2013
Will run 8 OpenMP threads
Benchmarking: Office, 2007/2010 (SHA-1) / 2013 (SHA-512), with AES [32/64 OpenSSL]... (8xOMP) DONE
Tested only cost 1 (MS Office version) of 2013
Tested only cost 2 (iteration count) of 100000
Raw:    90.5 c/s real, 11.8 c/s virtual
magnumripper commented 9 years ago

Changed output a little.

$ ../run/john -test -form:office -cost=2013:2013
Will run 8 OpenMP threads
Benchmarking: Office, 2007/2010 (SHA-1) / 2013 (SHA-512), with AES [32/64 OpenSSL]... (8xOMP) DONE
Speed for cost 1 (MS Office version) of 2013, cost 2 (iteration count) of 100000
Raw:    90.5 c/s real, 12.1 c/s virtual

First, the "tested only" was a bit misleading since we normally test all costs but benchmark just one or two. Although when using --cost you will actually also only test the specified cost(s) - this is intended, for devel reasons. Second, I put the costs on one line for a little better looking output.

jfoug commented 9 years ago

I never reflected over this until now but IMHO just saying -cost=4096 should infer "4096:4096" so a range would always need to be specified. For --salts it's another thing but for iterations I think that would have been better. Probably too late to change though.

Should it infer "4096:4096" or "0:4096" ?

frank-dittrich commented 9 years ago

Instead of 0:4096 you can use -4097. This should work.

magnumripper commented 9 years ago

Should it infer "4096:4096" or "0:4096" ?

IMO the most intuitive would be -cost=4096 resulting in only that exact cost loaded. Not a big deal though. If Solar was specific about it, I'll keep my big mouth shut.

jfoug commented 9 years ago

The bad thing about this line of thinking, is that -salt is exactly opposite in design as this one.

-salt=v meaning >=v is due to higher values being better performance. -cost=v meaning >= v is opposite, as it means everything at or slower to v (I think all iteration costs would be slower with larger number, correct?)

So for the intent to be the same, -salt=v --> >=v and -cost=v --> <=v do the same thing, they only deal with data at a specific level or 'faster'.

But like you said, Solar's call. There are ways to specify exact meaning, so that is all that really is needed.

frank-dittrich commented 9 years ago

May be I mis-interpreted him, or interpreted too much into his words. However, it is bad timing. --cost= existed long enough before J1 was released. So, there would have been plenty of time to change the semantics. The longer a J2 release takes, the worse the impact of changing semantics. Whatever we do, it must also be possible to specify cost ranges.

BTW: specifying cost ranges doesn't work well with --test.

magnumripper commented 9 years ago

BTW: specifying cost ranges doesn't work well with --test.

The ranges themselves work as they should, the problem is the benchmark code only uses the first two test vectors that fit. For eg. pbkdf2-hmac-sha1, this means only 1000 rounds are benchmarked without cost, or if -cost is used with any range that involved 1000. But there is only one vector for 4096 so if you say -cost=4096 it will benchmark it and one of the 10,000 iteration vectors. If you say 4096:4096 it will use the one vector for 4096 twice - so that does work properly.

jfoug commented 9 years ago
$ ../run/john -test=0 -form=cpu
....
Testing: dynamic_2010 [md5($s.md5($s.$p)) (PW > 32 or salt > 23 bytes, sse2) 128/128 AVX 480x4x3]... (8xOMP) PASS
Testing: dynamic_2011 [md5($s.md5($p.$s)) (PW > 32 or salt > 23 bytes, sse2) 128/128 AVX 480x4x3]... (8xOMP) PASS
Testing: dynamic_2014 [md5($s.md5($p).$s) (PW > 55 or salt > 11 bytes, sse2) 128/128 AVX 480x4x3]... (8xOMP) PASS
Testing: agilekeychain, 1Password Agile Keychain [PBKDF2-SHA1 AES 8x SSE2]... (8xOMP) PASS
Speed for cost 1 (iteration count) of 1000
Testing: aix-ssha1, AIX LPA {ssha1} [PBKDF2-SHA1 8x SSE2]... (8xOMP) PASS
Speed for cost 1 (iteration count) of 64
Testing: aix-ssha256, AIX LPA {ssha256} [PBKDF2-SHA256 128/128 SSE4.1 4x]... (8xOMP) PASS
Speed for cost 1 (iteration count) of 64
Testing: aix-ssha512, AIX LPA {ssha512} [PBKDF2-SHA512 128/128 SSE4.1 2x]... (8xOMP) PASS
Speed for cost 1 (iteration count) of 64
Testing: asa-md5, Cisco ASA [Cisco ASA (MD5 salted) 128/128 AVX 480x4x3]... (8xOMP) PASS
Testing: bfegg, Eggdrop [Blowfish 32/64]... (8xOMP) PASS
Testing: Bitcoin [SHA512 AES 128/128 SSE4.1 2x]... (8xOMP) PASS
Speed for cost 1 (iteration count) of 177864 and 258507
Testing: blackberry-es10 [101x SHA-512]... (8xOMP) PASS
Testing: WoWSRP, Battlenet [SHA1 32/64 GMP-exp]... (8xOMP) PASS
Testing: Blockchain, My Wallet (x10) [PBKDF2-SHA1 AES 8x SSE2]... (8xOMP) PASS
Testing: chap, iSCSI CHAP authentication [MD5 32/64]... (8xOMP) PASS
Testing: Clipperz, SRP [SHA256 32/64 GMP-exp]... (8xOMP) PASS
Testing: cloudkeychain, 1Password Cloud Keychain [PBKDF2-SHA512  128/128 SSE4.1 2x]... (8xOMP) PASS
Speed for cost 1 (iteration count) of 227272
Testing: cq, ClearQuest [CQWeb]... (8xOMP) PASS
Testing: CRC32 [CRC32 32/64]... (8xOMP) PASS
Testing: sha1crypt, NetBSD's sha1crypt [PBKDF1-SHA1 8x SSE2]... (8xOMP) PASS
Speed for cost 1 (iteration count) of 64000 and 40000
Testing: sha256crypt, crypt(3) $5$ (rounds=5000) [SHA256 32/64 OpenSSL]... (8xOMP) PASS
Speed for cost 1 (iteration count) of 5000
Testing: sha512crypt, crypt(3) $6$ (rounds=5000) [SHA512 64/64 OpenSSL]... (8xOMP) PASS
Speed for cost 1 (iteration count) of 5000
Testing: Citrix_NS10, Netscaler 10 [SHA1 128/128 AVX 8x]... (8xOMP) PASS
Testing: dahua, "MD5 based authentication" Dahua [MD5 32/64]... (8xOMP) PASS
Testing: Django (x10000) [PBKDF2-SHA256 128/128 SSE4.1 4x]... (8xOMP) PASS
Speed for cost 1 (iteration count) of 10000
Testing: django-scrypt [Salsa20/8 128/128 AVX]... (8xOMP) PASS
Speed for cost 1 (N) of 14, cost 2 (r) of 8, cost 3 (p) of 1
Testing: dmd5, DIGEST-MD5 C/R [MD5 32/64]... (8xOMP) PASS

Note the scattering of "Speed for cost 1 (interation count) .... stuff. There was NO cost asked for. why are these being output??

magnumripper commented 9 years ago

Ah... I never tried it with --test=0

$ ../run/john -test -form:*office  
Benchmarking: Office, 2007/2010 (SHA-1) / 2013 (SHA-512), with AES [32/64 OpenSSL]... DONE
Speed for cost 1 (MS Office version) of 2007, cost 2 (iteration count) of 50000
Raw:    155 c/s real, 155 c/s virtual

Benchmarking: oldoffice, MS Office <= 2003 [MD5/SHA1 RC4 32/64]... DONE
Speed for cost 1 (hash type) of 1 and 0
Many salts: 626547 c/s real, 626547 c/s virtual
Only one salt:  614292 c/s real, 608209 c/s virtual
All 2 formats passed self-tests!

The above makes sense

$ ../run/john -test=0 -form:*office
Testing: Office, 2007/2010 (SHA-1) / 2013 (SHA-512), with AES [32/64 OpenSSL]... PASS
Speed for cost 1 (MS Office version) of 2007, cost 2 (iteration count) of 50000
Testing: oldoffice, MS Office <= 2003 [MD5/SHA1 RC4 32/64]... PASS
Speed for cost 1 (hash type) of 1 and 0
All 2 formats passed self-tests!

This doesn't. Maybe we should just mute it for --test=0.

magnumripper commented 9 years ago

There was NO cost asked for. why are these being output?

Because now, for the first time, you can actually know what is actually benchmarked in a mix format. I like it this way (except for --test=0) but we could opt to only print it if --verbosity is above 3 (and still mute it completely for -test=0). What do you guys think?

frank-dittrich commented 9 years ago

Related:

(bleeding-jumbo)run $ ./john --test --format=raw-md5 --costs=1,2,3,4
max. 3 different tunable cost parameters supported
(bleeding-jumbo)run $ ./john --test --format=md5crypt --costs=1,2,3,4
max. 3 different tunable cost parameters supported
(bleeding-jumbo)run $ ./john --test --format=md5crypt --costs=1,2,3  
Will run 4 OpenMP threads
Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 AVX 12x]... (4xOMP) DONE
Raw:    32256 c/s real, 8064 c/s virtual

John complains if you specify more than FMT_TUNABLE_COSTS cost values, but not if you specify more cost falues then there are tunable costs for a format.

Also:

(bleeding-jumbo)run $ ./john --test --costs=-3
Will run 4 OpenMP threads
Benchmarking: descrypt, traditional crypt(3) [DES 128/128 AVX-16]... (4xOMP) DONE
Many salts: 9568K c/s real, 2392K c/s virtual
Only one salt:  6225K c/s real, 1556K c/s virtual

Benchmarking: bsdicrypt, BSDI crypt(3) ("_J9..", 725 iterations) [DES 128/128 AVX-16]... (4xOMP) DONE
Many salts: 276784 c/s real, 69713 c/s virtual
Only one salt:  247808 c/s real, 61797 c/s virtual

Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 AVX 12x]... (4xOMP) DONE
Raw:    32832 c/s real, 8187 c/s virtual

Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X3]... (4xOMP) FAILED (no data)
Benchmarking: scrypt (16384, 8, 1) [Salsa20/8 128/128 AVX]... (4xOMP) FAILED (no data)
Benchmarking: LM [DES 128/128 AVX-16]... (4xOMP) DONE
Raw:    16646K c/s real, 4171K c/s virtual

Benchmarking: AFS, Kerberos AFS [DES 48/64 4K]... DONE
Short:  264704 c/s real, 264704 c/s virtual
Long:   727296 c/s real, 727296 c/s virtual

Benchmarking: tripcode [DES 128/128 AVX-16]... (4xOMP) DONE
Raw:    4963K c/s real, 1420K c/s virtual

Benchmarking: dummy [N/A]... DONE
Raw:    17984K c/s real, 17984K c/s virtual

Benchmarking: dynamic_0 [md5($p) (raw-md5) 128/128 AVX 480x4x3]... (4xOMP) ^CWait...
Session aborted

With --test, but wiithout --format, it shouldn't be allowed to use --costs=. Similarly, it shouldn't be allowed to use --cost, if you specify a generic format name like cpu, gpu. dynamic, sap*.

magnumripper commented 9 years ago

Related:

(bleeding-jumbo)run $ ./john --test --format=raw-md5 --costs=1,2,3,4
max. 3 different tunable cost parameters supported
(bleeding-jumbo)run $ ./john --test --format=md5crypt --costs=1,2,3,4
max. 3 different tunable cost parameters supported
(bleeding-jumbo)run $ ./john --test --format=md5crypt --costs=1,2,3  
Will run 4 OpenMP threads
Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 AVX 12x]... (4xOMP) DONE
Raw:    32256 c/s real, 8064 c/s virtual

John complains if you specify more than FMT_TUNABLE_COSTS cost values, but not if you specify more cost falues then there are tunable costs for a format.

I can fix this for --test, but you should fix it for non-test runs:

$ ../run/john -form:raw-md5 input -w -cost=1
Loaded 1 password hash (Raw-MD5 [MD5 128/128 AVX 12x])
No password hashes left to crack (see FAQ)

$ ../run/john -form:raw-md5 input -w -cost=1,2,3
Loaded 1 password hash (Raw-MD5 [MD5 128/128 AVX 12x])
No password hashes left to crack (see FAQ)
magnumripper commented 9 years ago

All issues fixed.

Similarly, it shouldn't be allowed to use --cost, if you specify a generic format name like cpu, gpu. dynamic, sap*.

I disagree. We should not overdo this. The new error messages should be descriptive enough. A myriad of obscure edge case support fixes might introduce bugs. Also, time is better spent writing actual enhancements. And for wildcards, they might be relevant with cost, eg john -test -form:aix-ssha* -cost=64:64