openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

OpenCL version of PKCS #12 KDF #2176

Closed: kholia closed this 7 years ago

kholia commented 8 years ago

PKCS#12 KDF resides in the "pkcs12_plug.c" file.

It would be great to have a SIMD and OpenCL version of it.

jfoug commented 8 years ago

Should pfx_fmt_plug.c be redone using the pkcs12_plug.c functions, freeing it from its dependency on oSSL (OpenSSL) for the algorithm?

kholia commented 8 years ago

We already have a new pfx_ng_fmt_plug.c file which uses the new PKCS#12 code. We can simply drop the old pfx_fmt_plug.c file when the new format is feature complete (and fast).

jfoug commented 8 years ago

To do SIMD, we will need to handle this the way we do the PBKDF2 code, where each format loads the proper arrays of input and then calls the PKCS#12 functions. This should be doable for sure.
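As a rough illustration of "loading proper arrays of input": a SIMD path typically wants the message words of several candidates interleaved, so that word w of lane l sits at index w * LANES + l and one vector load fetches word w for every candidate at once. The lane count of 4 and this plain word-interleaved layout are assumptions for illustration; JtR's actual SIMD buffer layout (driven by SIMD_COEF_32) may differ, and both helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* assumed SIMD width: 4 x 32-bit lanes (the "4x" builds) */

/*
 * Interleave LANES independent message buffers (each `nwords` 32-bit words)
 * so that word w of lane l lands at out[w * LANES + l]. A vector load of
 * out[w * LANES .. w * LANES + LANES - 1] then yields word w of every lane.
 * Hypothetical helper, not JtR's real layout code.
 */
static void interleave_lanes(uint32_t *out, const uint32_t *const in[LANES],
                             size_t nwords)
{
    for (size_t w = 0; w < nwords; w++)
        for (size_t l = 0; l < LANES; l++)
            out[w * LANES + l] = in[l][w];
}

/* Inverse: copy lane `l` back out of the interleaved buffer. */
static void deinterleave_lane(uint32_t *dst, const uint32_t *src, size_t l,
                              size_t nwords)
{
    for (size_t w = 0; w < nwords; w++)
        dst[w] = src[w * LANES + l];
}
```

The same indexing is applied to the output side, so each candidate's result can be pulled back out with deinterleave_lane() after the vectorized hash rounds.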

jfoug commented 8 years ago

NOTE:

int mbedtls_pkcs12_derivation( unsigned char *data, size_t datalen, const
        unsigned char *pwd, size_t pwdlen, const unsigned char *salt,
        size_t saltlen, int md_type, int id, int iterations )
{
    unsigned int j;

    unsigned char diversifier[128];
    unsigned char salt_block[128], pwd_block[128], hash_block[128];
    unsigned char hash_output[1024];
    unsigned char *p;
    unsigned char c;

    size_t hlen, use_len, v, i;

    SHA_CTX md_ctx;

    // This version only allows max of 64 bytes of password or salt
    if( datalen > 128 || pwdlen > 64 || saltlen > 64 )
        return -1; // MBEDTLS_ERR_PKCS12_BAD_INPUT_DATA

    hlen = 20; // for SHA1

    if( hlen <= 32 )
        v = 64;
    else
        v = 128;

    /* ... rest of the function not shown ... */
}

Since we fail based on pwdlen, which is (plaintext_len << 1) + 2, should we be limiting the plaintext length of all formats using this code to 31 bytes?
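The arithmetic behind 31: with pwdlen = 2n + 2 for an n-byte plaintext, and the excerpt above rejecting pwdlen > 64, the largest accepted n is (64 - 2) / 2 = 31. A small sketch, assuming ASCII plaintext (one UTF-16 code unit per byte); the helper names are hypothetical.

```c
#include <stddef.h>

/* Encoded PKCS#12 password length for an n-byte ASCII plaintext:
 * 2 bytes per character (UTF-16BE) plus a 2-byte null terminator. */
static size_t pkcs12_pwdlen(size_t plaintext_len)
{
    return (plaintext_len << 1) + 2;
}

/* Largest plaintext length whose encoded form is still <= max_pwdlen
 * (64 in the mbedtls-derived excerpt, which rejects pwdlen > 64). */
static size_t max_plaintext_len(size_t max_pwdlen)
{
    return max_pwdlen < 2 ? 0 : (max_pwdlen - 2) / 2;
}
```

So a 31-byte plaintext encodes to exactly 64 bytes and passes, while 32 bytes encodes to 66 and is rejected.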

Question is for @kholia

jfoug commented 8 years ago
$ ../run/john -test -form=bks
Will run 8 OpenMP threads
Benchmarking: BKS [PKCS12 PBE SHA-1 32/64]... (8xOMP) DONE
Raw:    12331 c/s real, 1892 c/s virtual

$ ../run/john -test -form=bks
Will run 8 OpenMP threads
Benchmarking: BKS [PKCS12 PBE 128/128 AVX 4x]... (8xOMP) DONE
Raw:    29545 c/s real, 7773 c/s virtual

Only 2.4x so far (still a POC). Note that it is not 100% SIMD; I only SIMD the iteration block. BUT this is only 4x SIMD. We should see a 5x improvement on 8x SIMD chips (with the current POC code).

$ ../run/john -test -form=pfx-ng
Will run 8 OpenMP threads
Benchmarking: pfx-ng [PKCS12 PBE (.pfx, .p12) (SHA-1 to SHA-512) 32/64]... (8xOMP) DONE
Raw:    15860 c/s real, 2344 c/s virtual

$ ../run/john -test -form=pfx
Will run 8 OpenMP threads
Benchmarking: PFX, PKCS12 (.pfx, .p12) [32/64]... (8xOMP) DONE
Raw:    16659 c/s real, 5410 c/s virtual

$ ../run/john -test -form=pfx-ng
Will run 8 OpenMP threads
Benchmarking: pfx-ng [PKCS12 PBE (.pfx, .p12) (SHA-1 to SHA-512) 128/128 AVX 4x]... (8xOMP) DONE
Raw:    55986 c/s real, 7908 c/s virtual

That looks a bit better ;) It looks like BKS does a lot of work in the final HMAC. We do not have SIMD code for HMAC (but we might be able to borrow logic from the pbkdf2-hmac-sha1.h file).

jfoug commented 8 years ago

OK, getting ready to check this in. The speeds are now 'better'. I also have one line in pkcs12.h that can be set to #if 1 to build in non-SIMD mode, for easier testing.

$ ../run/john -test -form=bks
Will run 8 OpenMP threads
Benchmarking: BKS [PKCS12 PBE 32/64]... (8xOMP) DONE
Raw:    11663 c/s real, 1723 c/s virtual

$ ../run/john -test -form=pfx-ng
Will run 8 OpenMP threads
Benchmarking: pfx-ng [PKCS12 PBE (.pfx, .p12) (SHA-1 to SHA-512) 32/64]... (8xOMP) DONE
Raw:    13807 c/s real, 2116 c/s virtual

$ vi pkcs12.h

$ make -s
Make process completed.

$ ../run/john -test -form=bks
Will run 8 OpenMP threads
Benchmarking: BKS [PKCS12 PBE 128/128 AVX 4x]... (8xOMP) DONE
Raw:    49399 c/s real, 6805 c/s virtual

$ ../run/john -test -form=pfx-ng
Will run 8 OpenMP threads
Benchmarking: pfx-ng [PKCS12 PBE (.pfx, .p12) (SHA-1 to SHA-512) 128/128 AVX 4x]... (8xOMP) DONE
Raw:    57510 c/s real, 8465 c/s virtual

Both are about 4.2x faster on my AVX (SIMD 4x) machine, so being > 4x makes me smile a bit ;)
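The ratios behind that estimate, using the 'real' c/s figures from the runs above (SIMD build over non-SIMD build):

$$
\frac{49399}{11663} \approx 4.24 \;\text{(BKS)}, \qquad \frac{57510}{13807} \approx 4.17 \;\text{(pfx-ng)}
$$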

jfoug commented 8 years ago

Going to check in the SIMD changes directly. This is NOT OpenCL; someone else can play with that one ;)

jfoug commented 8 years ago

NOTE: I set PLAINTEXT_LENGTH to 31 for both of these formats (until I hear otherwise).

jfoug commented 8 years ago

@kholia please test on your AVX2 machine to make sure I did not break anything. I have only tested on AVX (SIMD 4x).

jfoug commented 8 years ago

Does PKCS12 also use other hashing algo? I am surprised that only sha1 is supported, and would think all (or many) SHA2 hashes should also be there.

kholia commented 8 years ago

Before SIMD,

$ ../run/john --format=pfx-ng --test
Will run 4 OpenMP threads
Benchmarking: pfx-ng [PKCS12 PBE (.pfx, .p12) (SHA-1 to SHA-512) 32/64]... (4xOMP) DONE
Raw:    11080 c/s real, 2797 c/s virtual

After SIMD changes,

$ ../run/john --format=pfx-ng --test
Will run 4 OpenMP threads
Benchmarking: pfx-ng [PKCS12 PBE (.pfx, .p12) (SHA-1 to SHA-512) 256/256 AVX2 8x]... (4xOMP) DONE
Raw:    62816 c/s real, 15743 c/s virtual

Speed-up is around 5.7x.

kholia commented 8 years ago

Does PKCS12 also use other hashing algo? I am surprised that only sha1 is supported, and would think all (or many) SHA2 hashes should also be there.

Yes, other hashes are supported too. See /run/pfxng2john.py for a list. Various SHA2 hashes are there.

I should have named mbedtls_pkcs12_derivation as mbedtls_pkcs12_derivation_sha1 in the first place.

jfoug commented 8 years ago

Do you have the specific differences? I would like to take a shot at getting at least non-SIMD support for all the hashes. I see:

sha1, 224, 256, 384, 512. Also 512_224/512_256, which I am not sure john handles. But I REALLY think we should target 256 and 512 out of the gate. Yes, there 'are' others, but I think sha1 (the default), along with the 'normal' SHA-2 hashes, should be what we get working now.
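The excerpt earlier picks the PKCS#12 block size v from the digest length (v = 64 if hlen <= 32, else 128). RFC 7292 Appendix B.2 instead defines v as the hash's compression-function block size. The two agree for SHA-1 and the plain SHA-2 hashes, but not for SHA-512/224 and SHA-512/256, which run SHA-512's 128-byte block while producing 28- or 32-byte digests. A sketch table (sizes per FIPS 180-4; the struct, names and lookup helper are hypothetical, not pfxng2john.py's spelling):

```c
#include <stddef.h>
#include <string.h>

/* Digest length u and compression-block length v, both in bytes.
 * v is what RFC 7292 Appendix B.2 calls the block size. */
struct pkcs12_hash_params {
    const char *name;  /* illustrative names only */
    size_t u;          /* digest (output) length */
    size_t v;          /* compression-function block length */
};

static const struct pkcs12_hash_params pkcs12_hashes[] = {
    { "sha1",       20,  64 },
    { "sha224",     28,  64 },
    { "sha256",     32,  64 },
    { "sha384",     48, 128 },
    { "sha512",     64, 128 },
    { "sha512-224", 28, 128 },  /* SHA-512 compression, truncated output */
    { "sha512-256", 32, 128 },
};

/* The shortcut used by the mbedtls-derived excerpt: v from digest length. */
static size_t v_from_hlen(size_t hlen)
{
    return hlen <= 32 ? 64 : 128;
}

/* Look up a hash by name; returns NULL if it is not in the table. */
static const struct pkcs12_hash_params *pkcs12_hash_lookup(const char *name)
{
    size_t n = sizeof(pkcs12_hashes) / sizeof(pkcs12_hashes[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(pkcs12_hashes[i].name, name) == 0)
            return &pkcs12_hashes[i];
    return NULL;
}
```

Keying v on the block size rather than the digest size would keep the truncated SHA-512 variants correct if they are ever added.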

kholia commented 8 years ago

Since we fail based on pwlen, which is (plaintext_len << 1) + 2 should we be limiting plaintext length of all formats using this code to 31 bytes ?

I am not sure about this one. The "pwdlen > 64" check in mbedtls_pkcs12_derivation seems to be a limitation of this particular implementation.

https://github.com/doublereedkurt/pyjks/blob/master/jks/rfc7292.py#L21 has another implementation of PKCS#12 key derivation.

For now, setting PLAINTEXT_LENGTH to 32 is OK I think.

jfoug commented 8 years ago

32 would not work, since the UTF-16 null terminator appears to be required.

But now that I have figured out how to generate these files, I can build hashes with longer passwords to see just what will crack. I see no reason why it should not 'work', but I really do not understand the algorithm 'yet'.
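The password fed to the KDF is the plaintext as big-endian UTF-16 (a BMPString) followed by a two-byte null terminator, which is where both the 2n + 2 length and the required trailing null come from. A sketch for ASCII-only input; the function name is hypothetical, and real code would need a proper UTF-8 to UTF-16 conversion for non-ASCII characters.

```c
#include <stddef.h>

/*
 * Encode an ASCII password as a null-terminated big-endian UTF-16 string,
 * the form PKCS#12 key derivation expects. Writes 2 * len + 2 bytes to
 * `out` (which must be at least that large) and returns the byte count.
 * ASCII-only sketch: each byte becomes 0x00 followed by the character.
 */
static size_t pkcs12_bmp_encode(unsigned char *out, const char *pw, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        out[2 * i] = 0x00;
        out[2 * i + 1] = (unsigned char)pw[i];
    }
    out[2 * len] = 0x00;      /* UTF-16 null terminator, high byte */
    out[2 * len + 1] = 0x00;  /* UTF-16 null terminator, low byte */
    return 2 * len + 2;
}
```

For example, "123" becomes 00 31 00 32 00 33 00 00 (8 bytes), and a 50-character password becomes 102 bytes.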

kholia commented 8 years ago

I can attempt getting sha256 support working. Give me a day or two.

32 was a typo in my last comment, I meant 31 :-)

kholia commented 8 years ago
Generating a PKCS#12 Private Key and Public Certificate
=======================================================

1. Generate an RSA private key

openssl genrsa -out openwall.key 1024

2. Generate a Certificate Signing Request

openssl req -new -key openwall.key -out openwall.csr

3. Generate a self-signed public certificate based on the request

openssl x509 -req -days 3650 -in openwall.csr -signkey openwall.key -out openwall.crt

4. Generate a PKCS#12 file

openssl pkcs12 -keypbe PBE-SHA1-3DES -certpbe PBE-SHA1-3DES -export -in openwall.crt -inkey openwall.key -out openwall.pfx -name "openwall"

To use SHA-256 for the MAC instead, add -macalg sha256:

openssl pkcs12 -keypbe PBE-SHA1-3DES -macalg sha256 -certpbe PBE-SHA1-3DES -export -in openwall.crt -inkey openwall.key -out test12345.pfx -name "test12345"

You can use this information to generate .pfx files for testing.

jfoug commented 8 years ago

NOTE: it appears that a 30-character password is the 'max' for the current pfx-ng code (SHA-1). I built a 30-character and a 31-character password, and only the 30-character one can be found.

jfoug commented 8 years ago

New issue added: #2183 (the original pfx format does not have this problem).

jfoug commented 8 years ago

SHA-256 SIMD added (along with the cost reporting, since we now have algorithm costs). I only saw a 2.5x improvement for SHA-256, but I think the SIMD gains are a bit smaller for that hash. Also, the HMAC being done in oSSL for SHA-256 'may' have a larger impact on overall speed.

SHA-256 also seems to have a 30-byte maximum password length for pfx-ng.

jfoug commented 8 years ago

NOTE: the pfx format is also only finding 30-byte passwords. I wonder if our *2john tools are right? Possibly 30 characters is the MAX possible? (My bad, that was ng.)

jfoug commented 8 years ago

@kholia I already have SHA-512 in there, BUT I have to find and fix a problem with the SHA-256 SIMD code (I think I know the issue). Once I get SHA-256 figured out, the 512 code should be trivial (the oSSL version and the pfxng2john.py code already work fine).

kholia commented 8 years ago

The actual length limit of 30 needs further investigation. That said, 30 should not be a problem in practice.

jfoug commented 8 years ago

OK, all 3 hashes (sha1, sha256, sha512), for SIMD at least, only detect the 30-character password, not the 31-character one:

Using default input encoding: UTF-8
Loaded 6 password hashes with 6 different salts (pfx-ng [PKCS12 PBE (.pfx, .p12) (SHA-1 to SHA-512) 128/128 XOP 4x])
Loaded hashes with cost 1 (mac-type) varying from 1 to 512
Will run 2 OpenMP threads
Press 'q' or Ctrl-C to abort, almost any other key for status
123456789012345678901234567890 (sha256-30.pfx)
123456789012345678901234567890 (sha512-30.pfx)
123456789012345678901234567890 (sha1-30.pfx)
3g 0:00:00:00 DONE (2016-07-27 10:54) 11.95g/s 59.76p/s 274.9c/s 274.9C/s a..12345678901234567890
Use the "--show" option to display all of the cracked passwords reliably
Session completed

I will check to make sure non-SIMD has same limitation, but I would bet it does (and will only post a follow up if I find otherwise)

jfoug commented 8 years ago

It should be easy to increase that limit, but we will probably have to dig into the oSSL source to see just how the data is used. Likely it will simply be within the pkcs12_fill_buffer function(s).
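For reference, RFC 7292 Appendix B.2 (steps 2 and 3) builds S and P by concatenating copies of the salt and of the encoded password until each fills a whole number of v-byte blocks, v * ceil(len / v) bytes, truncating the last copy; an empty input gives an empty result. The excerpt earlier rejects pwdlen > 64, so a fill that follows this rule (with buffers sized to match) is one plausible place to lift the limit. A sketch of the rule; the function name is hypothetical and this is not the pkcs12_fill_buffer mentioned above.

```c
#include <stddef.h>
#include <string.h>

/*
 * RFC 7292 B.2, steps 2-3: concatenate copies of `in` (salt or encoded
 * password) to fill v * ceil(in_len / v) bytes; the last copy may be
 * truncated, and an empty input yields an empty result. Returns the
 * output length; `out` must hold at least that many bytes.
 */
static size_t pkcs12_fill_blocks(unsigned char *out, const unsigned char *in,
                                 size_t in_len, size_t v)
{
    size_t out_len = v * ((in_len + v - 1) / v);
    size_t done = 0;

    while (done < out_len) {
        size_t left = out_len - done;
        size_t chunk = left < in_len ? left : in_len;

        memcpy(out + done, in, chunk);
        done += chunk;
    }
    return out_len;
}
```

With v = 64, an 8-byte salt fills one 64-byte block, and a 102-byte encoded password (50 characters) needs two blocks, 128 bytes.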

jfoug commented 8 years ago

OK, I am able to get 31-byte passwords to work. BUT to do that, I have to set PLAINTEXT_LENGTH to 32?!?! Yes, we do need to figure this out. It really should NOT be a problem to allow longer passwords. Processing the password seems to play very little part in overall speed: it simply preloads the hash contexts. The main loop just re-hashes the prior step's output, so the speed will be essentially constant.
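To put rough numbers on that: following RFC 7292 Appendix B.2, each derived output block first hashes D || S || P (the v-byte diversifier plus the filled salt and password, which is the only place password length enters) and then re-hashes the previous u-byte digest c - 1 more times. The sketch counts SHA-1 compression-function calls under those assumptions (v = 64, u = 20, at least 9 bytes of padding); it ignores the HMAC and the update step between output blocks, and the helper names are hypothetical.

```c
#include <stddef.h>

/* SHA-1 compression calls for a `len`-byte message: the message plus at
 * least 9 bytes of padding (0x80 and a 64-bit length), in 64-byte blocks. */
static size_t sha1_blocks(size_t len)
{
    return (len + 9 + 63) / 64;
}

/* Round `len` up to a multiple of the PKCS#12 block size v. */
static size_t round_up(size_t len, size_t v)
{
    return v * ((len + v - 1) / v);
}

/*
 * Compression calls for one output block of the PKCS#12 KDF with SHA-1:
 * one hash over D (v bytes) || S || P, then iterations - 1 re-hashes of
 * a 20-byte digest. `iterations` must be >= 1.
 */
static size_t pkcs12_sha1_cost(size_t saltlen, size_t plaintext_len,
                               size_t iterations)
{
    const size_t v = 64, u = 20;
    size_t pwdlen = 2 * plaintext_len + 2;  /* UTF-16BE + null terminator */
    size_t first = v + round_up(saltlen, v) + round_up(pwdlen, v);

    return sha1_blocks(first) + (iterations - 1) * sha1_blocks(u);
}
```

With an 8-byte salt and 2048 iterations (as in the debug output in this thread), a 20-byte password costs 2051 compressions per output block and a 49-byte password 2052.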

jfoug commented 8 years ago

I have been able to increase PLAINTEXT_LENGTH (at least for sha1). I think SHA-256 would be the same. I am surprised that > 31 for SHA-512 was not working (it may have been already).

However, it appears that OpenSSL itself imposes some maximum password length. We should find out what it is and make sure we can handle at least that size (it is less than 60, I think).

jfoug commented 8 years ago

OK, this DEBUG_KEYGEN build of oSSL shows (I think) the limit. The password is 1234567890123.....90 (50 bytes long). But look at the 'last' password length (only 100). The final '0' character was truncated off, so only a 49-byte password was used, AND that 49-byte password cracks the hash.

NOTE, this may also simply be a bug in the oSSL version I am building against (1.0.2d)

...
Password (length 102):
003100320033003400350036003700380039003000310032003300340035003600370038003900300031003200330034003500360037003800390030003100320033003400350036003700380039003000310032003300340035003600370038003900300000
Salt (length 8):
873C6DEB148CFB1E
Output KEY (length 8)
A3A80D046B9E4EC5
KEYGEN DEBUG
ID 1, ITER 2048
Password (length 102):
003100320033003400350036003700380039003000310032003300340035003600370038003900300031003200330034003500360037003800390030003100320033003400350036003700380039003000310032003300340035003600370038003900300000
Salt (length 8):
141846579E42A530
Output KEY (length 24)
C072036CF5FCEC196518EF0FD0A5F4A97F928D5B78FD2B90
KEYGEN DEBUG
ID 2, ITER 2048
Password (length 102):
003100320033003400350036003700380039003000310032003300340035003600370038003900300031003200330034003500360037003800390030003100320033003400350036003700380039003000310032003300340035003600370038003900300000
Salt (length 8):
141846579E42A530
Output KEY (length 8)
8708CF9E72DF6EF5
KEYGEN DEBUG
ID 3, ITER 2048
Password (length 100):
00310032003300340035003600370038003900300031003200330034003500360037003800390030003100320033003400350036003700380039003000310032003300340035003600370038003900300031003200330034003500360037003800390000
Salt (length 8):
AB4BEC920EB254ED
Output KEY (length 20)
CCB8176E3C77DBC121963BC0CAB87FD0C46514D9

The final 'key' CCB8...14D9 is the key generated by mbedtls_pkcs12_derivation() if we use the first 49 bytes of that numeric password.
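Decoding the Password hex dumps above back to text makes the truncation visible: the length-102 dumps decode to all 50 characters, while the length-100 dump (ID 3, the MAC key) decodes to only 49. A sketch of such a decoder, assuming big-endian UTF-16 with ASCII-only characters as in this dump; the helper names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/*
 * Decode a hex dump of a null-terminated big-endian UTF-16 password
 * (4 hex digits per character) into ASCII. Writes at most out_size - 1
 * characters plus '\0' and returns the number of characters before the
 * 0000 terminator, or -1 on malformed, non-ASCII or oversized input.
 */
static long bmp_hex_to_ascii(const char *hex, char *out, size_t out_size)
{
    size_t len = strlen(hex), n = 0;

    if (out_size == 0 || len % 4 != 0)
        return -1;
    for (size_t i = 0; i < len; i += 4) {
        int d[4];
        for (int k = 0; k < 4; k++)
            if ((d[k] = hex_nibble(hex[i + k])) < 0)
                return -1;
        unsigned cu = (unsigned)(d[0] << 12 | d[1] << 8 | d[2] << 4 | d[3]);
        if (cu == 0)
            break;                 /* UTF-16 null terminator */
        if (cu > 0x7F || n + 1 >= out_size)
            return -1;             /* non-ASCII, or output too small */
        out[n++] = (char)cu;
    }
    out[n] = '\0';
    return (long)n;
}
```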

jfoug commented 8 years ago

sha224/sha384 added (non-SIMD). I am not sure SIMD for those hashes is worth the ROI; they will likely not be seen ITW.

Also, that Python lib only does sha1, 224, 256, 384 and 512. oSSL also does Whirlpool, sha0, md5, md4, mdc2 and likely others, but those were the only macalg values I could get oSSL to accept. Others are documented, but my Cygwin oSSL would not do them (like RIPEMD160, BLAKE2, etc.). I think if we want 'full' PKCS#12 support, we will need to write our own C conversion program (pfxng2john.c) instead of using Python. But again, is that worth the ROI?

kholia commented 8 years ago

No, I don't think that full PKCS#12 support is worth the effort. I think that the existing Python script can easily support all the possible mac algorithms.

jfoug commented 8 years ago

The asn1crypto lib only supports the sha1 and sha224 through sha512 MAC algorithms. But that can likely be 'deemed' full enough support.

jfoug commented 8 years ago

PLAINTEXT_LENGTH bumped up to 48 for all formats. From my usage of openssl, this seems to be the max it will handle anyway: if you enter a 49-byte password, oSSL trims it to 48, and if you enter a longer password, oSSL bails, saying no password was entered (likely a bug in oSSL, but I do not care). Checking passwords up to 48 bytes is long enough.

jfoug commented 8 years ago

The SIMD code is done for now. SHA-1 and all the SHA-2 hashes are handled by the format. SIMD code is there for all of them except the 'textbook' sha224 and sha384 (which no one ITW will use for real hashes).

36a7264

kholia commented 7 years ago

Jetico BestCrypt also makes use of this KDF. It would be nice to get an OpenCL port of this KDF.

magnumripper commented 7 years ago

What exactly do we need? Just saying "PKCS#12 KDF" is akin to saying "PBKDF2" without stating a PRF.

kholia commented 7 years ago

Oh, right! Starting with the OpenCL port of PKCS#12 KDF with SHA-1 as the hashing function would be good.
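Besides the iterated hashing, the one non-obvious step a SIMD or OpenCL port of the SHA-1 KDF has to replicate is RFC 7292 Appendix B.2 step 6C: when more than one output block is needed, each v-byte block I_j of I is replaced by (I_j + B + 1) mod 2^(8v), where B is the previous hash output A repeated to fill v bytes. A sketch of that big-endian add with carry; the function name is hypothetical.

```c
#include <stddef.h>

/*
 * RFC 7292 B.2 step 6C for one v-byte block: I_j = (I_j + B + 1) mod 2^(8v),
 * with I_j and B treated as big-endian unsigned integers. B is the previous
 * u-byte output A repeated to fill v bytes, so byte k of B is a[k % u].
 */
static void pkcs12_add_b_plus_1(unsigned char *ij, const unsigned char *a,
                                size_t u, size_t v)
{
    unsigned carry = 1;  /* the "+ 1" */

    for (size_t k = v; k-- > 0; ) {
        unsigned sum = ij[k] + a[k % u] + carry;
        ij[k] = (unsigned char)sum;
        carry = sum >> 8;  /* carry out of this byte */
    }
}
```

The final carry is dropped, which is exactly the mod 2^(8v) reduction.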

kholia commented 7 years ago

This is partially done in https://github.com/magnumripper/JohnTheRipper/pull/2583.

kholia commented 7 years ago

@magnumripper Great work in commit 22eb8e345ebd1b715dbc62a3c1387db186a234ef 👍

I think that the license of the src/opencl_pkcs12.h file needs to be the same as the one for pkcs12_plug.c. I will send a pull request soon to fix this.

kholia commented 7 years ago

This issue can be closed now.