openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

Add support for QNX Neutrino password hashes #1970

Closed. kholia closed this issue 7 years ago.

kholia commented 8 years ago

https://moar.so/blog/qnx-password-hash-formats.html

http://www.openwall.com/lists/john-users/2015/12/28/1

jfoug commented 8 years ago

This one will really need password-length sorting for any multi-password attack mode (SIMD/GPU). The crypt itself is trivial (I have pass_gen done, but am not getting matching signatures for sha512). It is only a single large crypt of one long string, so we will need to do something similar to what we did in cryptsha256, etc.

magnumripper commented 8 years ago

Perhaps we should incorporate the old reversible algorithm too, in unshadow!

jfoug commented 8 years ago

Can we get some 'known' sample hashes? The md5 and sha256 from https://moar.so/blog/qnx-password-hash-formats.html are working, but the sha512 is not, which makes me much less confident that the hashes are actually 'correct'. It may be that only the sha512 hashes are wrong and the others ARE right, but we really should find out for sure.

magnumripper commented 8 years ago

Perhaps contact that guy at https://moar.so and see if he knows something of value.

jfoug commented 8 years ago

The way things read, it sounded like he was not able to get hashes to match either. Now, the md5/sha256 DO match, but sha512 must do something different. I have fiddled around with pass_gen quite a bit and was not able to get matching hashes.

jfoug commented 8 years ago

I sent off an email, and received this back.

Hey Jim,

I think there's something special about the SHA-512 hashes in QNX. I had trouble trying to replicate their generation myself using Python's passlib and hashlib. I suspect that there's something tweaked or a mistake in their SHA-512 hash generation.

I created a new run of hashes for you too: https://moar.so/tmp/qnx_jtr_hashes.txt

I'll spend some time in the new year trying to reverse the passwd binary further and figure out what's different about SHA-512.

kk

The current qnx format cracks the md5 and sha256 hashes, and -form=descrypt cracks the 2 DES hashes, but the sha512 hashes are not cracked.

I am going to see if he can produce some sha512 hashes for known words, just to see whether the sha512 is 'compliant' or not.

jfoug commented 8 years ago

Hmm, it looks to me like QNX sha512 is broken. Look at this page (done by kk)

https://moar.so/tmp/qnx_sha512_broken.txt

So, the first 11 iterations (salt + 12 words) work. The total length of the string is 112 bytes, which 'should' be the first hash requiring 2 limbs of sha512. That case works, but when another 'password' is appended (now 120 bytes), the hash returned is no longer correct.

To me, this appears to be a broken core crypto function within QNX.

jfoug commented 8 years ago

Commit dbcd664 gets sha512 working with the same buggy code as is built into QNX. Note, this only works for builds using the 'generic' sha512 code in sha2.c.

jfoug commented 8 years ago

I have 2 tunable cost values in QNX: the algorithm, and the loop count. There is a 3rd tunable cost, but I am not sure we can utilize it: the length of the candidate password.

So this hash: salt=abcd1234 pw=password rounds=1000 will end up running SHA512 over 8008 bytes of material. If we test with pw=pass, it will only run SHA512 over 4008 bytes.

I think one of the rar formats (or 7z?) is like this, where the length of the password makes a huge difference in throughput.
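The per-candidate work described above can be sketched in Python. This is a hypothetical helper inferred from this thread's description (one digest over the salt followed by the password repeated `rounds` times), not an official QNX spec; the real implementation lives in JtR's qnx format code.

```python
import hashlib

def qnx_hash_sketch(password: bytes, salt: bytes, rounds: int = 1000,
                    algo: str = "sha256") -> str:
    # Conjectured structure from this thread: a single large crypt of
    # one long string -- salt, then the password appended `rounds` times.
    h = hashlib.new(algo)
    h.update(salt)
    for _ in range(rounds):
        h.update(password)
    return h.hexdigest()

# Bytes hashed scale with password length: len(salt) + rounds * len(pw)
print(len(b"abcd1234") + 1000 * len(b"password"))  # 8008
print(len(b"abcd1234") + 1000 * len(b"pass"))      # 4008
```

This makes the "password length as a cost" point concrete: halving the password length nearly halves the data each crypt must compress.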

frank-dittrich commented 8 years ago

Tunable costs are used to distinguish/group various hashes, e.g., to test "fast" hashes first, before spending time on hashes which are extremely hard to compute. So, unless you can see from the hash itself that a longer password was used (which would be bad practice), the password length isn't a tunable cost.

I think there are formats that have a somewhat higher max. keys per crypt value, in order to collect "enough password candidates per basket".

jfoug commented 8 years ago

I have made some changes (not checked in yet): now the sha256 uses OpenSSL (or cryptlib), while only the sha512 uses the 'internal' generic sha512 code I wrote. Surprisingly, the sha256 hash is the slowest. Sha512 is faster, probably due to half the number of crypt calls. Using OpenSSL did speed up sha256 (about 15% in my case), but it is still the slowest.

$ ../run/john -test=4 -form=qnx -cost=1000,5
Will run 2 OpenMP threads
Benchmarking: qnx, qnx hash (rounds=1000) [QNX 32/64 OpenSSL]... (2xOMP) DONE
Speed for cost 1 (iteration count) of 1000, cost 2 (algorithm (5=md5 256=sha256 512=sha512)) of 5
Raw:    27184 c/s real, 16161 c/s virtual

$ ../run/john -test=4 -form=qnx -cost=1000,256
Will run 2 OpenMP threads
Benchmarking: qnx, qnx hash (rounds=1000) [QNX 32/64 OpenSSL]... (2xOMP) DONE
Speed for cost 1 (iteration count) of 1000, cost 2 (algorithm (5=md5 256=sha256 512=sha512)) of 256
Raw:    17089 c/s real, 9553 c/s virtual

$ ../run/john -test=4 -form=qnx -cost=1000,512
Will run 2 OpenMP threads
Benchmarking: qnx, qnx hash (rounds=1000) [QNX 32/64 OpenSSL]... (2xOMP) DONE
Speed for cost 1 (iteration count) of 1000, cost 2 (algorithm (5=md5 256=sha256 512=sha512)) of 512
Raw:    20404 c/s real, 11699 c/s virtual

magnumripper commented 8 years ago

Yeah, RAR3 is similar: it's 250K "update()" calls as opposed to 250K "final()" calls. It can't really be used as a tunable cost.

Regarding SHA512, I always wondered why it's claimed to be about as fast as SHA256 while our raw-sha512 is much slower than raw-sha256 - until I realized they mean e.g. doing a shasum of a sizable block of data. SHA512 needs half as many calls to the actual compression function. Same thing here. Plus, we can only do half as many in parallel with SIMD.

jfoug commented 8 years ago

Yup, the reduced parallelism does hurt, as does doing 128 bytes of hashing per block (raw formats) vs 64 bytes for the 32-bit formats.

But in THIS format, we do not get the 2x size penalty. Here, we hash 8024 bytes of data (1000 x 8-byte password with a 16-byte salt), so md5/sha256/sha512 all have to compress the same amount of data, and we see that sha512 performs as fast or a bit faster.

I have not started on SIMD just yet. This format may be an ugly one to get done optimally. But I really think it can be done pretty well.
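The compression-call arithmetic behind that observation can be sketched as follows (the 8024-byte figure is taken from the comment above; the padding math is the standard MD-style scheme of a 0x80 marker, zero fill, and a trailing length field):

```python
import math

def compression_calls(msg_len: int, block: int, length_field: int) -> int:
    # Padded length: message + 0x80 marker + zero fill + length field,
    # rounded up to a whole number of blocks.
    return math.ceil((msg_len + 1 + length_field) / block)

n = 8024  # bytes hashed per crypt in this format
print(compression_calls(n, 64, 8))    # 64-byte blocks (md5/sha256): 126
print(compression_calls(n, 128, 16))  # 128-byte blocks (sha512): 63
```

Half the compression calls, each over twice the data, which is why sha512 keeps pace here despite losing in the raw-hash benchmarks.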

magnumripper commented 8 years ago

With lengths sorted like in RAR3, I think it will be fine. Unlike shacrypt, (iirc) RAR3 doesn't try to collect lots of keys in order to get an even distribution in the buckets - instead we're simply hoping that in a real cracking run, lengths will come fairly well sorted. Using mask mode, Markov or incremental, they are. Using wordlist they are not, unless the list is sorted by length. We might consider adding length sorting to wordlist mode.

magnumripper commented 8 years ago

Coincidentally, I noticed today that the RAR format was too inefficient when given varying lengths (e.g. -wordlist -rules), so I bumped the mkpc a lot in 3b71fc7a.

magnumripper commented 8 years ago

Wouldn't length sorting be pretty much needed for OpenMP also? Otherwise we'll have threads waiting, with the same effect as SIMD/OpenCL.

jfoug commented 8 years ago

I am not sure. I do not think OMP will simply give 64 indexes to each thread, but if it does, then we will need to sort the data 'properly' to spread it out well. The alternative is that it gives each thread the next available index once that thread completes. In that second case, if we have some long passwords and some shorter ones, the threads that get the long ones will work longer on them, while a thread with a shorter word will finish first and pick up the next index. In the end, yes, there will be one thread working while the others wait, BUT we have that situation every time we use OMP: the threads never end at the same time, and some always wait for the rest to complete.

jfoug commented 8 years ago

I guess we should research OMP techniques for non-equivalent work within a for(x;x;x) loop.
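The distinction being discussed (fixed chunks per thread vs. handing out the next index as a thread finishes) corresponds to OpenMP's static vs. dynamic loop schedules. A toy Python simulation, not JtR or OpenMP code, with made-up work units:

```python
import heapq

def makespan(costs, threads, dynamic):
    """Simulate total wall time for `threads` workers over per-item costs."""
    if dynamic:
        # dynamic: each finishing worker grabs the next pending item
        heap = [0.0] * threads           # per-worker finish times
        heapq.heapify(heap)
        for c in costs:
            heapq.heappush(heap, heapq.heappop(heap) + c)
        return max(heap)
    # static: contiguous equal chunks assigned up front
    chunk = -(-len(costs) // threads)    # ceil division
    return max(sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk))

# Mixed "password lengths": a few expensive items clustered together
costs = [1] * 24 + [8] * 8
print(makespan(costs, 8, dynamic=False))  # static suffers from the cluster
print(makespan(costs, 8, dynamic=True))   # dynamic smooths it out
```

This matches the test results further down the thread: with dynamic-style hand-out, unsorted wordlists cost only a little, because idle threads keep stealing the remaining short candidates.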

jfoug commented 8 years ago

This patch converts Digest::SHA::PurePerl into a module which correctly reproduces these 'broken' QNX sha512 hashes. I think I will put this .pm into unused, and add commented-out logic for qnx_sha512 into pass_gen.pl.

--- SHA_pp.pm   2016-01-05 17:29:53.723149000 -0600
+++ SHA_qnx_pp.pm   2016-01-05 17:31:20.683236900 -0600
@@ -1,7 +1,11 @@
-package Digest::SHA::PurePerl;
+package SHA_qnx_pp;

 require 5.003000;

+#
+# Note, this has had sha512 'hacked' to have the same bug as seen in sha512 on QNX passwd function.
+#
+
 use strict;
 use warnings;
 use vars qw($VERSION @ISA @EXPORT @EXPORT_OK);
@@ -492,6 +496,7 @@ sub _shadirect {
    my $blockbytes = $self->{blocksize} >> 3;
    while ($bitcnt >= $self->{blocksize}) {
        &{$self->{sha}}($self, substr($bitstr, $offset, $blockbytes));
+       $self->{priorblock} = substr($bitstr, $offset, $blockbytes);
        $offset += $blockbytes;
        $bitcnt -= $self->{blocksize};
    }
@@ -512,6 +517,7 @@ sub _shabytes {
        $bitcnt -= $numbits;
        $bitstr = substr($bitstr, $numbits >> 3, _BYTECNT($bitcnt));
        &{$self->{sha}}($self, $self->{block});
+       $self->{priorblock} = $self->{block};
        $self->{block} = "";
        $self->{blockcnt} = 0;
        _shadirect($bitstr, $bitcnt, $self);
@@ -538,6 +544,7 @@ sub _shabits {
    return($savecnt) if $bitcnt < $gap;
    if ($self->{blockcnt} == $self->{blocksize}) {
        &{$self->{sha}}($self, $self->{block});
+       $self->{priorblock} = $self->{block};
        $self->{block} = "";
        $self->{blockcnt} = 0;
    }
@@ -627,6 +634,7 @@ sub _shafinish {
        }
        else {
            &{$self->{sha}}($self, $self->{block});
+           $self->{priorblock} = $self->{block};
            $self->{block} = "";
            $self->{blockcnt} = 0;
        }
@@ -640,6 +648,23 @@ sub _shafinish {
    }
    $self->{block} .= pack("N", $self->{lenlh} & $MAX32);
    $self->{block} .= pack("N", $self->{lenll} & $MAX32);
+   # here is the 'magic' code from the BROKEN QNX implementation
+   # NOTE, only sha512 should be used from this module. the other
+   # magic is every time we 'reset' $self->{block} we first copy
+   # that over to $self->{priorblock} so that we have those 4 bytes
+   # which do not properly get cleaned out.
+   my $len = $self->{lenll}/8;
+   if ($len > 116) { # NOTE, only work up to 4gbit string (good enough for JtR)
+       #print "lenll      $len\n";
+       #print "block      ".unpack ("H*", $self->{block})."\n";
+       #print "priorblock ".unpack ("H*", $self->{priorblock})."\n";
+       if ($len < 120) {
+           substr($self->{block}, 116, $len-116) = substr($self->{priorblock}, 116, $len-116);
+           substr($self->{block}, $len, 1) = chr(0x80);
+       } else {
+           substr($self->{block}, 116, 4) = substr($self->{priorblock}, 116, 4);
+       }
+   }
    &{$self->{sha}}($self, $self->{block});
 }

jfoug commented 8 years ago

> Wouldn't length sorting be pretty much needed for OpenMP also? Otherwise we'll have threads waiting with same effect as SIMD/OpenCL.

#!/usr/bin/perl

for ($i = 0; $i < 50000; $i++) {
    for ($j = 4; $j < 12; ++$j) {
        print chr($i%26 + ord('a'));
        for ($k = 0; $k < $j; $k++) {
            print chr(ord('A') + $k);
        }
        print "\n";
    }
}

Ok, I made 2 word files (switching the 2 outer for loops). One builds a file like this:

aABCD
aABCDE
aABCDEF
aABCDEFG
aABCDEFGH
aABCDEFGHI
aABCDEFGHIJ
aABCDEFGHIJK
bABCD
bABCDE
bABCDEF
bABCDEFG
bABCDEFGH
bABCDEFGHI
bABCDEFGHIJ
bABCDEFGHIJK
cABCD
cABCDE
cABCDEF
cABCDEFG
cABCDEFGH
cABCDEFGHI
cABCDEFGHIJ
cABCDEFGHIJK
dABCD
dABCDE
dABCDEF
...

The other method builds the same words, but sorted by length (all shorter ones before the next length):

aABCD
bABCD
cABCD
dABCD
eABCD
fABCD
gABCD
hABCD
iABCD
jABCD

Here are runs against the same input file, using the 2 different word lists. NOTE, the 2nd run 'appears' to be faster at the start, but that is only because the input passwords are shorter; when the longer words start, that 2nd run slows down, so that in the end the speed is 'almost' the same.

$ ../../run/john -w=unsort in
Using default input encoding: UTF-8
Loaded 20 password hashes with 20 different salts (qnx, qnx hash [QNX 32/64 generic])
Will run 8 OpenMP threads
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:11 14.75% (ETA: 11:15:47) 0g/s 5031p/s 100632c/s 100632C/s kABCD..rABCDEFGHIJK
0g 0:00:01:00 75.34% (ETA: 11:15:52) 0g/s 4957p/s 99154c/s 99154C/s qABCD..xABCDEFGHIJK
0g 0:00:01:21 DONE (2016-01-06 11:15) 0g/s 4932p/s 98654c/s 98654C/s uABCD..bABCDEFGHIJK
Session completed

$ ../../run/john -w=sort in
Using default input encoding: UTF-8
Loaded 20 password hashes with 20 different salts (qnx, qnx hash [QNX 32/64 generic])
Will run 8 OpenMP threads
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:28 38.66% (ETA: 11:17:11) 0g/s 6885p/s 137742c/s 137742C/s sABCDEFG..dABCDEFG
0g 0:00:01:17 97.64% (ETA: 11:17:17) 0g/s 5087p/s 101765c/s 101765C/s uABCDEFGHIJK..fABCDEFGHIJK
0g 0:00:01:19 DONE (2016-01-06 11:17) 0g/s 5052p/s 101054c/s 101054C/s qABCDEFGHIJK..bABCDEFGHIJK
Session completed

So you can see that there was only a 'tiny' bit of difference between sorted and unsorted password lengths; I do not see this becoming an issue. For SIMD (or OpenCL), however, it IS a problem, because you want all strings to grow at the same speed, so that when you have to hash a block, it happens at the same time for all candidates.

jfoug commented 8 years ago

Ok, I changed my script a bit (just to make sure), so that there were only 5 lengths (not an even multiple of the 8 OMP threads), and the gaps were larger (10 bytes). Here is the information (again, the runs are almost the same speed). Yes, there is a TINY improvement for the sorted words, BUT I think the overhead of doing the sort and then undoing it (since you have to keep things in proper index order) would far outweigh ANY gains.

$ ../../run/john -w=sort in
Loaded 20 password hashes with 20 different salts (qnx, qnx hash [QNX 32/64 generic])
Will run 8 OpenMP threads
0g 0:00:00:48 DONE (2016-01-06 11:29) 0g/s 2071p/s 41431c/s 41431C/s aAABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQRRSSTTUU..fAABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQRRSSTTUU

$ ../../run/john -w=unsort in
Loaded 20 password hashes with 20 different salts (qnx, qnx hash [QNX 32/64 generic])
Will run 8 OpenMP threads
0g 0:00:00:50 DONE (2016-01-06 11:30) 0g/s 1977p/s 39549c/s 39549C/s zAABBCCDDEEFFGGHHIIJJKKLLMMNNOOPP..fAABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQRRSSTTUU

$ head unsort
aAA
aAABBCCDDEEFF
aAABBCCDDEEFFGGHHIIJJKK
aAABBCCDDEEFFGGHHIIJJKKLLMMNNOOPP
aAABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQRRSSTTUU
bAA
bAABBCCDDEEFF
bAABBCCDDEEFFGGHHIIJJKK
bAABBCCDDEEFFGGHHIIJJKKLLMMNNOOPP
bAABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQRRSSTTUU

$ head sort
aAA
bAA
cAA
dAA
eAA
fAA
gAA
hAA
iAA
jAA

magnumripper commented 8 years ago

Very interesting. It may depend on whose OpenMP you run, but until we actually see one that has problems unsorted, I agree we should not sort for now, except for SIMD.

jfoug commented 8 years ago

NOTE, I have issues with SHA512. They happen when the last buffer has to be added as a 'clearing' buffer, and I think only when that last buffer is between 114 and 118 bytes long (i.e. within the area which gets the dirty data). I have 2 methods: one in pass_gen.pl and one in sha2.c.

I have sent some sample hashes and salts off to find out what ACTUAL QNX gives for the final hash. I will then fix either pass_gen.pl or sha2.c. Actually, I have both of them able to create both 'types' of hashes, depending upon the information returned from the real QNX process.

That is why I have not yet added QNX-sha512 to the jtrts.pl data. I was getting 1380 and am not sure whether pass_gen.pl or my sha2.c is not exactly like QNX (but hopefully ONE of them is, lol).

jfoug commented 8 years ago

I received my sample hash back from Kaan, and was able to 'match' it. I think I have the pattern down for when the length mod 128 is in the range of 109 to 116.

I created a set of hashes of lengths 222 to 232, which should cover that entire 'range' of problematic values. I have sent these off to Kaan, and am waiting to find out if my code assumptions are right. If so, then I have both pass_gen.pl and JtR working properly with the buggy QNX sha512 logic. Right now it works only partly; jtrts found that pass_gen.pl output and jtr were not the same, AND both were wrong, lol.

jfoug commented 8 years ago

NOTE, I started down the path of MD5 SIMD code, and in the end it was slower than the OpenSSL code. Most of this is due to buffer handling and intermixing. Also, the code is not 100% working: if I mix lengths, it still fails. This is NOT an easy format, and it may simply be that SIMD is not a net gain, since there is SO much multiple-string-appending overhead.

magnumripper commented 8 years ago

Do we know whether anyone reported this bug to QNX (Blackberry?) yet?

jfoug commented 8 years ago

I have no idea.

jfoug commented 8 years ago

Once I get the code 100% working, I can provide either my sha2.c code or my patched pp_sha.pm code.

But first I need to make sure my changes are right. At this time I think they are, as I can see how the code was likely written and 'how' the bug crept in. It appears that all work is done inside a single window buffer. Then, when the length bits are appended, the lower bits are appended properly as an 8-byte int, BUT the top bits are improperly appended as a 4-byte int.

I am also not 100% sure what to expect for a > 4Gbit crypt string. Without seeing the code (or doing some tests to check it out), I really have no idea what to expect there.

For JtR usage, where we limit plaintexts to 48 bytes, 4Gbit is more than plenty of room to work with.
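The failure mode described above can be demonstrated with a toy Python sketch of the final SHA-512 block layout. The offsets follow the SHA-512 spec (the last 16 bytes of the final 128-byte block hold the big-endian 128-bit message length); the 'dirty' byte range reflects this thread's analysis, not QNX's actual source:

```python
import struct

BLOCK = 128  # SHA-512 block size; final 16 bytes hold the 128-bit length

# A correct implementation zero-fills the high 64 bits of the length
# (offsets 112..119) before writing the low 64 bits at 120..127.
correct = bytearray(BLOCK)
struct.pack_into(">QQ", correct, 112, 0, 120 * 8)  # 120-byte message

# The buggy behavior described here: the high half is written as a
# 4-byte int at offsets 112..115 only, so offsets 116..119 keep stale
# bytes left over from the previous block.
prior = bytes(range(BLOCK))      # stand-in for the previous block's data
buggy = bytearray(correct)
buggy[116:120] = prior[116:120]  # the 4 bytes that never get cleared

assert correct[116:120] == bytes(4)
print(buggy[112:120].hex())      # zeroed high word, then 4 stale bytes
```

This is why only messages whose final block reaches into offsets 112..119 (i.e. certain lengths mod 128) produce 'wrong' digests, matching the observation that 112-byte input still hashed correctly while 120-byte input did not.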

jfoug commented 8 years ago

These were all confirmed when checked against QNX, covering from one length before the problem area to one after, for 'barely' 3-limb SHA512 hashes. So now that I know the logic, it works the same for all lengths (up to 4Gbit, and we don't care about that much data anyway).

u0:@S,222@510e635f301f48bc7755a180dd015e09b7d42b66b85868d7d22e3a4368179807cea981f5b51730c68f646f0e5c96001ef39a89b7419b0e75d34b1e74ea07cf06@abcd1234abcd1234:0:0:3:
u0:@S,223@bc01a21efdd35a499f96306110f65c8e84e5914de86974013a6d0d6633505f73e112850ce89db6286922b4f77f1b658be6f043603c9853bebd838d34bd201192@abcd1234abcd1234:0:0:3:
u0:@S,224@7b016f094521188ddf3779c80627490a28276d66f687f225cfb679982fbfed9d9548fcad68d87ded20c828a5af821f9fd55d3506a873c09842d36da31f196f43@abcd1234abcd1234:0:0:3:
u0:@S,225@cb6c1b6f166f05ec385450a4fbb1b4906ffe47a899f6c8fd8e30a92b625c89dd605d693b9af17fd3cfd71adaf3edb6753afc94eaea865bbcb934dd0ca725472f@abcd1234abcd1234:0:0:3:
u0:@S,226@d3b5e9008ceb0f416692fa69d6ac69c51a471d8c6c4ef92aa3859b0ffdbf19f0faac08283ec0273b22971136ac5a39f96b0b9e4de95b5fe99ac2a60bf0828507@abcd1234abcd1234:0:0:3:
u0:@S,227@7e11a2ee27ef1e217fc7644cad7b93ca51a2d6bd92585348afc7b665ca5f066c12b9d5ec75192744f0f7aa89f9fed958ce8d3265f530f71f46333e0a244a7e15@abcd1234abcd1234:0:0:3:
u0:@S,228@8b824c10866b6bf9fafc20a4659ac1f45acb9503a086743f9dcc14260ac6d270ef40b858cfc88a3eaa8e8ad6d4d05eca3c8f8ad8e1979be0d3b7b2f9c9ac8413@abcd1234abcd1234:0:0:3:
u0:@S,229@cba10155ae05948cc69e30f2c42f302543fed7385fce6af16ad37bb73560ab95e0c48d754538ad404f0849eabda427d3b1417de113964658fb8c88fad721fee0@abcd1234abcd1234:0:0:3:
u0:@S,230@8b9c619e1a5de811c470b287aa06655f5283abfceb447d26e688dfd1f6870aa9669b98b86e2e109390b75661970ae4582f190289e0fb48b64dec5f5bb3714168@abcd1234abcd1234:0:0:3:
u0:@S,231@c651174fab35480c0f5c62c8d481bf3273ff81253bc2ed014405f59f4ece32d3b6afdff0cc4283fe7bf69ff41cde92f6a2fb9c64ecee6cb0fb8b3d3a5327d34a@abcd1234abcd1234:0:0:3:
u0:@S,232@4e77904aac5a341ac947087d492336a7e20b8d355e52c7a377c460e560b59e6fecc4bb114904412f868bed6683f6a3d7787b9edf331798b479483c9588b50229@abcd1234abcd1234:0:0:3:

I will get the changes to pass_gen.pl and sha2.c done and checked in today, and get the TS set up with the proper file for this hash.

jfoug commented 8 years ago

This corrects my code, and now we should match for all lengths. I just had to figure out 'how' they were handling their buffer.

e84a3ee

kholia commented 7 years ago

This is not a widely used hashing scheme. Consequently, it is not important to have SIMD and OpenCL support for it. So I am closing this ticket.