openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

very bad (and decreasing) performance on mode "Single" in certain circumstances #2425

Open p4tpr0 opened 7 years ago

p4tpr0 commented 7 years ago

I have a hash file where every line looks like this:

candidate:$dynamic_25$hash$salt

"candidate" is a very serious password candidate to test, with 50-60% success rate without even using mangling rules. When using mode "single" on this big hash file with "SingleRetestGuessed = N" and no rules, I see a bad performance that decrease along the cracking session. Sample session:

$ time ./john --single=None --nolog --verbosity=1 pw-1M-real  --pot=pw-1M-real.pot
Using default input encoding: UTF-8
Loaded 1000000 password hashes with 1000000 different salts (dynamic_25 [sha1($s.$p) 128/128 AVX 4x1])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:02 50.00% (ETA: 16:24:20) 0g/s 0p/s 0c/s 0C/s
30769g 0:00:00:08 50.00% (ETA: 16:24:33) 3445g/s 5563p/s 5563c/s 5563C/s aoygm
48159g 0:00:00:18 50.00% (ETA: 16:24:53) 2544g/s 4108p/s 4108c/s 4108C/s agmys
61712g 0:00:00:29 50.00% (ETA: 16:25:15) 2062g/s 3341p/s 3341c/s 3341C/s aprzjx
82423g 0:00:00:52 50.00% (ETA: 16:26:01) 1557g/s 2523p/s 2523c/s 2523C/s cxox
93556g 0:00:01:08 50.00% (ETA: 16:26:33) 1357g/s 2201p/s 2201c/s 2201C/s yvoq
105281g 0:00:01:29 50.00% (ETA: 16:27:15) 1170g/s 1897p/s 1897c/s 1897C/s pfkwx
117217g 0:00:01:54 50.00% (ETA: 16:28:04) 1028g/s 1665p/s 1665c/s 1665C/s vpqu
132177g 0:00:02:29 50.00% (ETA: 16:29:15) 881.5g/s 1428p/s 1428c/s 1428C/s acwzhx
145131g 0:00:03:06 50.00% (ETA: 16:30:29) 776.4g/s 1257p/s 1257c/s 1257C/s airzx
193370g 0:00:06:35 50.00% (ETA: 16:37:27) 488.4g/s 791.9p/s 791.9c/s 791.9C/s amoxr
...

The performance is good if none of the candidates matches its hash, or if each candidate matches its hash. The performance is very bad if the file is a mix of both cases.

How to reproduce:
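The exact reproduction steps were discussed on the mailing list (linked below); the real data came from an actual dump, and later tests used pass_gen.pl. Purely for illustration, here is a minimal sketch of a generator that produces input in the same format. Every name, the salts, and the 50% hit rate in it are made up; this is not the actual test data:

/* Illustrative only: emits lines in the  candidate:$dynamic_25$hash$salt
 * format, where the hash is sha1($s.$p), i.e. SHA-1 over salt.password.
 * Build: gcc gen.c -o gen -lcrypto */
#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

int main(void)
{
    unsigned int i, j;

    for (i = 0; i < 1000000; i++) {
        char salt[16], pw[16], sp[32];
        unsigned char md[SHA_DIGEST_LENGTH];

        snprintf(salt, sizeof(salt), "s%06u", i);
        snprintf(pw, sizeof(pw), "pw%06u", i);
        snprintf(sp, sizeof(sp), "%s%s", salt, pw);  /* $s.$p */
        SHA1((const unsigned char *)sp, strlen(sp), md);

        /* roughly half of the candidates match their hash, giving the
         * "mix of both cases" that triggers the slowdown */
        printf("%s:$dynamic_25$", (i & 1) ? pw : "nomatch");
        for (j = 0; j < SHA_DIGEST_LENGTH; j++)
            printf("%02x", md[j]);
        printf("$%s\n", salt);
    }
    return 0;
}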

John build-info:

Version: 1.8.0.9-jumbo-1-bleeding
Build: freebsd10.3 64-bit AVX-ac
SIMD: AVX, interleaving: MD4:3 MD5:3 SHA1:1 SHA256:1 SHA512:1
$JOHN is ./
Format interface version: 14
Max. number of reported tunable costs: 3
Rec file version: REC4
Charset file version: CHR3
CHARSET_MIN: 1 (0x01)
CHARSET_MAX: 255 (0xff)
CHARSET_LENGTH: 24
SALT_HASH_SIZE: 1048576
Max. Markov mode level: 400
Max. Markov mode password length: 30
gcc version: 5.4.0
Crypto library: OpenSSL
OpenSSL library version: 020000000 (loaded: 01000113f)
LibreSSL 2.4.4 (loaded: OpenSSL 1.0.1s-freebsd 1 Mar 2016)
GMP library version: 5.1.3
File locking: fcntl()
fseek(): fseek
ftell(): ftell
fopen(): fopen
memmem(): System's

Full discussion on the mailing-list: http://www.openwall.com/lists/john-users/2017/02/02/1

jfoug commented 7 years ago

Note, this bug has nothing to do with the min-cracks == 64 in dynamic. But I am going to see if that can be reduced to 1, since that DOES greatly impact memory; however, I think that reduction will also greatly impact single throughput when normal single rules are used. Dyna has a LOT of overhead. Reducing it to 1 (or SIMD_COEF) will probably make other single runs go at 10-20% of what they should.

I have done a bit of single stepping, and I have not found where this problem is located. I will look more when I have more time.

magnumripper commented 6 years ago

The min. keys per crypt should always be (SIMD_COEF * SIMD_PARA * THREADS) as far as I know. Bumping it to 64 doesn't make any sense to me.

magnumripper commented 5 years ago

I believe recent changes to single mode should have fixed this issue (e.g., by automagically limiting the max. plaintext length down to 16 if needed), and/or added a warning about the cause (i.e., memory constraints).

magnumripper commented 5 years ago

Now I'm instead pretty sure this is another symptom of #3417...

jfoug commented 5 years ago

@p4tpr0 Please validate after these fixes: #3556 and #3547

Once validated on your end, please close this bug.

p4tpr0 commented 5 years ago

Well, I guess I can conclude it's now worse than it was when I opened this issue:

code from 201702:

./john --single=None --nolog --verbosity=1 pw-1M-real  --pot=pw-1M-real.pot
Using default input encoding: UTF-8
Loaded 1000000 password hashes with 1000000 different salts (dynamic_25 [sha1($s.$p) 64/64])
Press 'q' or Ctrl-C to abort, almost any other key for status
54930g 0:00:00:22 50.00% (ETA: 22:31:58) 2412g/s 3901p/s 3901c/s 3901C/s aqbzr
73766g 0:00:00:40 50.00% (ETA: 22:32:34) 1808g/s 2930p/s 2930c/s 2930C/s adksr
102417g 0:00:01:20 50.00% (ETA: 22:33:54) 1267g/s 2055p/s 2055c/s 2055C/s dmhcx
134843g 0:00:02:27 50.00% (ETA: 22:36:08) 912.4g/s 1478p/s 1478c/s 1478C/s avmce
144737g 0:00:02:54 50.00% (ETA: 22:37:02) 828.1g/s 1340p/s 1340c/s 1340C/s aprzv
...

code as of today:

./john --single=None --nolog --verbosity=1 pw-1M-real  --pot=pw-1M-real.pot
Using default input encoding: UTF-8
Loaded 1000000 password hashes with 1000000 different salts (dynamic_25 [sha1($s.$p) 64/64])
Press 'q' or Ctrl-C to abort, almost any other key for status
13601g 0:00:00:30 50.00% (ETA: 22:41:32) 452.5g/s 274.4p/s 1843Kc/s 1843KC/s agqfm..bbwmr
26159g 0:00:01:55 50.00% (ETA: 22:44:22) 227.3g/s 139.2p/s 1807Kc/s 1807KC/s aeaqu..ahuig
31176g 0:00:02:47 50.00% (ETA: 22:46:06) 186.6g/s 114.7p/s 1772Kc/s 1772KC/s bcgsv..alhgb
33913g 0:00:03:20 50.00% (ETA: 22:47:12) 169.5g/s 104.6p/s 1754Kc/s 1754KC/s atmtg..uvze
...

both compiled on FreeBSD 11.2-RELEASE:

./configure CC=gcc7 LDFLAGS="-L/usr/lib -L/usr/local/lib/gcc7" --disable-native-tests CFLAGS='-O2 -mno-sse2 -mno-mmx -U__SSE__ -I/usr/include/openssl' --disable-pkg-config --disable-openmp
gmake clean && gmake -s

At least on FreeBSD I can load my whole hash file (1M lines). Current code compiled on Ubuntu 16 LTS just won't load a file that large unless I provide 123 GB of RAM:

./john --single=None --nolog --verbosity=1 pw-1M  --pot=blank-pot
Using default input encoding: UTF-8
Loaded 1000000 password hashes with 1000000 different salts (dynamic_25 [sha1($s.$p) 256/256 AVX2 8x1])
Note: Max. length decreased from 110 to 16 due to single mode buffer size
      limit of 4 GB. Use --max-length=N option to override, or increase
      SingleMaxBufferSize in john.conf (123 GB needed).
Note: Can't run single mode with this many salts due to single mode buffer
      size limit of 4 GB (48 keys per batch would use 123 GB, decreased to
      12 for 122 GB). To work around this, increase SingleMaxBufferSize in
      john.conf (if you have enough RAM) or load fewer salts at a time.

I've managed to test with a 100K-line file + SingleMaxBufferSize set to 16 GB; same performance decrease over time:

Loaded 100000 password hashes with 100000 different salts (dynamic_25 [sha1($s.$p) 256/256 AVX2 8x1])
Press 'q' or Ctrl-C to abort, almost any other key for status
8940g 0:00:00:20 50.00% (ETA: 22:14:50) 446.5g/s 719.6p/s 1217Kc/s 1217KC/s dacs..ephx
12557g 0:00:00:31 50.00% (ETA: 22:15:12) 404.8g/s 652.1p/s 1561Kc/s 1561KC/s buxk..abxr
19576g 0:00:01:04 50.00% (ETA: 22:16:18) 305.7g/s 493.7p/s 1842Kc/s 1842KC/s dngp..asbg
29031g 0:00:02:18 50.00% (ETA: 22:18:46) 210.3g/s 341.0p/s 1887Kc/s 1887KC/s burq..dxmd
36685g 0:00:04:00 50.00% (ETA: 22:22:10) 152.7g/s 248.5p/s 1742Kc/s 1742KC/s apxv..enbt
59036g 0:00:13:52 50.00% (ETA: 22:41:54) 70.95g/s 115.2p/s 1309Kc/s 1309KC/s ayrx..kps
61538g 0:00:21:33 50.00% (ETA: 22:57:17) 47.57g/s 77.30p/s 1376Kc/s 1376KC/s enru..acvs
...
magnumripper commented 5 years ago

I've managed to test with a 100K-line file + SingleMaxBufferSize set to 16 GB; same performance decrease over time

How much real RAM does that FreeBSD gear have? And how much on the Ubuntu system? Are you sure they didn't start swapping? That would explain the whole problem. The warnings are new but the problem was always there: single mode needs memory for a full batch of full-length plaintexts plus indices and hash tables per salt, and a million salts is a lot.

p4tpr0 commented 5 years ago

No swap: both machines have 32 GB RAM. The FreeBSD box is used for many other things. It has about 5 GB of free RAM, and the process for the current code takes less than 2 GB of RAM for 1M hashes:

Mem: 3069M Active, 11G Inact, 531M Laundry, 15G Wired, 2201M Free
ARC: 7303M Total, 2414M MFU, 3419M MRU, 43M Anon, 248M Header, 1179M Other
     4753M Compressed, 7951M Uncompressed, 1.67:1 Ratio
Swap: 6144M Total, 6144M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
20577 patpro        1 122   19  1835M  1808M CPU3    3   1:09  98.29% john

On Ubuntu, most memory is free, and john has to be tuned to accept a 100K-hash file (SingleMaxBufferSize set to 16 GB). It consumes a lot more memory but won't swap:

KiB Mem : 32864196 total, 13146816 free, 14868764 used,  4848616 buff/cache
KiB Swap: 33476604 total, 33429028 free,    47576 used. 17460224 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                                           
 8985 patpro    39  19 12.909g 0.013t   8180 R 100.0 41.0   0:53.19 john                                                                                                                                                                              
magnumripper commented 5 years ago

OK, I see now that the FreeBSD machine doesn't use SIMD while the Ubuntu one has AVX2, so an 8x larger batch size (and that does add up to the 2 GB vs. 16 GB difference).

The problem is likely related to your disabling of rules. What if you try normal single rules, are the performance figures better then? Are you not getting warnings about a low number of candidates?

p4tpr0 commented 5 years ago

On FreeBSD, code from 20190109:

$ ./john --single --nolog --verbosity=1 pw-1M-real  --pot=blank-pot-2
Using default input encoding: UTF-8
Loaded 1000000 password hashes with 1000000 different salts (dynamic_25 [sha1($s.$p) 64/64])
Press 'q' or Ctrl-C to abort, almost any other key for status
13050g 0:00:00:14 1.50% (ETA: 14:41:29) 929.5g/s 416.6p/s 1362Kc/s 1362KC/s nxcy..asjjc
27838g 0:00:00:43 1.50% (ETA: 15:13:37) 646.6g/s 288.0p/s 2007Kc/s 2007KC/s adzyx..kbjm
32951g 0:00:00:59 1.50% (ETA: 15:31:20) 558.0g/s 247.3p/s 2049Kc/s 2049KC/s iucb..gfsr
46764g 0:00:02:00 1.50% (ETA: 16:38:55) 389.5g/s 174.1p/s 2033Kc/s 2033KC/s avoif..akagn
64131g 0:00:03:53 1.50% (ETA: 18:44:06) 275.2g/s 122.1p/s 1967Kc/s 1967KC/s bcgme..eojd
76109g 0:00:05:37 1.50% (ETA: 20:39:19) 225.8g/s 100.8p/s 1914Kc/s 1914KC/s zmg..rcvm
85673g 0:00:07:15 1.50% (ETA: 22:27:53) 196.9g/s 88.10p/s 1881Kc/s 1881KC/s agadx..azead
96024g 0:00:09:17 1.50% (ETA: 00:43:03) 172.3g/s 77.15p/s 1847Kc/s 1847KC/s bcbyl..bcara

No warning, and the same decreasing performance.

magnumripper commented 5 years ago

Well, the decrease is now moderate - from 2049K at best to 1847K. That's no big deal, is it? Some rules process very fast, some fairly slow - this is completely normal. Your initial post showed a steady decrease from 5563 to 791; that's more alarming. Not to mention your last post is 500 times faster than that (note the K vs. no K)!

I really don't get why the p/s is not in sync with c/s in your last post though. @solardiz any clue?

solardiz commented 5 years ago

I guess buffers for more salts fill up and are actually tested against those salts, which takes time and results in testing of more {candidate password, salt} combinations (thus, steady c/s and C/s), but doesn't result in more candidate passwords being introduced for testing against any salts (thus, decreasing p/s). In other words, the initial p/s may have been "inflated" by testing those new candidate passwords against smaller subsets of salts only, which ceases to be the case the longer we run with SingleRetestGuessed set (I don't know whether it was set in this run - but I guess that it was?). At ~900k remaining salts and going at ~1.8M c/s, the target p/s is around 2, so even the ~80 p/s we see at the 9 minute mark is still heavily "inflated".

solardiz commented 5 years ago

On a related note, the hashing speeds of ~2M c/s seem very low for this format (even on one core, which I think it is). I'm surprised this was the case even in the AVX2 build. Where are we wasting so much CPU time?

Also, the "64/64" reporting is almost certainly wrong for a non-SIMD build here, because SHA-1 can't use more than 32 bits in that case. Should be "32/64". This is probably something for @jfoug to fix.

solardiz commented 5 years ago

When running with SingleRetestGuessed we skip counting the first salt->keys->count_from_guesses candidates per salt towards status.cands in crk_process_salt(). This is correct since this many guesses are presumed to have already been tried against another salt (and success there is why we're retrying them against other salts now), thus they are not new candidates.

In other words, single mode tries to maximize the guess rate obtained early on, because we want to crack more passwords sooner rather than later. That's a feature, not a bug.
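A standalone sketch of that accounting (the field names just mirror the ones quoted above; this is a simplification, not the actual cracker.c code):

#include <stdio.h>

/* Simplified stand-ins for the real John structures. */
struct keys_sketch { int count, count_from_guesses; };
static long long status_cands;

/* All keys in a salt's buffer get hashed (drives c/s and C/s), but the
 * first count_from_guesses were re-injected after guesses on other salts,
 * so only the remainder is counted as new candidates (drives p/s). */
static void process_salt_sketch(struct keys_sketch *keys)
{
    status_cands += keys->count - keys->count_from_guesses;
    /* ... hashing of all keys->count keys against the salt goes here ... */
}

int main(void)
{
    struct keys_sketch keys = { 64, 60 }; /* 60 of 64 are re-tested guesses */
    process_salt_sketch(&keys);
    printf("new candidates counted: %lld\n", status_cands); /* prints 4 */
    return 0;
}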

jfoug commented 5 years ago

There may be a bug in the recursion code within single.c. Here is an interesting patch to monitor what is happening at the recursion level. For me, in my Fedora VM, I recurse ~65000 times, then it crashes. It is probably where the stack exceeds 1 MB or something. My test data is RockYou data (250k lines, fed through pass_gen.pl to create the data). I am currently building data in dynamic_25 format, but I cannot see how dynamic would play any part in this problem (other than it is probably a very heavy stack user itself).

$ diff -c single.c ../..
*** single.c    2019-01-12 07:55:17.072875300 -0600
--- ../../single.c      2019-01-13 07:36:59.356022800 -0600
***************
*** 526,539 ****
        return 0;
  }

  static int single_process_buffer(struct db_salt *salt)
  {
        struct db_salt *current;
        struct db_keys *keys;
        size_t size;

!       if (crk_process_salt(salt))
                return 1;

  /*
   * Flush the keys list (since we've just processed the keys), but not the hash
--- 526,550 ----
        return 0;
  }

+ static int recur_level=0;
+ static int last_recur_level = 0;
+
  static int single_process_buffer(struct db_salt *salt)
  {
        struct db_salt *current;
        struct db_keys *keys;
        size_t size;

!         ++recur_level;
!         if (recur_level > 2000 && recur_level % 100 == 0 && recur_level != last_recur_level) {
!               last_recur_level = recur_level;
!               printf ("recur=%d\n", recur_level);
!       }
!
!       if (crk_process_salt(salt)) {
!                 --recur_level;
                return 1;
+       }

  /*
   * Flush the keys list (since we've just processed the keys), but not the hash
***************
*** 565,572 ****
                                if (current == salt || !current->list)
                                        continue;

!                               if (single_add_key(current, keys->ptr, 1))
                                        return 1;
                        } while ((current = current->next));
                        keys->ptr += length;
                } while (--keys->count);
--- 576,587 ----
                                if (current == salt || !current->list)
                                        continue;

!                               // recursion happens here!
!                               if (single_add_key(current, keys->ptr, 1)) {
!                                       --recur_level;
!                                       // how do we NOT free keys here?!!!
                                        return 1;
+                               }
                        } while ((current = current->next));
                        keys->ptr += length;
                } while (--keys->count);
***************
*** 580,586 ****
                keys->rule[0] = rule_number;
                keys->rule[1] = rules_stacked_number;
        }
!
        return 0;
  }

--- 595,601 ----
                keys->rule[0] = rule_number;
                keys->rule[1] = rules_stacked_number;
        }
!       --recur_level;
        return 0;
  }

OK, when run, the recur_level simply increases on this data and never decreases. It 'follows' the crack count. To me, this simply looks like an infinite recursion bug.

$ ../run/john 250k-dyna25.in -single --pot=0.pot -nolog -verb=1 --max-length=16
Warning: invalid UTF-8 seen reading 250k-dyna25.in
Using default input encoding: UTF-8
Loaded 249997 password hashes with 249997 different salts (dynamic_25 [sha1($s.$p) 256/256 AVX2 8x1])
Press 'q' or Ctrl-C to abort, almost any other key for status
recur=2100
recur=2200
... clip
recur=4500
recur=4600
4627g 0:00:00:24 1.50% (ETA: 08:02:13) 192.7g/s 11238p/s 53005c/s 53005C/s p0pcorn..1sexse
recur=4700
... clip
recur=7000
7030g 0:00:00:28 1.50% (ETA: 08:06:39) 251.0g/s 12312p/s 92067c/s 92067C/s HOKU69..ynvthis1
recur=7100
... clip
recur=9900
9976g 0:00:00:35 1.50% (ETA: 08:14:24) 285.0g/s 12475p/s 139497c/s 139497C/s notyours66..CHACHA1
recur=10000
... clip
recur=13400
13411g 0:00:00:45 1.50% (ETA: 08:25:29) 298.0g/s 12073p/s 187719c/s 187719C/s hiphop15..hungup15
recur=13500
recur=13600
recur=13700
recur=13800
recur=13900
... clip
recur=44700
recur=44800
44886g 0:00:05:15 1.50% (ETA: 13:24:36) 142.4g/s 4836p/s 273551c/s 273551C/s bread..pentruck
recur=44900
recur=45000
recur=45100
... clip
recur=64000
recur=64100
recur=64200
recur=64300
recur=64400
recur=64500
64577g 0:00:10:31 1.50% (ETA: 19:14:40) 102.3g/s 3384p/s 280359c/s 280359C/s redsox2005..famous04
recur=64600
recur=64700
recur=64800
recur=64900
recur=65000
recur=65100
recur=65200
recur=65300
AddressSanitizer:DEADLYSIGNAL
=================================================================
==12370==ERROR: AddressSanitizer: stack-overflow on address 0x7ffef07dfff8 (pc 0x7f48ef442fa4 bp 0x7ffef07e0082 sp 0x7ffef07e0000 T0)
    #0 0x7f48ef442fa3  (/lib64/libasan.so.5+0x3dfa3)
    #1 0x7f48ef4534de  (/lib64/libasan.so.5+0x4e4de)
    #2 0x7f48ef454d1e in vsprintf (/lib64/libasan.so.5+0x4fd1e)
    #3 0x7f48ef454fde in sprintf (/lib64/libasan.so.5+0x4ffde)
    #4 0x8b90d7 in log_guess /home/ghost/JtR/bleed/src/logger.c:557
    #5 0x88b5e4 in crk_process_guess /home/ghost/JtR/bleed/src/cracker.c:442
    #6 0x88d182 in crk_password_loop /home/ghost/JtR/bleed/src/cracker.c:871
    #7 0x88fcf6 in crk_process_salt /home/ghost/JtR/bleed/src/cracker.c:1180
    #8 0x8e6340 in single_process_buffer /home/ghost/JtR/bleed/src/single.c:544
    #9 0x8e6ff0 in single_add_key /home/ghost/JtR/bleed/src/single.c:524
    #10 0x8e668f in single_process_buffer /home/ghost/JtR/bleed/src/single.c:580
    #11 0x8e6ff0 in single_add_key /home/ghost/JtR/bleed/src/single.c:524
    #12 0x8e668f in single_process_buffer /home/ghost/JtR/bleed/src/single.c:580
    #13 0x8e6ff0 in single_add_key /home/ghost/JtR/bleed/src/single.c:524
    #14 0x8e668f in single_process_buffer /home/ghost/JtR/bleed/src/single.c:580
...  dump continues up to 250 stack frames; all the rest alternate single.c 524 -> 580 -> 524 ... recursion
magnumripper commented 5 years ago

Not sure why you used that strange diff format, I don't understand it. Anyway, I did similar things some time ago and reached 5-6000 levels, then it decreased again. I'm certain this isn't quite a bug, except perhaps there should be some safeguard against going too deep (@solardiz heads-up).

The "trigger" here is the unrealistic test data, it wouldn't happen IRL. What actually happens is that all buffers fill equally and then when the first one is full, crypt_all() gets called and we get a crack. Now that successful candidate is added to all other salts' list of words... however, as soon as it was added to the first of them, THAT buffer gets full, crypt_all is called, and we get yet another candidate that will be added to all salts. And when that is added to salt numero tres, THAT buffers gets full and so on. For 250K salts we will recurse until we smash the stack.

It will obviously cease to happen at all if you set SingleRetestGuessed = N

p4tpr0 commented 5 years ago

The "trigger" here is the unrealistic test data, it wouldn't happen IRL

This bug was triggered by real life data :) Full context: https://www.patpro.net/blog/index.php/2017/04/09/3071-my-take-on-the-myspace-dump/

jfoug commented 5 years ago

OK, I performed the same test using hmac-sha1 (after fixing the format, which lacks a salt-hash function). It crashed in exactly the same manner: at about 65000 recursion depth, the stack overflowed.

jfoug commented 5 years ago

The "trigger" here is the unrealistic test data,

Partly agree. This 'test' data is unrealistic, in that every user name is the password. However, as @p4tpr0 showed, real-life data does not keep this issue under control either.

It will obviously cease to happen at all if you set SingleRetestGuessed = N

It does.

I have always strongly disliked the re-test-guessed aspect of single mode. It does nothing but significantly slow things down. Yes, it may increase the crack rate a 'tiny' bit, BUT it increases crack time, often by orders of magnitude.

Now, for 'core' john, that may be the best that can be done. It does not have the ability to perform things like -loopback=, which performs much better. But for jumbo, having single mode re-insert cracks back into the single stream is not a good thing (IMHO).

That being said, yes, we need to figure out SOME way to protect ourselves, making sure that recursion DOES complete, and also DOES NOT go too deep.

NOTE, there is also a memory leak, right after the recursive call. The keys value MUST be passed to MEM_FREE(). I have tested this (ASAN build), and it is 100% a memory leak.

jfoug commented 5 years ago

This is the first time I have read https://www.patpro.net/blog/index.php/2017/04/09/3071-my-take-on-the-myspace-dump/ (btw, nice writeup @p4tpr0, heh).

The data seen there is 'very' similar to the worst-case scenario (which is the input file I created with pass_gen.pl, which lists the exact password in gecos). So @magnumripper, you cannot flippantly deem the trigger data 'unrealistic'.

Now, if @p4tpr0 had simply removed the comment on the #SingleRetestGuessed = N line in john.conf, the entire file would have been testable in 1 run, and probably would have succeeded much faster than the 87 minutes he lists in the write-up.

So I still state that this is NOT unrealistic data. What I find most 'unrealistic' is the default behavior of re-injecting cracks back into the single wordspace. What is happening here is that we have taken the wonderful tool of single (which ONLY deals with a tiny subset of data, SPECIFIC to a hash) and now treat it in a wordlist manner, splashing words across the board, blindly, to all hashes, which really goes against the concept of single.

solardiz commented 5 years ago

I liked the SingleRetestGuessed behavior very much when I worked on this code in the 1990s - attacking not too many salts, and having fairly infrequent guesses. I would know those had already been tested against all salts, saving me time on manual "re-cracking". But I agree it's often unsuitable these days, and I don't mind changing the default in jumbo. Since it will differ from core, maybe we need to print a "Note: ..." each time single mode is used, listing the current setting and a brief explanation there. Maybe that "Note: ..." should be printed the first time single mode cracks a password (so it's not printed when there are no single mode cracks and thus the setting doesn't matter).

We could also introduce a hard-coded limit on the recursion (e.g., 10000?), and print a "Warning: ..." when it's reached saying that some guesses won't be re-tested against all salts because of that limit. Maybe even switch to the SingleRetestGuessed = N behavior in that case since the user wouldn't be able to rely on that feature anyway.

There's no memory leak (at least not in core) - we MEM_FREE entire key buffers (not individual keys) as the recursion unwinds.

jfoug commented 5 years ago

There's no memory leak (at least not in core) - we MEM_FREE entire key buffers (not individual keys) as the recursion unwinds.

Here is the code (with a comment added about the memory leak) from current jumbo. This IS a memory leak. I have validated it by running a small file generated by pass_gen.pl, so that it completed fully. Leaks were listed. I inserted the MEM_FREE(); leaks gone (but still exactly the same resultant .pot file). ASAN is a fantastic tool.

static int single_process_buffer(struct db_salt *salt)
{
//  .....
    if (retest_guessed)
    if (guessed_keys->count) {
        keys = mem_alloc(size = sizeof(struct db_keys) - 1 +
            length * guessed_keys->count);
        memcpy(keys, guessed_keys, size);

        keys->ptr = keys->buffer;
        do {
            current = single_db->salts;
            do {
                if (current == salt || !current->list)
                    continue;

                if (single_add_key(current, keys->ptr, 1)) {
                    /* this MEM_FREE() is not in jumbo, but should be */
                    // MEM_FREE(keys);
                    return 1;
                }
            } while ((current = current->next));
            keys->ptr += length;
        } while (--keys->count);

        MEM_FREE(keys);
    }
solardiz commented 5 years ago

You're right, I missed the return 1. This memory leak is also present in core. I'll fix it there along with the next round of fixes, but jumbo doesn't have to wait on me. Thanks!

jfoug commented 5 years ago

Digging deeper into the code, yes, it is a memory leak, but only a cleanup leak (I think). single_add_key() only returns 1 if single_process_buffer() (the function we are in) returns 1. The only way single_process_buffer() returns 1 is if crk_process_salt() returns 1, which ONLY happens when we are done (i.e., when a quit from whatever source is unwinding the john process).

SO, even though these do show up as leaks, I think they are only leaks because we are exiting. It is not like they leak during a continual run; it is just improper post-run memory cleanup.

jfoug commented 5 years ago

I believe the easiest 'fix' is to simply put sanity checking in there. As @solardiz suggested, pick some large (but not impossible) value for the recursion level. Then once that level is hit, we emit a warning and no longer enter the retest_guessed block. So something like:

    if (not_warned_recursion && recursion_level > options.max_single_recursion && retest_guessed) {
        log("Warning, due to large number of cracked hashes, single mode will switch out of retest-guessed"
            " mode. Once this run is completed, you may wish to re-run the .pot file using -loop= mode");
        retest_guessed = 0;
    }
    if (retest_guessed)
    if (guessed_keys->count) {
        // .....

In this manner, we protect the process from stack overflow when dealing with large and highly crackable 'single-mode' working sets of data, allow retest_guessed to function (whether by default, or on demand if not the default), AND print a warning telling the user that there may be some missed cracks, but that they can easily be found by rerunning john using the -loop= mode of input.

Then to deal with recursion_level, simply increment the recursion_level variable at the start of single_process_buffer(), and decrement it at every point where single_process_buffer() exits, just like I did in the diff @magnum complained about, lol. Sorry @magnumripper, I can read both context (-c) and unified (-u) diffs. I will try not to use the context format in the future.

magnumripper commented 5 years ago

The "trigger" here is the unrealistic test data, it wouldn't happen IRL

This bug was triggered by real life data :) Full context: https://www.patpro.net/blog/index.php/2017/04/09/3071-my-take-on-the-myspace-dump/

So you assume the problem you saw was indeed the recursion we are discussing now. What's your reason for assuming that? I'm not saying it can't be, I just don't assume so myself.

magnumripper commented 5 years ago

Sorry @magnumripper, i can read both context (-c) or unified (-u) diffs. I will try and not use the context in the future.

Well I can read it, but it takes me twice as long 😄

magnumripper commented 5 years ago

I have always strongly disliked the re-test-guessed aspect of single mode. It does nothing but significantly slow things down.

I disagree. Sometimes it does no good, sometimes it does wonders. When using single mode with large keys_per_crypt formats, as recently improved, we really need a lot of candidates and it will NOT slow things down. When using it in staged tests with the password in the gecos field and a KPC of 1, obviously it will look like the worst idea ever. But IRL it isn't; that test is just flawed.

magnumripper commented 5 years ago

Whether we change the default or not, we should probably add an option --single-retest-guessed (or possibly --single-no-retest-guessed if we do change the default, or perhaps simply --single-retest-guessed=BOOL).

frank-dittrich commented 5 years ago

@p4tpr0

This bug was triggered by real life data :) Full context: https://www.patpro.net/blog/index.php/2017/04/09/3071-my-take-on-the-myspace-dump/

IMHO, real life data wouldn't have most of the passwords equal to the user name, or at least based on the user name plus some well-defined mangling rules.

With your data set, it doesn't make sense to retry one guessed password on all other salts during single mode. So you should have used the already mentioned

SingleRetestGuessed = N

option.

Please do, and report the results of that test.

Frank

jfoug commented 5 years ago

Well I can read it, but it takes me twice as long 😄

Well, I used diff for a LONG time with the context mode, and got very comfortable with it. Only recently (within the last 5 years or so), largely due to git, have I started working with the unified diff output format.

I used to use patch a lot. It worked well with the context version of diffs.

So the muscle memory of diff -C 3 xxx yyy is simply hard for this old dog to unlearn, lol.

jfoug commented 5 years ago

@frank-dittrich

SingleRetestGuessed = N

When running in that mode, the test from the Myspace dump (or 'fake' data generated with pass_gen.pl) will not cause stack recursion problems. The fake data generated by pass_gen.pl is really the worst-case situation for this issue, where every hash is cracked by the password in the gecos field.

IMHO, real life data wouldn't have most of the passwords equal to user name

Well, here you may be wrong. It 'should' not happen; however, as we have seen time and time again, people implement this type of algorithm very poorly. Thus the 'smart' pen tester will find and exploit these cases. Having a quality tool such as john throw stack-overflow cores when run on these 'worst case' situations is not good. This also happened around the AM dump. There was a serious implementation flaw which allowed a tremendously good single mode. But the AM bcrypt was SO heavy that it was near impossible to see any stack crash there, since it would simply take a LONG time.

I have no problem AT ALL with keeping retest-guessed true by default, even though I think it really is a different 'concept' than single and, for salted hashes, especially when working with a LARGE set of them, it can significantly extend runtime.

It is not as big a deal when working with a data set of 100 salts. There you crack a word, and it gets 'added' to the pending data of 100 salts. BUT when dealing with 1 million salts, each time you crack a word you now insert it into 1 million salts.

The biggest thing I will be working on (and I would LIKE to hear some critique prior to starting this work) is to add a new john.conf setting (max-single-recursion-depth). I would default it to 10k, but since it is in the .conf file, it can be raised if the user chooses. Then, within single, it is very simple to add a couple of lines of code and count the recursion depth at runtime. If that max is hit, we simply do not recurse (we may stop entirely, or just hold off until the level drops). A warning message will be dumped out stating why, pointing to the parts of the documentation that explain how to work around this.

Then within the docs, it would be a good time to educate on the SingleRetestGuessed = N flag. It may be (almost certainly IS) something that this user, with this dataset, would actually WANT to use anyway. A sketch of what such a setting could look like is below.
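For concreteness, a hypothetical john.conf fragment for such a setting (the name and default merely mirror the proposal above; nothing like this exists yet):

[Options]
# Hypothetical: cap on single mode's retest-guessed recursion depth.
# Raise it if you have stack to spare; hitting the cap disables
# further re-injection and prints a warning.
MaxSingleRecursionDepth = 10000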

p4tpr0 commented 5 years ago

@frank-dittrich

So you should have used the already mentioned

SingleRetestGuessed = N

option.

You are absolutely right. My John config from Feb. 2017 used that setting (see first post), but I failed to carry it over into the config file for the current John. I've corrected that mistake and made a new test:

$ ./john --single --nolog --verbosity=1 pw-1M-real  --pot=blank-pot-4
Using default input encoding: UTF-8
Loaded 1000000 password hashes with 1000000 different salts (dynamic_25 [sha1($s.$p) 64/64])
Press 'q' or Ctrl-C to abort, almost any other key for status
73356g 0:00:00:19 1.50% (ETA: 22:53:24) 3856g/s 10374p/s 10374c/s 10374C/s brrg
125546g 0:00:00:45 1.50% (ETA: 23:22:13) 2785g/s 7502p/s 7502c/s 7502C/s aabr..Aabr
173488g 0:00:01:28 1.50% (ETA: 00:09:51) 1970g/s 5304p/s 5304c/s 5304C/s axkr..Axkr
208083g 0:00:02:17 1.50% (ETA: 01:04:08) 1518g/s 4086p/s 4086c/s 4086C/s aovkt..Aovkt
228737g 0:00:02:53 1.50% (ETA: 01:44:01) 1321g/s 3558p/s 3558c/s 3558C/s aqmpq..Aqmpq
244720g 0:00:03:25 1.50% (ETA: 02:19:28) 1193g/s 3212p/s 3212c/s 3212C/s ayrufx
263856g 0:00:04:09 1.50% (ETA: 03:08:13) 1059g/s 2851p/s 2851c/s 2851C/s tice..Tice
287041g 0:00:05:10 1.50% (ETA: 04:15:47) 925.7g/s 2493p/s 2493c/s 2493C/s auunq
290552g 0:00:05:19 1.50% (ETA: 04:25:46) 909.0g/s 2448p/s 2448c/s 2448C/s wuojx..Wuojx
Session aborted

The test data file is generated as described in the first post (with random mangling, so that about 38% of candidates won't match their password).

jfoug commented 5 years ago

I am not 100% sure this is the entire issue. I also see decreasing performance over time. I will be looking into this. There may be no error, but I am working on it, so if I find something, I will correct it.

frank-dittrich commented 5 years ago

We could also introduce a hard-coded limit on the recursion (e.g., 10000?), and print a "Warning: ..." when it's reached saying that some guesses won't be re-tested against all salts because of that limit.

When most hashes are cracked immediately, such a warning will not be noticed by any user, because it disappears too fast. (You would have to print that warning when the session ends or when it is interrupted by the user.)

I wouldn't mind if we change the default in jumbo to SingleRetestGuessed = N, but this should be properly documented. We could add a warning about that new default behaviour (for jumbo) at the start of a session, before any input file is processed. That way, the user has a chance to see the warning.
(The user should be able to suppress this warning by adjusting a config file setting.)

jfoug commented 5 years ago

@frank-dittrich it may be that the default does not need to be changed (it really ONLY needs to be changed if we want it to be). What needs to happen is that john needs to PROTECT itself.

As for printing the warning, yep, it is likely something not well seen. BUT it could 'also' be printed at graceful exit (end of data, the kill file shows up, or a ^C signal is caught). So what we would need to do here is set up some 'on exit, do this' logic to print the warning again; a minimal sketch of that idea follows.
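A minimal sketch of that idea using plain atexit(3); the real john would hook its own shutdown path, and all names here are made up:

#include <stdio.h>
#include <stdlib.h>

static int recursion_limit_was_hit; /* would be set where the depth cap triggers */

static void exit_warning(void)
{
    if (recursion_limit_was_hit)
        fprintf(stderr, "Warning: single mode recursion limit was hit; "
            "some guesses were not re-tested against all salts.\n");
}

int main(void)
{
    atexit(exit_warning);        /* register once at startup */
    recursion_limit_was_hit = 1; /* simulate hitting the cap mid-run */
    return 0;                    /* the warning prints again at exit */
}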

magnumripper commented 5 years ago

When most hashes are cracked immediately, such a warning will not be noticed by any user, because it disappears too fast.

True. I need to think about this overnight.