Closed frank-dittrich closed 9 years ago
I get
form=pwdump_lm guesses: 1125 0:00:00:00 DONE [PASSED]
.pot CHK:pwdump_lm guesses: 981 -show=2734 0:00:00:00 DONE : Expected count(s) (986) [!!!FAILED!!!]
with a GCC build (OMP enabled) when I pass --fork=3.
$ ./john --list=build-info
Version: 1.8.0.2-bleeding-jumbo OMP
Build: linux-gnu 64-bit AVX-autoconf
Arch: 64-bit LE
$JOHN is ./
Format interface version: 12
Max. number of reported tunable costs: 3
Rec file version: REC4
Charset file version: CHR3
CHARSET_MIN: 1 (0x01)
CHARSET_MAX: 255 (0xff)
CHARSET_LENGTH: 24
Max. Markov mode level: 400
Max. Markov mode password length: 30
Compiler version: 4.8.3 20140911 (Red Hat 4.8.3-7)
gcc version: 4.8.3
OpenSSL library version: 01000105f
OpenSSL 1.0.1e-fips 11 Feb 2013
GMP library version: 5.1.2
NSS library version: 3.17 Basic ECC (loaded: 3.17 Extended ECC)
NSPR library version: 4.10.7
Kerberos version 5 support enabled
fseek(): fseek
ftell(): ftell
fopen(): fopen
memmem(): System's
I get lm fails on GCC build, OMP -fork=3
JtR/bleed/test$ ./jtrts.pl -noprelims -passthru '--fork=3' -q ....... form=lm guesses: 1493 -show=2992 0:00:00:00 DONE : Expected count(s) (1500)(-show3000) [!!!FAILED!!!] .pot CHK:lm guesses: 1487 -show=2987 0:00:00:00 DONE : Expected count(s) (1500) [!!!FAILED!!!]
NOTE, if I just run against lm, I get LM failures, then LM working, then lm_pwdump working sometimes and failing others.
The runs are inconsistant. I do not think this is a problem with TS, it is something between LM and fork mode.
I can't reproduce (on x86-64 OSX). Note that the exact number for "guesses" is supposed to vary "randomly" depending on how many dupe cracks you happened to get, but the -show figure should not vary at all.
In general, issues like this "can't" really be due to -fork. Basically a forked process is no different than a -node process - there is virtually no fork specific code anywhere in john after the cracking has started. But -fork may reveal a bug that happened to hide without it.
Run this over and over again, and see if you get failures at times:
./jtrts.pl -noprelims -passthru '--fork=3' lm
On my ubuntu-64 (OMP build), sometimes it works properly, sometimes not. Yes, I understand that guesses will vary, but show should not, but I am seeing that at times jtr is NOT cracking all the hashes. I have not looked any deeper than showing that I see the exact same thing Frank is seeing.
Nope, but I made a for loop with increasing fork count and found I can trigger it with sufficiently high number of forks.
$ ./jtrts.pl -noprelims -passthru '--fork=9' lm --stoponerror
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.12.16, Sept 24, 2014. By, Jim Fougeron & others
- Testing: John the Ripper password cracker, version 1.8.0.2-bleeding-jumbo_omp [darwin13.4.0 64-bit AVX-autoconf]
--------------------------------------------------------------------------------
John Jumbo build detected.
form=lm guesses: 1503 -show=2997 0:00:00:00 DONE : Expected count(s) (1500)(-show3000) [!!!FAILED!!!]
Exiting on error. The pot file ./tst.pot contains the found data
The command used to run this test was:
../run/john -ses=./tst --fork=9 -pot=./tst.pot LM_tst.in --wordlist=pw.dic -form=lm
$ ../run/john -pot=./tst.pot LM_tst.in -form:lm -show=left
u540-LM:2:$LM$8756e789c3e08314
u441-LM:2:$LM$34acd11f14f1a3ec
u108-LM:2:$LM$11028e91b87b43a5
1497 password hashes cracked, 3 left
Indeed, three hashes are not cracked. So this is some problem in JtR.
I can't seem to trigger the problem when using -node=1/9
instead of -fork=9
... and I did try 2/9 and so on too.
I have verified that this problem exist in versions prior to our recently added self-tests. So they are not causing it.
What scares me the most is that this is not likely a problem with LM at all. It might be wordlist mode vs. node/fork, or perhaps pot sync.
Yes, I also get a varying number of guesses when I repeatedly run
for i in `seq 2 40`; do echo $i; ./jtrts.pl -noprelims -q -passthru "--fork=$i" lm --stoponerror; done
1st run:
2
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.12.15, Sept 24, 2014. By, Jim Fougeron & others
- Testing: John the Ripper password cracker, version 1.8.0.2-bleeding-jumbo_omp [linux-gnu 64-bit AVX-autoconf]
--------------------------------------------------------------------------------
form=pwdump_lm guesses: 982 -show=2756 0:00:00:00 DONE : Expected count(s) (986)(-show2760) [!!!FAILED!!!]
...
40
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.12.15, Sept 24, 2014. By, Jim Fougeron & others
- Testing: John the Ripper password cracker, version 1.8.0.2-bleeding-jumbo_omp [linux-gnu 64-bit AVX-autoconf]
--------------------------------------------------------------------------------
form=lm guesses: 1481 -show=2980 0:00:00:00 DONE : Expected count(s) (1500)(-show3000) [!!!FAILED!!!]
2nd run:
2
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.12.15, Sept 24, 2014. By, Jim Fougeron & others
- Testing: John the Ripper password cracker, version 1.8.0.2-bleeding-jumbo_omp [linux-gnu 64-bit AVX-autoconf]
--------------------------------------------------------------------------------
All tests passed without error. Performed 2 tests. Time used was 1 seconds
...
40
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.12.15, Sept 24, 2014. By, Jim Fougeron & others
- Testing: John the Ripper password cracker, version 1.8.0.2-bleeding-jumbo_omp [linux-gnu 64-bit AVX-autoconf]
--------------------------------------------------------------------------------
form=lm guesses: 1511 -show=2995 0:00:00:00 DONE : Expected count(s) (1500)(-show3000) [!!!FAILED!!!]
This is too hard to debug using gdb or the like. We should probably use git-bisect.
Core John does not seem to have the issue (but TS has an issue https://github.com/magnumripper/jtrTestSuite/issues/17 when trying)
git-bisect is a piece of work. Here's the offending commit: 1f827c7a
Ouch! This doesn't make it easy.
OK so it happens during -fork with a shared memory mapped buffer but per-node indexes (of each node's words in that shared buffer). But how can it be intermittent!?
And what causes only LM tests to fail? For --fork=2 ... --fork=14, I only see lm and pwdump_lm tests fail.
Not sure if that's worth contemplating yet.
I tried disabling mmap() and the bug is still there. So it's not about the shared memory map but the per-node index buffer. That's a relief.
I was wrong - we don't use any index buffer unless rules are involved. So what we have is a shared memory buffer, and mgetl() instead of fgetl(). But I tried disabling mmap() and bug still shows, which means we are just using fgetl() and no magic. We practically do not use anything that was touched in that commit. So what the heck is wrong!? And how come it was introduced with that patch?
OK, it's not that commit! If I check out the commit before it, and add -mem=1 to disable memory buffering, the bug shows up!
./jtrts.pl -noprelims -passthru '--fork=9 -mem=1' lm
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.12.16, Sept 24, 2014. By, Jim Fougeron & others
- Testing: John the Ripper password cracker, version 1.8.0.2-bleeding-jumbo_omp [macosx-x86-64]
--------------------------------------------------------------------------------
John Jumbo build detected.
form=lm guesses: 1506 -show=2334 0:00:00:00 DONE : Expected count(s) (1500)(-show3000) [!!!FAILED!!!]
.pot CHK:lm guesses: 834 -show=2334 0:00:00:00 DONE : Expected count(s) (1500) [!!!FAILED!!!]
form=pwdump_lm guesses: 1276 -show=1571 0:00:00:00 DONE : Expected count(s) (986)(-show2760) [!!!FAILED!!!]
.pot CHK:pwdump_lm guesses: 318 -show=1571 0:00:00:00 DONE : Expected count(s) (986) [!!!FAILED!!!]
Some tests had Errors. Performed 2 tests.2 errors 2 errors reprocessing the .POT files
Time used was 2 seconds
Back to bisecting. But this was a relief!!
0b870fa is bad. The problem is I do not have any good commit because 0b870fa is the very merge from core that implemented -fork. So we can't bisect this. I need to re-test core.
I really can't trigger the bug in core.
OK I'm puzzled. I now tested bleeding-jumbo but with a wordlist.c from core John, just modified a tad for autoconf so it builds.
$ ./jtrts.pl -noprelims -passthru '--fork=9' lm
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.12.16, Sept 24, 2014. By, Jim Fougeron & others
- Testing: John the Ripper password cracker, version 1.8.0.2-bleeding-jumbo_omp [darwin13.4.0 64-bit AVX-autoconf]
--------------------------------------------------------------------------------
John Jumbo build detected.
form=lm guesses: 1504 -show=2998 0:00:00:00 DONE : Expected count(s) (1500)(-show3000) [!!!FAILED!!!]
.pot CHK:lm guesses: 1498 -show=2998 0:00:00:00 DONE : Expected count(s) (1500) [!!!FAILED!!!]
form=pwdump_lm guesses: 1372 0:00:00:00 DONE [PASSED]
.pot CHK:pwdump_lm guesses: 984 -show=2758 0:00:00:00 DONE : Expected count(s) (986) [!!!FAILED!!!]
Some tests had Errors. Performed 2 tests.1 errors 2 errors reprocessing the .POT files
Time used was 2 seconds
Whatever the bug is, it's not in wordlist.c
bd10920 is the latest merge from core. That's the version to focus on - what differences do we have that can have this result?
- wordlist.c has differences, but using a wordlist.c from core still show the bug
Not true! If I use wordlist.c from core, in a bleeding-jumbo of bd10920, bug goes away. To use it I had to insert this line:
diff --git a/src/wordlist.c b/src/wordlist.c
index 064b33d..9a4b855 100644
--- a/src/wordlist.c
+++ b/src/wordlist.c
@@ -19,6 +19,7 @@
#include "params.h"
#include "common.h"
#include "path.h"
+#include "os.h"
#include "signals.h"
#include "loader.h"
#include "logger.h"
Weirdest thing, the same operation does not make the bug go away in HEAD of bleeding-jumbo.
I officially give up here, for now.
Heureka, I found a way to trigger it without using fork.
No, that just showed I'm too tired. I will give up now.
IMHO its not an issue with locking the pot file. I ran
$ ./john --format=lm ../test/LM_tst.in --wordlist=../test/pw.dic --fork=11 > john.stdout ; sort -u john.pot |wc -l
Warning: OpenMP was disabled due to --fork; a non-OpenMP build may be faster
Node numbers 1-11 of 11 (fork)
2 138g 0:00:00:00 DONE (2014-09-26 13:40) 6900g/s 26650p/s 26650c/s 39975KC/s SKIPPIN..R1234B5
Press 'q' or Ctrl-C to abort, almost any other key for status
10 136g 0:00:00:00 DONE (2014-09-26 13:40) 4533g/s 17766p/s 17766c/s 24198KC/s B123456..Z123456
7 135g 0:00:00:00 DONE (2014-09-26 13:40) 4500g/s 17766p/s 17766c/s 24198KC/s Z123456..W123456
8 134g 0:00:00:00 DONE (2014-09-26 13:40) 4466g/s 17766p/s 17766c/s 24198KC/s LATE*NI..X123456
5 136g 0:00:00:00 DONE (2014-09-26 13:40) 4533g/s 17766p/s 17766c/s 24198KC/s 1234567..U123456
9 136g 0:00:00:00 DONE (2014-09-26 13:40) 4533g/s 17766p/s 17766c/s 24198KC/s ..Y123456
6 136g 0:00:00:00 DONE (2014-09-26 13:40) 4533g/s 17766p/s 17766c/s 24198KC/s BERT*ER..V123456
3 136g 0:00:00:00 DONE (2014-09-26 13:40) 4533g/s 17766p/s 17766c/s 24198KC/s BERT$ER..S12345C
4 135g 0:00:00:00 DONE (2014-09-26 13:40) 4500g/s 17766p/s 17766c/s 24198KC/s BERT$ER..T123456
11 137g 0:00:00:00 DONE (2014-09-26 13:40) 4566g/s 17766p/s 17766c/s 24198KC/s ..0A23456
1 127g 0:00:00:00 DONE (2014-09-26 13:40) 3175g/s 13325p/s 13325c/s 2052KC/s SKIPPIN..Q123A45
Waiting for 10 children to terminate
Warning: passwords printed above might be partial and not be all those cracked
Use the "--show" option to display all of the cracked passwords reliably
Session completed
1486
and
$ ./john --format=lm ../test/LM_tst.in --wordlist=../test/pw.dic > john.stdout2 ; sort -u john.pot |wc -l
Warning: poor OpenMP scalability for this hash type, consider --fork=8
Will run 8 OpenMP threads
Press 'q' or Ctrl-C to abort, almost any other key for status
14g 0:00:00:00 DONE (2014-09-26 13:41) 350.0g/s 146575p/s 146575c/s 2052KC/s SKIPPIN..0A23456
Warning: passwords printed above might be partial and not be all those cracked
Use the "--show" option to display all of the cracked passwords reliably
Session completed
1500
Let's see what has not been cracked in the --fork=11 run:
$ cat john.stdout2
Loaded 1500 password hashes with no different salts (LM [DES 128/128 AVX-16])
Remaining 14 password hashes with no different salts
AB1266 (u1266-LM:2)
AB1244 (u1244-LM:2)
AB1086 (u1086-LM:2)
AB936 (u936-LM:2)
AB760 (u760-LM:2)
AB755 (u755-LM:2)
AB723 (u723-LM:2)
AB716 (u716-LM:2)
AB540 (u540-LM:2)
AB441 (u441-LM:2)
AB268 (u268-LM:2)
AB265 (u265-LM:2)
AB232 (u232-LM:2)
AB78 (u78-LM:2)
$ grep AB1266 john.stdout*
john.stdout2:AB1266 (u1266-LM:2)
$ grep u1244-LM:2 john.stdout*
john.stdout2:AB1244 (u1244-LM:2)
I can reproduce the bug with Markov mode.
$ cut -d: -f 2- john.pot > lm.txt
$ ./calc_stat lm.txt lm-stats
zero -10*log proba1[65] (1499) / 1500 converted to 1
zero -10*log proba2[65*256+66] (1499) / 1499, converted to 1 to prevent infinite length candidates
zero -10*log proba2[75*256+73] (1) / 1, converted to 1 to prevent infinite length candidates
zero -10*log proba2[83*256+75] (1) / 1, converted to 1 to prevent infinite length candidates
$ ./mkvcalcproba lm-stats lm.txt |cut -f 3|sort -n|uniq -c
reading stats ... [0x1ac5010]
119 lines parsed [0x1ac5010]
scanning password file ...
freeing stuff ...
charsetsize = 17
1 10
13 28
5 41
5 46
27 50
93 51
5 59
9 63
46 64
24 68
76 69
27 72
402 73
650 74
1 75
5 77
24 81
86 82
1 99
Then I created a Markov mode that should crack 1498 out of those 1500 passwords:
[Markov:lm]
Statsfile = $JOHN/lm-stats
MkvLvl = 82
MkvMaxLen = 7
MkvMinLvl = 28
; MkvMinLen = 0
I verified that this Markov mode works as expected:
$ ./john --format=lm ../test/LM_tst.in --markov=lm > /dev/null ; sort -u john.pot |wc -l;
Warning: poor OpenMP scalability for this hash type, consider --fork=8
Will run 8 OpenMP threads
Press 'q' or Ctrl-C to abort, almost any other key for status
MKV start (stats=$JOHN/lm-stats, lvl=28-82 len=7 pwd=1538)
1498g 0:00:00:00 DONE (2014-09-26 15:51) 49933g/s 51133p/s 51133c/s 76700KC/s AB1000..S
Warning: passwords printed above might be partial and not be all those cracked
Use the "--show" option to display all of the cracked passwords reliably
Session completed
1498
Then I tried --fork:
$ for i in `seq 2 40`; do echo $i; rm john.pot ; ./john --format=lm ../test/LM_tst.in --markov=lm --fork=$i > /dev/null ; sort -u john.pot |wc -l; done
...
4
Will run 2 OpenMP threads per process (8 total across 4 processes)
Node numbers 1-4 of 4 (fork)
2 367g 0:00:00:00 DONE (2014-09-26 15:53) 12233g/s 13100p/s 13100c/s 19650KC/s AB1341..AB322
Press 'q' or Ctrl-C to abort, almost any other key for status
MKV start (stats=$JOHN/lm-stats, lvl=28-82 len=7 pwd=1538 split over nodes)
3 383g 0:00:00:00 DONE (2014-09-26 15:53) 12766g/s 13133p/s 13133c/s 14906KC/s AB314..AB669
1 375g 0:00:00:00 DONE (2014-09-26 15:53) 9375g/s 9850p/s 9850c/s 3851KC/s AB1000..AB1350
Waiting for 3 children to terminate
4 373g 0:00:00:00 DONE (2014-09-26 15:53) 9325g/s 9600p/s 9600c/s 10896KC/s AB660..S
Warning: passwords printed above might be partial and not be all those cracked
Use the "--show" option to display all of the cracked passwords reliably
Session completed
1486
5
Warning: OpenMP was disabled due to --fork; a non-OpenMP build may be faster
Node numbers 1-5 of 5 (fork)
Press 'q' or Ctrl-C to abort, almost any other key for status
MKV start (stats=$JOHN/lm-stats, lvl=28-82 len=7 pwd=1538 split over nodes)
4 317g 0:00:00:00 DONE (2014-09-26 15:53) 15850g/s 15850p/s 15850c/s 23775KC/s AB452..AB737
2 296g 0:00:00:00 DONE (2014-09-26 15:53) 14800g/s 15850p/s 15850c/s 23775KC/s AB1272..AB189
3 310g 0:00:00:00 DONE (2014-09-26 15:53) 15500g/s 15800p/s 15800c/s 23700KC/s AB1814..AB460
5 303g 0:00:00:00 DONE (2014-09-26 15:53) 15150g/s 15400p/s 15400c/s 23100KC/s AB729..S
1 312g 0:00:00:00 DONE (2014-09-26 15:53) 15600g/s 15850p/s 15850c/s 23775KC/s AB1000..AB1281
Waiting for 4 children to terminate
Warning: passwords printed above might be partial and not be all those cracked
Use the "--show" option to display all of the cracked passwords reliably
Session completed
1498
6
Warning: OpenMP was disabled due to --fork; a non-OpenMP build may be faster
Node numbers 1-6 of 6 (fork)
Press 'q' or Ctrl-C to abort, almost any other key for status
MKV start (stats=$JOHN/lm-stats, lvl=28-82 len=7 pwd=1538 split over nodes)
2 266g 0:00:00:00 DONE (2014-09-26 15:53) 13300g/s 13300p/s 13300c/s 19950KC/s AB1226..AB1465
3 239g 0:00:00:00 DONE (2014-09-26 15:53) 11950g/s 13250p/s 13250c/s 19875KC/s AB1457..AB322
4 266g 0:00:00:00 DONE (2014-09-26 15:53) 13300g/s 13300p/s 13300c/s 19950KC/s AB314..AB553
1 248g 0:00:00:00 DONE (2014-09-26 15:53) 12400g/s 13300p/s 13300c/s 3511KC/s AB1000..AB1235
Waiting for 5 children to terminate
6 251g 0:00:00:00 DONE (2014-09-26 15:53) 25100g/s 25600p/s 25600c/s 38400KC/s AB776..S
5 266g 0:00:00:00 DONE (2014-09-26 15:53) 13300g/s 13300p/s 13300c/s 19950KC/s AB545..AB784
Warning: passwords printed above might be partial and not be all those cracked
Use the "--show" option to display all of the cracked passwords reliably
Session completed
1496
...
As with wordlist mode, the number of cracked passwords varies when I run the same command repeatedly.
Should we focus on the obvious? LM built with OMP has some issues (somehow) with -fork
With --single=none and user name = password,
AB251:$LM$0004c62411e4e2a4
...
I am not able to reproduce the error. I guess that's due to the fact that in this case it's not the list of password candidates that gets split across "nodes".
When I change 1499 user names into AB and use this as single mode
[List.Rules:lm]
:
$[0-9]
Az"[1-90][0-9]"
Az"[1-90][0-9][0-9]"
Az"1[0-4][0-9][0-9]"
I can reproduce the error using single mode:
$ for i in `seq 2 40`; do echo -n $i; echo -n " "; rm john.pot ; ./john --format=lm lm.hashes --single:lm --fork=$i > /dev/null 2> /dev/null ; sort -u john.pot |wc -l; done
2 1496
3 1494
4 1488
5 1484
6 1491
7 1495
8 1493
9 1491
10 1486
11 1474
12 1492
13 1479
14 1488
15 1478
16 1492
17 1490
18 1495
19 1491
20 1494
21 1496
22 1490
23 1496
24 1495
25 1489
26 1498
27 1491
28 1493
29 1492
30 1497
31 1480
32 1493
33 1477
34 1494
35 1489
36 1495
37 1485
38 1479
39 1479
40 1497
On 09/26/2014 04:07 PM, JimF wrote:
Should we focus on the obvious? LM built with OMP has some issues (somehow) with -fork?
Since I reproduced this with clang (which doesn't support OMP), it is not related to OMP.
And so far we couldn't reproduce it in core. So, LM triggers it either because it is extremely fast (which helps to trigger race conditions) or may be because hashes are split into 2 halves. If it is because hashes are split, we should be able do reproduce this with bigcrypt (descrypt) hashes. But I doubt that is the issue.
BTW: I can reproduce the issue not just with the pw.dic from jtrts, but also with this file (containing almost 10 million password candidates of length up to 7 and converted to upper case:
$ wc -l LM-shuf.txt
9384287 LM-shuf.txt
I created it like this:
cut -b 1-7 rockyou.txt ../test/pw.dic |tr a-z A-Z |LC_ALL=C sort -u | shuf -o LM-shuf.txt
So the error does not depend on the high frequency of valid password candidates in the worlist file.
Hmm so we have indications we only miss right-side hash halves? That is very interesting. We could make a file with all the same hashes but in raw single-hash-per-line format instead of pwdump and see what happens.
Nope, that was not it. Same problem. Also, using an infile completely lacking dupe halves does not help.
Maybe we should make a much bigger input file, and run it with lots of threads many times with core John so we really rule it out.
I ran core with 500,000 hashes and 2-30 forks, it didn't miss a beat.
Right now I have this without even using fork
$ ./jtrts.pl -noprelims lm
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.12.16, Sept 24, 2014. By, Jim Fougeron & others
- Testing: John the Ripper password cracker, version 1.8.0.2-bleeding-jumbo [darwin13.4.0 64-bit AVX-autoconf]
--------------------------------------------------------------------------------
John Jumbo build detected.
form=lm guesses: 1500 0:00:00:00 DONE [PASSED]
.pot CHK:lm guesses: 1500 0:00:00:00 DONE [PASSED]
form=pwdump_lm guesses: 646 -show=2385 0:00:00:00 DONE : Expected count(s) (986)(-show2760) [!!!FAILED!!!]
.pot CHK:pwdump_lm guesses: 646 0:00:00:00 DONE [PASSED]
Some tests had Errors. Performed 2 tests.1 errors
Time used was 3 seconds
This is with JtR 0ffc0c68d and TS magnumripper/jtrTestSuite@beadda6f. I'm probably just being dumbforked now? Better not even try to do anything.
EDIT: The above problem was due to a john.local.conf enabling UTF-8 and other things. Another footgun!
Tests with non-OMP-builds (as well as with OMP_NUM_THREADS=1) suggest that the number of forked processes must be greater than the number of physical CPUs (or at least some of the CPUs must be busy with some other work) to cause failed tests.
Funny enough, the change that "fixes" issue #798 helps to avoid this bug as well.
$ git diff
diff --git a/run/john.conf b/run/john.conf
index fc8e3f7..4b4799c 100644
--- a/run/john.conf
+++ b/run/john.conf
@@ -156,7 +156,7 @@ SecureMode = N
# If set to Y, a session using --fork or MPI will signal to other nodes when
# it has written cracks to the pot file (note that this writing is delayed
# by buffers and the "Save" timer above), so they will re-sync.
-ReloadAtCrack = Y
+ReloadAtCrack = N
# If set to Y, resync pot file when saving session.
ReloadAtSave = Y
Yeah but the bug can be seen long before pot reload was implemented at all. So this only changes the playground a little.
Could other bugs that meanwhile have been fixed have caused this problem before pot reload has been implemented?
I think not. I'm pretty sure I already tested the very commit before pot reload was introduced (because it is a "usual suspect") and the bug was there.
Then may be you did something wrong. Even with ReloadAtCrack = Y, your latest commit 3b4c29f471d05bfa48beda8249d7dfef4c4e2746 fixes that problem for me. But when I checkout 479a6ea7b2b60d28aad1321426a0ad3304a221f0, the bug still exists.
This bug seems to be related to #798. With latest bleeding-jumbo, I (rarely) get the problem with the number of forked processes close to the number when bug #798 occurs:
$ (for i in `seq 650 999`; do echo -n "$i "; rm john.pot john.log; ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=${i} > /dev/null 2> /dev/null; LC_ALL=C sort -u john.pot | wc -l; done); echo $?
650 1500
651 1500
652 1500
653 1500
654 1500
655 1500
656 1500
657 1500
658 1500
659 1500
660 1500
661 1500
662 1500
663 1499
664 1500
665 1500
666 1500
667 1500
668 1500
669 User defined signal 2
140
In this case, --fork=663 cracked just 1499 hashes.
At least, john now behaves in a consistent way:
$ (for i in `seq 1 99`; do rm john.pot john.log; ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=663 > /dev/null 2> /dev/null; echo -n " `LC_ALL=C sort -u john.pot | wc -l`"; done); echo $?
1499 1499 1499 1499 1499 1499 1499 1499 1499^C^C
Which reminds me of http://www.despair.com/consistency.html
With this john.local.conf
[Local:Options]
ReloadAtCrack = N
ReloadAtSave = N
I manage to get less than 1500 unique john.pot lines only if I hit the ulimit of (in my case currently 1024) processes per user. That looks like this:
[fd@f20x32 run]$ (for i in `seq 1020 1024`; do echo -n -e "$i\t"; echo $i > i.txt; /bin/rm john.pot; ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=${i} --session=t-$i > stdout-$i.txt 2> stderr-$i.txt; LC_ALL=C sort -u john.pot | wc -l; done |LC_ALL=C grep -v 1500 ); echo $?
/bin/rm: cannot remove ‘john.pot’: No such file or directory
bash: fork: retry: Resource temporarily unavailable
1020 174
bash: fork: retry: Resource temporarily unavailable
1021 32
1022 53
1023 15
1024 16
0
[fd@f20x32 run]$ grep error t-102*.log
t-1020.log:1 0:00:00:00 Terminating on error, john.c:496
t-1020.log:1 0:00:00:00 Terminating on error, john.c:496
t-1021.log:1 0:00:00:00 Terminating on error, john.c:496
t-1021.log:1 0:00:00:00 Terminating on error, john.c:496
t-1022.log:1 0:00:00:00 Terminating on error, john.c:496
t-1022.log:1 0:00:00:00 Terminating on error, john.c:496
t-1023.log:1 0:00:00:00 Terminating on error, john.c:496
t-1023.log:1 0:00:00:00 Terminating on error, john.c:496
t-1024.log:1 0:00:00:00 Terminating on error, john.c:496
t-1024.log:1 0:00:00:00 Terminating on error, john.c:496
So I guess I'll kep that john.local.conf ans issue a
killall -s USR2 john
Whenever I want a pot file sync.
I think we should fix (or at least consider) https://github.com/magnumripper/jtrTestSuite/issues/20 before going on with this.
Anyway, I can not reproduce this bug on OSX anymore, even with default john.conf. Except when I go past ulimits.
However, I can reproduce it using 9d9c271 (and -mem=1) which was before there was any pot reload or USR2 signalling at all.
Can you reproduce this with current code?
Before Monday I'll not have access to the system where I want to do the re-test.
$ (for i in `seq 2 1024`; do echo -n -e "$i\t"; echo $i > i.txt; /bin/rm john.pot; ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=${i} --session=t-$i > stdout-$i.txt 2> stderr-$i.txt ; echo -n -e "$?\t"; LC_ALL=C sort -u john.pot | wc -l; done | LC_ALL=C grep -v 1500 ); echo $?
323 0 1499
325 0 1499
331 0 1499
342 0 1499
346 0 1499
350 0 1499
352 0 1499
356 0 1499
358 0 1499
373 0 1499
374 0 1499
376 0 1499
377 0 1499
379
0
$ sort -u john.pot | wc -l
1499
$ grep -i error t-3??.log
$ grep -c Cracked t-377.log
1518
$ grep Cracked t-377.log |cut -d+ -f 2- |sort -u|wc -l
1499
$ wc -l stdout-377.txt
1519 stdout-377.txt
$ head stdout-377.txt
Loaded 1500 password hashes with no different salts (LM [DES 128/128 SSE2])
AB208 (u208-LM:2)
AB585 (u585-LM:2)
AB962 (u962-LM:2)
AB1339 (u1339-LM:2)
AB207 (u207-LM:2)
AB584 (u584-LM:2)
AB961 (u961-LM:2)
AB1338 (u1338-LM:2)
AB216 (u216-LM:2)
$ head -n 4 stderr-377.txt
Warning: OpenMP was disabled due to --fork; a non-OpenMP build may be faster
Node numbers 1-377 of 377 (fork)
6 4g 0:00:00:00 DONE (2014-11-09 22:42) 12.12g/s 48.48p/s 48.48c/s 72727C/s BERT*ER..IZE__12
3 4g 0:00:00:00 DONE (2014-11-09 22:42) 6.779g/s 27.11p/s 27.11c/s 40677C/s BERT$ER..L3__12
$ grep -c " DONE " stderr-377.txt
377
$ grep -v -n " DONE " stderr-377.txt
1:Warning: OpenMP was disabled due to --fork; a non-OpenMP build may be faster
2:Node numbers 1-377 of 377 (fork)
30:Press 'q' or Ctrl-C to abort, almost any other key for status
195:Waiting for 376 children to terminate
382:Warning: passwords printed above might be partial and not be all those cracked
383:Use the "--show" option to display all of the cracked passwords reliably
384:Session completed
So, I need a much higher number of forked processes to reproduce this.
But the number of forked processes (> 300, but < 400 on an otherwise idle system)is still way lower than the ulimit -u
(1024).
I get no indication of errors in log files, in stderr or stdout output or exit code.
Trying the same on OSX, it works fine up to 572, and from that point the problem can be seen in the stderr file: "fork: Resource temporarily unavailable". So I can't reproduce any problem with John. That's with an ulimit of 709 max user processes.
This is with latest jtrts commit (19ca304106ebebb5d0d9b717adc2cb4626cc9808) and latest bleeding-jumbo commit (445440621046cfb3f0d7efe3fb1e062385f6ffb4)