Closed: frank-dittrich closed this issue 9 years ago
When I change Save = 60 to Save = 62 in john.conf, the error occurs much later:
606 0 1499
621 0 1499
626 0 1499
675 0 1499
688 0 1499
690 0 1499
692 0 1499
698 0 1499
706 0 1499
707 0 1499
708 0 1497
716 0 1499
718 0 1499
723 0 1499
726
Still no indication of an error in *.log, stderr-*.txt, or stdout-*.txt.
Can you try schedtool -a 0x1 -e ./john ... instead of ./john?
I do not have anything named schedtool.
I can't see why the Save timer would affect this. No session runs for that long anyway, right?
> I can't see why the Save timer would affect this
OK, it's because OS_TIMER counts backwards.
With Save = 2, I get:
$ (for i in `seq 2 1024`; do echo -n -e "$i\t"; echo $i > i.txt; /bin/rm john.pot; ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=${i} --session=t-$i > stdout-$i.txt 2> stderr-$i.txt ; echo -n -e "$?\t"; LC_ALL=C sort -u john.pot | wc -l; done | LC_ALL=C grep -v 1500 ); echo $?
268 0 1499
282 0 1499
293 0 1499
304 0 1499
318 0 1499
319 0 1499
325 0 1499
332 0 1499
333 0 1499
335 0 1497
340 0 1497
...
957 0 1499
958 0 1499
959 0 1499
960 0 1498
961 0 1496
bash: fork: retry: Resource temporarily unavailable
962 1 79
963 1 23
964 1 2
965 1 97
966 1 2
bash: fork: retry: Resource temporarily unavailable
967 1 0
968 1 0
969 1 41
970 1 7
971 1 0
972 1 0
973 1 0
974 1 4
975 1 0
976 1 5
977 1 29
978 1 0
979 1 0
bash: fork: retry: Resource temporarily unavailable
980 1 0
981 1 53
bash: fork: retry: Resource temporarily unavailable
982 1 12
983 1 25
984 1 31
985 1 0
986 1 0
987 1 0
988 1 0
989 1 7
bash: fork: retry: Resource temporarily unavailable
990 1 2
991 1 0
992 1 2
993 1 32
994 1 0
995 1 41
996 1 5
997 1 0
998 1 9
999 1 25
1000 1 15
1001 1 5
1002 1 33
1003 1 0
bash: fork: retry: Resource temporarily unavailable
1004 1 3
1005 1 13
1006 1 2
1007 1 5
1008 1 0
1009 1 2
1010 1 0
1011 1 28
1012 1 44
1013 1 18
1014 1 4
bash: fork: retry: Resource temporarily unavailable
1015 1 2
1016 1 10
1017 1 7
1018 1 2
1019 1 0
1020 1 25
1021 1 9
1022 1 19
1023 1 9
1024 1 1
0
$ grep -i error t-96[12].log
t-962.log:1 0:00:00:00 Terminating on error, john.c:489
On 11/09/2014 11:38 PM, magnum wrote:
> Trying the same on OSX, it works fine up to 572, and from that point the problem can be seen in the stderr file: "fork: Resource temporarily unavailable". So I can't reproduce any problem with John.
I bet that with fewer than 572 forks, john finishes in less than 1 second on your OSX system, so you don't get the signals that are sent when 59 of the 60 seconds are left.
On my 64-bit Linux system with an Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz (quad core, no hyperthreading), I manage to get john killed by SIGUSR2 at 761 forked processes (with Idle = Y in john.conf), when I run 4 other john processes (with Idle = N in john.conf) trying (not) to crack the rar test hashes: ./john --incremental --min-length=6 hashes.rar --session=rar4
...
30670 fd 20 0 542700 48620 9224 R 92.8 0.6 18:16.65 ./john --session=rar1 --incremental --min-length=6 hashes.rar
27686 fd 20 0 542704 45892 9188 R 90.8 0.6 14:33.13 ./john --incremental --min-length=6 hashes.rar --session=rar5
20452 fd 20 0 542700 48624 9216 R 85.8 0.6 20:27.23 ./john --incremental --min-length=6 hashes.rar --session=rar3
20484 fd 20 0 542700 48592 9188 R 85.8 0.6 20:00.28 ./john --incremental --min-length=6 hashes.rar --session=rar4
10178 fd 20 0 93968 10644 6844 R 4.0 0.1 0:00.12 ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=510 --session=t-510
And some tests with a smaller number of forked processes exited with $? = 0, but cracked fewer than 1500 unique hashes.
$ (for i in `seq 500 1024`; do echo -n -e "$i\t"; echo $i > i.txt; /bin/rm john.pot; ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=${i} --session=t-$i > stdout-$i.txt 2> stderr-$i.txt ; echo -n -e "$?\t"; LC_ALL=C sort -u john.pot | wc -l; done | LC_ALL=C grep -v 1500 ); echo $?
631 0 1499
726 0 1499
737 0 1499
743 0 1499
755 0 1498
761
0
As root, with ulimit -u = 7794 (compared to 1024 as a regular user), nothing changes.
# (for i in `seq 2 1024`; do echo -n -e "$i\t"; echo $i > i.txt; /bin/rm john.pot; ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=${i} --session=t-$i > stdout-$i.txt 2> stderr-$i.txt ; echo -n -e "$?\t"; LC_ALL=C sort -u john.pot | wc -l; done | LC_ALL=C grep -v 1500 ); echo $?
/bin/rm: cannot remove ‘john.pot’: No such file or directory
350 0 1499
355 0 1499
356 0 1499
358 0 1499
373 0 1499
374 0 1499
380 0 1497
389 0 1498
391 0 1499
394
0
# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 7794
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 7794
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
With some more changes to ulimits:
# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15588
max locked memory (kbytes, -l) 256
max memory size (kbytes, -m) unlimited
open files (-n) 4096
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 1638400
real-time priority (-r) 0
stack size (kbytes, -s) 16384
cpu time (seconds, -t) unlimited
max user processes (-u) 7794
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I get
# (for i in `seq 2 1024`; do echo -n -e "$i\t"; echo $i > i.txt; /bin/rm john.pot; ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=${i} --session=t-$i > stdout-$i.txt 2> stderr-$i.txt ; echo -n -e "$?\t"; LC_ALL=C sort -u john.pot | wc -l; done | LC_ALL=C grep -v 1500 ); echo $?
/bin/rm: cannot remove ‘john.pot’: No such file or directory
202 0 1499
207 0 1499
238 0 1498
266 0 1499
268 0 1499
271
0
So, this is most likely not related to ulimit.
I bet https://github.com/magnumripper/JohnTheRipper/issues/798#issuecomment-62325144 would make it much better, (at least provided you don't use a Save interval that is a multiple of 3).
> I bet #798 (comment) would make it much better, (at least provided you don't use a Save interval that is a multiple of 3).
I tested with the default Save interval of 60, which is a multiple of 3. Why do you think that matters here? All these test runs are much faster than 60 seconds, even on my old 32bit system.
(for i in `seq 300 1024`; do echo -n -e "$i\t"; echo $i > i.txt; /bin/rm john.pot; ./john --wordlist=../test/pw.dic ../test/LM_tst.in --format=lm --fork=${i} --session=t-$i > stdout-$i.txt 2> stderr-$i.txt ; echo -n -e "$?\t"; LC_ALL=C sort -u john.pot | wc -l; done |grep -v 1500 ); echo $?
rm: cannot remove ‘t-*.rec’: No such file or directory
603 0 1499
608 0 1499
663 0 1499
675 0 1499
696 0 1499
700 0 1497
711 0 1498
714 0 1498
717
0
No indication of problems in log file, stderr or stdout output.
So, the improvement is similar to what I got when I used Save = 60 without that patch.
BTW, the --fork=717 test that got killed produced a john.pot file with 1500 unique hashes.
> I tested with the default Save interval of 60, which is a multiple of 3. Why do you think that matters here? All these test runs are much faster than 60 seconds, even on my old 32bit system.
Because, like you found out, without that patch and with OS_TIMER, some things would happen after 0 seconds instead of after three seconds. Maybe that is unrelated.
> Because, like you found out, without that patch and with OS_TIMER, some things would happen after 0 seconds instead of after three seconds. Maybe that is unrelated.
Yes, but that was without the patch.
With Save = 60, after one second, you'd get the SIGUSR2 signals, because 59 & 3 == 3.
With your patch, you'd get the SIGUSR2 signals 2 seconds later, no matter whether you have Save = 60 or Save = 62: (60 - 57) & 3 == 3 and (62 - 59) & 3 == 3.
I had the vague idea that under this crazy over-booking of resources there could be a difference. But now that I think about it, we should already be "protected" against USR2 anyway: the real key is calling sig_init() and sig_init_child() early enough, and this doesn't change that. I can't think of any way to do those earlier than we do now, IIRC.
> I can't think of any way to do those earlier than we do now iirc.
On the other hand, if we indeed introduce yet another counter instead of using ((timer_save_interval - timer_save_value) & 3) == 3, we could also opt to never ever send a USR2 during the first minute, or whatever we choose.
However we put it, I think this issue and #798 are purely academic. If we can find solutions for them, fine. If we can't, I will not lose any sleep over it.
I'm closing this. If some similar problem can still be triggered, please open a brand new issue. This issue is too clobbered anyway.
This is with latest jtrts commit (19ca304106ebebb5d0d9b717adc2cb4626cc9808) and latest bleeding-jumbo commit (445440621046cfb3f0d7efe3fb1e062385f6ffb4)