Parsing error for status lines ending with ')'

frank-dittrich commented 9 years ago

See also the last comments in https://github.com/magnumripper/JohnTheRipper/issues/1074. It is a real TS bug (parsing error, not related to mangled stdout output due to fork.)

Here is how to reproduce it without fork:

First make sure that all the passwords in pw-new.dic that end with a ) character are at the end of that file:

(master)test $ grep "[)]$" pw-new.dic > pw-new.dic2
(master)test $ grep -v "[)]$" pw-new.dic > pw-new.dic1
cat pw-new.dic1 pw-new.dic2 > pw-new.dic

Then

(master)test $ ./jtrts.pl -type=dynamic -stoponerror
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper password cracker, version 1.8.0.2-jumbo-1-bleeding_omp [linux-gnu 64-bit AVX-autoconf]
--------------------------------------------------------------------------------

John Jumbo build detected.

form=dynamic_0                    guesses: 1502 0:00:00:00 DONE  [PASSED]
.pot CHK:dynamic_0                guesses: 1500 0:00:00:00 DONE  [PASSED] (1500 val-pwd)

form=dynamic_0                    guesses:  920 0:00:00:00 DONE  [PASSED]
.pot CHK:dynamic_0                guesses:  920 -show= 920 0:00:00:00 DONE : Expected count(s) (1500)  [!!!FAILED4!!!]  (920 val-pwd  1 inval-pwd)
Exiting on error. The .pot file ./tst.pot contains the found data
The command used to run this test was:

../run/john -ses=./tst  -pot=./tst.pot dynamic_0_new_tst.in --wordlist=pw-new.dic

(master)test $ ./jtrts.pl -type=dynamic -stoponerror -v -v
...
t6)FppppV'j)     (t6)FppppV'j))
?ziypppppG"Z)    (?ziypppppG"Z))
Warning: detected hash type "dynamic_0", but the string is also recognized as "Raw-MD5"
Use the "--format=Raw-MD5" option to force loading these as that type instead
Warning: poor OpenMP scalability for this hash type, consider --fork=8
Will run 8 OpenMP threads
Press 'q' or Ctrl-C to abort, almost any other key for status
920g 0:00:00:00 DONE (2015-03-04 16:52) 30666g/s 30666p/s 30666c/s 28213KC/s aab2b..?ziypppppG"Z)
Use the "--show" option to display all of the cracked passwords reliably
Session completed

FAILED line = guesses: 920  0:00:00:00 DONE (2015-03-04 16:52) 30666g/s 30666p/s 30666c/s 28213KC/s aab2b..?ziypppppG"Z)
Execute john: ../run/john -show   -pot=./tst.pot dynamic_0_new_tst.in -form=dynamic_0 2>&1

d_show_line2 = 
920 password hashes cracked, 580 left
1 invalid passwords
920 valid passwords

.pot CHK:dynamic_0                guesses:  920 -show= 920 0:00:00:00 DONE : Expected count(s) (1500)  [!!!FAILED4!!!]  (920 val-pwd  1 inval-pwd)
Exiting on error. The .pot file ./tst.pot contains the found data
The command used to run this test was:

../run/john -ses=./tst  -pot=./tst.pot dynamic_0_new_tst.in --wordlist=pw-new.dic

Out of 920 passwords, there are 920 valid passwords + 1 invalid (due to the parsing error)

jfoug commented 9 years ago

Being at the end of the file is actually not a valid way to approach. They can still end up in a file, IF the length PLAINTEXT_LENGTH is short enough that it causes the whole input file to be used. I may want to simply remove them, but that may also require some regeneration of input files.

jfoug commented 9 years ago

$ grep "[)]:$" dynamic_0_new_tst.in
u27:$dynamic_0$5347604b7de9a53775a7d81ae12419f6:27:0:Et9tppppppppppppppppgSF):
u32:$dynamic_0$dd6a0523fafb43575c5a6154f45e83b2:32:0:;d1=ppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp($!):
u73:$dynamic_0$76eea47e17f197634ff0bfbf481801dc:73:0:;_TEpppppppppp~h-):
u196:$dynamic_0$1f8bebb48f98d680d9bd9941e9e2de81:196:0:yp/Hppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp\&_):
u227:$dynamic_0$76bcd3383aaf6da0d2b6523c82ef8436:227:0:0hFGpppppppppppppppppppppppppppppppppppppppppppv5C):
u356:$dynamic_0$4f905588949b88a3c5dc2f570b9357da:356:0:<(}appppppppppppppppppppppppppppppppppppppppppppppppA@<):
u548:$dynamic_0$c82d12e38709265b82d7c45166cf7e42:548:0:}~%^ppppppppppppppppppppppppppppppppppppppppppp|Au):
u607:$dynamic_0$137081ae6efae049f1ced24ea881e536:607:0:phQ.pppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp946)
u886:$dynamic_0$e70ecf7587290d896207ff47f7e98ead:886:0:Ts(}p▒ppppp▒ppppp▒ppppp▒pppppppppppppppppppppppppppppppppppppppppppppppppp Z%):
u955:$dynamic_0$2bdc635611b4aa66956f11d624df1413:955:0:tFJ`ppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppgzO):
u984:$dynamic_0$3d185db8a9920c811a6bec04957204c4:984:0:|-^xpppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppph]a):
u1169:$dynamic_0$1d1762d34b01ae3a961abfe43988f61b:1169:0:xN}%ppppppp>Su):
u1346:$dynamic_0$f63d2726f511da47f0c763ba995134b8:1346:0:1~p/pppppppzf/):
u1347:$dynamic_0$0165726de0c6f1405759dec7c22f93c9:1347:0:t6)FppppV'j):
u1445:$dynamic_0$06484ad45b6262cb94e9297eb92747e7:1445:0:?ziypppppG"Z):

Ok, there are numerous. I wonder what is triggering your runs to have these problems (other than garbled input). I think it is where the ')$' from a found password smashes one of the fork guess lines. So it may be in the guess detect logic, which I was also looking at simply removing, and reading the .pot file to find the count.

frank-dittrich commented 9 years ago

I only moved the passwords ending with ) to the end of the word list so that I can reliably trigger the parsing bug. Otherwise, you needed to be lucky that one of the status lines (of the forked processes) had passwords any_password..a_password_endingwith) If these "problematic" passwords appear in a random sequence, then the more forked processes, the higher the chance one ends with such a status line. (Of course, changing pw-new.dic without also adjusting some word list sizes in jtrts.dat, you might now get other FAILed tests. )

jfoug commented 9 years ago

But we can not simply use a file like that Frank. There are formats where I use the first X lines of the file (some of the slower ones). I create a truncated input file. This was done so that when there ARE problems with the format and it is not finding things, that it does not run a LONG time, but exits much quicker with a listing that only part of them were found.

frank-dittrich commented 9 years ago

I did not seriously suggest we change pw-new.dic. It was just a temporary workaround to be able to more reliably get those status lines which list passwords any_password..a_password_ending_with_), even without using --fork. The problem is that you treat these status lines where the final password happens to end with a ) as if they were lines reporting cracked passwords.

jfoug commented 9 years ago

Part of the 'parsing' problem is it is truly free form. There is very little to go on.

This could be a bad line:

guesses: 920 0:00:00:00 DONE (2015-03-04 16:52) 30666g/s 30666p/s 30666c/s 28213KC/s aab2b..?ziypppppG"Z)

as could this

E (2015-03-04 16:52) 30666g/s 30666p/s 30666c/s 28213KC/s aab2b..?ziypppppG"Z)

The format is simply this:

something (something)

so if the line ends with a ')' and has a '(' somewhere, then that is really all I have to go on to check. The code is written to FIND cases where there are problems, so checking for 'something' == 'something' only tells us we have a correct password.

This is perfectly expected (but buggy) for JtR to ouput: 1 (111) and we have seen that for bfegg. It shows a problem (well a problem in the actual hash algorithm, not really in JtR). So what makes parsing 1 (111) any better than parsing E (2015-03-04 16:52) 30666g/s 30666p/s 30666c/s 28213KC/s aab2b..?ziypppppG"Z) ? Yes, in the 2nd one, we do not have a matched pair of ( and ). I 'guess' I could look for that. But I will assure you, that is simply a band aide, and will not fix anything.

jfoug commented 9 years ago

Ok, I see the problem you are explaining now.

The 'easy' work around for that is to add this to the pot_match_pass() function:

    if (index($line, "Will run ") == 0 && index($line, "OpenMP") > 0) { return 1; }
    if (index($line, "Node numbers ") == 0) { return 1; }
+   if (index($line, "guesses: ") == 0) { return 1; }
    my $idx = index($line, " (");
    my $s = substr($line, $idx+2);

I do not think this fixes the fork problems, but should fix the problem you are seeing. I think the problem you are seeing will vary based upon OMP_NUM_THREADS

jfoug commented 9 years ago

What is the OMP thread count you are running on this machine?

frank-dittrich commented 9 years ago

BFegg is different, because there is more than one valid password per hash. May be it would be good enough to stop looking for more cracked passwords if you already found all your expected passwords.

frank-dittrich commented 9 years ago

Default OMP tread count here is 8 (4 physical cores plus HT)

jfoug commented 9 years ago

Part of the problem I am having, is I do not see same results. I do not know if this is differences between mmap, or something else. But at OMP=8, all is fine (before the patch I am listing above). I am testing on other systems also.

frank-dittrich commented 9 years ago

Checking for "guesses" will only work if we never have a "guesses" password in one of our test file. Checking whether the part between the final '(' and ')' is a valid user name is probably overkill, but it could also help, as long as we don't use user names containing '(' .

jfoug commented 9 years ago

It would have to be "guesses: " (with the colon and space), and be the first thing in the password, but yes, you are correct, that would not be a valid string to have in the input file.

jfoug commented 9 years ago

Please try after this change: 1d43403

jfoug commented 9 years ago

Ok, I am finally able to see a 'few' of these things:

./jtrts.pl -passthru="-fork=8" new -q
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper password cracker, version 1.8.0.2-jumbo-1-bleeding_omp_memdbg_asan [linux-gnu 64-bit SSE4.1-autoconf]
--------------------------------------------------------------------------------
.pot CHK:dynamic_35               guesses: 1500 -show=1500 0:00:00:00 DONE : Expected count(s) (1500)  [!!!FAILED4!!!]  (1500 val-pwd  1 inval-pwd)
Some tests had Errors. Performed 113 tests.  1 errors reprocessing the .POT files       
Time used was 234 seconds
jimfougeron@jimfougeron-vb-14-04:/JtR/bleed/test$ ./jtrts.pl -passthru="-fork=8" new -q
./jtrts.pl -passthru="-fork=8" new -q
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper password cracker, version 1.8.0.2-jumbo-1-bleeding_omp_memdbg_asan [linux-gnu 64-bit SSE4.1-autoconf]
--------------------------------------------------------------------------------
.pot CHK:md5crypt                 guesses: 1500 -show=1500 0:00:00:00 DONE : Expected count(s) (1500)  [!!!FAILED4!!!]  (1500 val-pwd  1 inval-pwd)
Some tests had Errors. Performed 113 tests.  1 errors reprocessing the .POT files       
Time used was 231 seconds

I will now try with the guesses and see if I can run this with none of these problems. BUT this is from using -fork. I am not able to do this simply with OMP running on my system.

frank-dittrich commented 9 years ago

Funny thing, I wasn't able to reproduce anymore.

But then I did a

git reset --hard 35f406863db54615e2cd26e0aa88b38ad00759a8

So, then I tested without your previous 3 commits commit 7e404b38227026d6fa88a22aff3cf093c02bbc7b Author: jfoug jfoug@cox.net Date: Wed Mar 4 08:57:28 2015 -0600

handle minor fix to .pot -fork remove

commit e7d0b46e5c2bf2667329b099e55025377a2fca49 Author: jfoug jfoug@cox.net Date: Wed Mar 4 08:55:42 2015 -0600

handle --fork also

commit fa18818172aa47dee865fce7fd43cb2f54a4391d Author: jfoug jfoug@cox.net Date: Wed Mar 4 08:53:08 2015 -0600

removed -fork from .pot recheck run

I verified that the problem was still there, then just applied the https://github.com/magnumripper/jtrTestSuite/commit/1d4340328a74d4693e77ca7f0e40b3dc7ebe6372 change and verified that this fix alone also solved the problem for me.

I guess it is a good idea to keep all these changes (avoid stdout mangling problems due to --fork for pot file re-processing, and skipping the "guesses: " lines.

But I wonder if you might skip lines that you wouldn't want to skip, because john's output doesn't contain "guesses: " lines anymore.

john prints:

20 28g 0:00:00:00 DONE (2015-03-04 18:44) 933.3g/s 4033p/s 4033c/s 6050KC/s 123..|R4a,+,F
32 26g 0:00:00:00 DONE (2015-03-04 18:44) 866.6g/s 4033p/s 4033c/s 6050KC/s af<xppppppppppppppppppppppppppppppppppppppppppppppppppp..sT3Tppp-E'@
18 31g 0:00:00:00 DONE (2015-03-04 18:44) 1033g/s 4033p/s 4033c/s 6050KC/s nGtZpeny\..[b/=I Z
19 27g 0:00:00:00 DONE (2015-03-04 18:44) 900.0g/s 4033p/s 4033c/s 6050KC/s p..2f@;ppppppppp-uYV
28 29g 0:00:00:00 DONE (2015-03-04 18:44) 966.6g/s 4033p/s 4033c/s 6050KC/s vOQZppppppppppppppppppppppppppppppppppppppppppppppppppp..6l<dpppppp,P`-
30 28g 0:00:00:00 DONE (2015-03-04 18:44) 933.3g/s 4033p/s 4033c/s 6050KC/s #1w!ppppppppppppppppppppppppppppppppppppppppppppppppppp..Y`!mpp-@0I
31 28g 0:00:00:00 DONE (2015-03-04 18:44) 933.3g/s 4033p/s 4033c/s 6050KC/s Et9tppppppppppppppppgSF)..4i^IppppppY8L>
15 30g 0:00:00:00 DONE (2015-03-04 18:44) 1000g/s 4033p/s 4033c/s 6050KC/s d9>'ppppppppppppppppppppppppppppppppppppppppppppppppppp..?-KnpppqKSn
29 30g 0:00:00:00 DONE (2015-03-04 18:44) 750.0g/s 3025p/s 3025c/s 4537KC/s  SO>pppppppppppWwrn..,DJoppppppppp*Z'N
11 31g 0:00:00:00 DONE (2015-03-04 18:44) 775.0g/s 3050p/s 3050c/s 4575KC/s %6#xppppppppppppppppppppppppppppppppppppppppppppppp;X#x..aP)*Xgz
26 26g 0:00:00:00 DONE (2015-03-04 18:44) 650.0g/s 3025p/s 3025c/s 4537KC/s oIwDppppppppppppppppppppppppppppppppppppppppppppppppppp..[]/Kppppppppppp$!n{
17 30g 0:00:00:00 DONE (2015-03-04 18:44) 750.0g/s 3025p/s 3025c/s 4537KC/s ..b,w0ppxH%0
10 31g 0:00:00:00 DONE (2015-03-04 18:44) 775.0g/s 3050p/s 3050c/s 4575KC/s H#Ixpppppppppppppppppppppppppppppppppppppppppphm3D..BY]9pppppppppp6a)B
Press 'q' or Ctrl-C to abort, almost any other key for status
25 28g 0:00:00:00 DONE (2015-03-04 18:44) 700.0g/s 3025p/s 3025c/s 4537KC/s }wP\p�{fR..Q He|^!'

Or, without --fork

920g 0:00:00:00 DONE (2015-03-04 18:45) 23000g/s 97150p/s 97150c/s 145725KC/s aab2b..ko}GpppVeLj

So, the "guesses: " lines (at least for recent bleeding-jumbo versions) appear to have been converted to match output of some older john versions.

frank-dittrich commented 9 years ago

I guess if you want to see the problem on your system without using --fork, you need to put all the passwords ending with ) at the end of pw-new.dic, and ignore those tests that process only a small part of pw-new.dic.

jfoug commented 9 years ago

But I wonder if you might skip lines that you wouldn't want to skip, because john's output doesn't contain "guesses: " lines anymore.

There is code that converts the new format back into the 'guesses: ' format right away. I really do not like the new format, it does not give nearly as much to search upon. It is only "#+g " (# meaning number) at head of a line. That is much less assured of being correct than a line starting with "guesses: " but that was solar's call I think.

frank-dittrich commented 9 years ago

So, we are lucky that pw.dic and pw-new.dic don't contain passwords starting with such a pattern.

jfoug commented 9 years ago

No luck involved. I made sure that was not there. It does not mean that JtR can not crack something like that,it simply means that the when I designed the input files, I knew better than to start something with 'guesses: ' string. Again, remember, the TS is nothing but a tool, designed to 'help' find cases where things are missed. That being the case, we do not have to look for 'certain' unique patterns. Known bad things simply do not have to be part of the tool. Just like removing -fork from the .pot re-run. The .pot rerun is not a 'normal' thing users are doing, it is just a method devised to make sure that the password written to the .pot file actually can re-crack the hash line. That the person running the TS wants to use -fork=x is not relevant at that point, so the easy thing was to strip it out from being used.

frank-dittrich commented 9 years ago

My comment wasn't about guesses:, but about passwords starting with 25g or 2 50g. But may be this wasn't pure luck, either.

Currently, we don't have such passwords. And when we have, we'll see TS errors because the adjusted password guesses: will not match the expected password.

jfoug commented 9 years ago

That happens is #g gets converted into 'gesses: #' type line. So what we would see is a lot more passwords cracked than should be (say there was a 143g and a 39g in the input file). If that was the case, we would always be 182 'passwords' too many. No, the pure number followed by a 'g' was not something that was specifically coded to not have. It was blind luck.

openwall / john-tests

Parsing error for status lines ending with ')' #45