frank-dittrich closed this issue 9 years ago.
Moving them to the end of the file is not actually a valid approach. They can still end up in a test input file IF PLAINTEXT_LENGTH is short enough that the whole input file gets used. I may simply remove them, but that may also require regenerating some input files.
$ grep "[)]:$" dynamic_0_new_tst.in
u27:$dynamic_0$5347604b7de9a53775a7d81ae12419f6:27:0:Et9tppppppppppppppppgSF):
u32:$dynamic_0$dd6a0523fafb43575c5a6154f45e83b2:32:0:;d1=ppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp($!):
u73:$dynamic_0$76eea47e17f197634ff0bfbf481801dc:73:0:;_TEpppppppppp~h-):
u196:$dynamic_0$1f8bebb48f98d680d9bd9941e9e2de81:196:0:yp/Hppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp\&_):
u227:$dynamic_0$76bcd3383aaf6da0d2b6523c82ef8436:227:0:0hFGpppppppppppppppppppppppppppppppppppppppppppv5C):
u356:$dynamic_0$4f905588949b88a3c5dc2f570b9357da:356:0:<(}appppppppppppppppppppppppppppppppppppppppppppppppA@<):
u548:$dynamic_0$c82d12e38709265b82d7c45166cf7e42:548:0:}~%^ppppppppppppppppppppppppppppppppppppppppppp|Au):
u607:$dynamic_0$137081ae6efae049f1ced24ea881e536:607:0:phQ.pppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp946)
u886:$dynamic_0$e70ecf7587290d896207ff47f7e98ead:886:0:Ts(}p▒ppppp▒ppppp▒ppppp▒pppppppppppppppppppppppppppppppppppppppppppppppppp Z%):
u955:$dynamic_0$2bdc635611b4aa66956f11d624df1413:955:0:tFJ`ppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppgzO):
u984:$dynamic_0$3d185db8a9920c811a6bec04957204c4:984:0:|-^xpppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppph]a):
u1169:$dynamic_0$1d1762d34b01ae3a961abfe43988f61b:1169:0:xN}%ppppppp>Su):
u1346:$dynamic_0$f63d2726f511da47f0c763ba995134b8:1346:0:1~p/pppppppzf/):
u1347:$dynamic_0$0165726de0c6f1405759dec7c22f93c9:1347:0:t6)FppppV'j):
u1445:$dynamic_0$06484ad45b6262cb94e9297eb92747e7:1445:0:?ziypppppG"Z):
OK, there are quite a few. I wonder what is triggering these problems in your runs (other than garbled input). I think it happens where the ')' at the end of a found password clobbers one of the forked processes' guess lines. So the problem may be in the guess-detection logic, which I was also considering removing entirely, reading the .pot file to get the count instead.
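For illustration only, the alternative mentioned above (getting the count from the .pot file instead of detecting guess lines) could look roughly like this Perl sketch. It assumes each .pot line is ciphertext:plaintext with no ':' inside the ciphertext; pot_cracked_count is a hypothetical name, not something in jtrts.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: count distinct cracked hashes in .pot content.
# Assumes "ciphertext:plaintext" lines; the split is on the FIRST ':',
# so colons inside the plaintext are fine.
sub pot_cracked_count {
    my (@lines) = @_;
    my %seen;
    for my $line (@lines) {
        chomp $line;
        my $idx = index($line, ':');
        next if $idx < 1;    # skip blank lines and lines with no ciphertext
        $seen{ substr($line, 0, $idx) } = 1;
    }
    return scalar keys %seen;
}

my @pot = (
    '$dynamic_0$5347604b7de9a53775a7d81ae12419f6:Et9tppppppppppppppppgSF)',
    '$dynamic_0$06484ad45b6262cb94e9297eb92747e7:?ziypppppG"Z)',
    '$dynamic_0$5347604b7de9a53775a7d81ae12419f6:Et9tppppppppppppppppgSF)',
);
print pot_cracked_count(@pot), "\n";    # prints 2 (the duplicate is counted once)
```

Nothing in the .pot file depends on how stdout was interleaved, which is the appeal of this approach.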
I only moved the passwords ending with ')' to the end of the word list so that I can reliably trigger the parsing bug.
Otherwise, you needed to be lucky enough that one of the status lines (of the forked processes) ended with a candidate range like
any_password..a_password_ending_with)
If these "problematic" passwords appear at random positions, then the more forked processes there are, the higher the chance that one of them ends with such a status line.
(Of course, if you change pw-new.dic without also adjusting some word list sizes in jtrts.dat, you may now get other FAILed tests.)
But we cannot simply use a file like that, Frank. For some formats (some of the slower ones), I use only the first X lines of the file, creating a truncated input file. This was done so that when there ARE problems with the format and it is not finding things, it does not run for a LONG time, but exits much more quickly, reporting that only some of them were found.
I did not seriously suggest we change pw-new.dic.
It was just a temporary workaround to more reliably get those status lines which list passwords any_password..a_password_ending_with_), even without using --fork.
The problem is that you treat these status lines, where the final password happens to end with a ')', as if they were lines reporting cracked passwords.
Part of the 'parsing' problem is that the output is truly free-form. There is very little to go on.
This could be a bad line:
guesses: 920 0:00:00:00 DONE (2015-03-04 16:52) 30666g/s 30666p/s 30666c/s 28213KC/s aab2b..?ziypppppG"Z)
as could this:
E (2015-03-04 16:52) 30666g/s 30666p/s 30666c/s 28213KC/s aab2b..?ziypppppG"Z)
The format is simply this:
something (something)
So if a line ends with a ')' and has a '(' somewhere, that is really all I have to go on. The code is written to FIND cases where there are problems, so checking that 'something' == 'something' only tells us we have a correct password.
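A minimal Perl sketch of that heuristic (a hypothetical reconstruction, not the actual jtrts.pl code) shows why a status line whose last candidate ends in ')' slips through, while the plain "920g ..." status line does not:

```perl
use strict;
use warnings;

# Hypothetical reconstruction of the check described above:
# a line counts as "password (user)" if it ends with ')' and has " (" in it.
sub looks_like_crack_line {
    my ($line) = @_;
    return ($line =~ /\)\z/ && index($line, ' (') >= 0) ? 1 : 0;
}

my @samples = (
    'Et9tppppppppppppppppgSF) (u27)',    # real crack line
    'E (2015-03-04 16:52) 30666g/s 30666p/s 30666c/s 28213KC/s aab2b..?ziypppppG"Z)',
    '920g 0:00:00:00 DONE (2015-03-04 18:45) 23000g/s 97150p/s 97150c/s 145725KC/s aab2b..ko}GpppVeLj',
);
print looks_like_crack_line($_), "  $_\n" for @samples;    # prints 1, 1, 0
```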
This is perfectly expected (if buggy) output from JtR: 1 (111), and we have seen that for bfegg. It shows a problem (well, a problem in the actual hash algorithm, not really in JtR). So what makes parsing 1 (111) any better than parsing E (2015-03-04 16:52) 30666g/s 30666p/s 30666c/s 28213KC/s aab2b..?ziypppppG"Z)? Yes, in the 2nd one, we do not have a matched pair of ( and ). I 'guess' I could look for that. But I assure you, that is simply a band-aid, and will not fix anything.
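To see why matching the parentheses would only be a band-aid, here is a hypothetical paren-counting check in Perl. It rejects the status line above, but it also rejects a genuine crack line for a password ending in ')' (Et9tppppppppppppppppgSF) for u27 from the grep output), and it still accepts a status line whose last candidate happens to be balanced, e.g. (z).

```perl
use strict;
use warnings;

# Hypothetical band-aid: only accept lines whose '(' and ')' counts match.
sub parens_balanced {
    my ($line) = @_;
    my $open  = () = $line =~ /\(/g;
    my $close = () = $line =~ /\)/g;
    return $open == $close ? 1 : 0;
}

print parens_balanced('E (2015-03-04 16:52) 30666g/s aab2b..?ziypppppG"Z)'), "\n";  # 0: rejected, as intended
print parens_balanced('Et9tppppppppppppppppgSF) (u27)'), "\n";                      # 0: real crack, wrongly rejected
print parens_balanced('DONE (2015-03-04 16:52) 30666g/s aab2b..(z)'), "\n";         # 1: status line, wrongly accepted
```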
Ok, I see the problem you are explaining now.
The 'easy' workaround for that is to add this to the pot_match_pass() function:
if (index($line, "Will run ") == 0 && index($line, "OpenMP") > 0) { return 1; }
if (index($line, "Node numbers ") == 0) { return 1; }
+ if (index($line, "guesses: ") == 0) { return 1; }
my $idx = index($line, " (");
my $s = substr($line, $idx+2);
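A stand-alone Perl sketch of those guards: the three early returns are copied from the patch, while the sub names skip_non_crack_line and after_open_paren are hypothetical and the rest of pot_match_pass() is not reproduced. It shows that a "guesses: " line is now skipped before the " (" split ever runs on it.

```perl
use strict;
use warnings;

# The three early returns from the patch, pulled into a hypothetical sub.
sub skip_non_crack_line {
    my ($line) = @_;
    return 1 if index($line, "Will run ") == 0 && index($line, "OpenMP") > 0;
    return 1 if index($line, "Node numbers ") == 0;
    return 1 if index($line, "guesses: ") == 0;    # the added check
    return 0;
}

# The two lines after the guards: take the text after the first " (".
# The patch excerpt has no guard for a missing " ("; this sketch returns undef.
sub after_open_paren {
    my ($line) = @_;
    my $idx = index($line, " (");
    return undef if $idx < 0;
    return substr($line, $idx + 2);
}

my $status = 'guesses: 920 0:00:00:00 DONE (2015-03-04 16:52) aab2b..?ziypppppG"Z)';
print skip_non_crack_line($status), "\n";                       # 1: skipped now
print after_open_paren($status), "\n";                          # what the split would have seen without the guard
print after_open_paren('Et9tppppppppppppppppgSF) (u27)'), "\n";  # u27)
```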
I do not think this fixes the fork problems, but it should fix the problem you are seeing. I think whether you see that problem will vary with OMP_NUM_THREADS.
What is the OMP thread count you are running on this machine?
BFegg is different, because there is more than one valid password per hash. Maybe it would be good enough to stop looking for more cracked passwords once you have found all your expected passwords.
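A rough Perl sketch of that idea, assuming the caller has already pulled the user name out of each crack line; count_until_expected is a hypothetical name. Counting distinct users should also keep extra valid passwords reported for the same BFegg user from inflating the total.

```perl
use strict;
use warnings;

# Hypothetical: count distinct users from crack lines, but stop as soon as
# the expected number is reached, so later (possibly bogus) lines are ignored.
sub count_until_expected {
    my ($expected, @users) = @_;
    my %found;
    for my $user (@users) {
        last if keys(%found) >= $expected;
        $found{$user} = 1;
    }
    return scalar keys %found;
}

print count_until_expected(3, qw(u1 u2 u2 u3 bogus)), "\n";    # 3
```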
Default OMP thread count here is 8 (4 physical cores plus HT).
Part of the problem I am having is that I do not see the same results. I do not know whether this is due to mmap differences or something else. But at OMP=8, all is fine (before applying the patch listed above). I am also testing on other systems.
Checking for "guesses" will only work if we never have a "guesses" password in one of our test files. Checking whether the part between the final '(' and ')' is a valid user name is probably overkill, but it could also help, as long as we don't use user names containing '('.
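The user-name idea, sketched in Perl under the assumption that the test harness knows the expected user names (u27, u1445, ... in the dynamic_0 file above); user_in_final_parens is a hypothetical helper. As noted, it breaks if a user name contains '('.

```perl
use strict;
use warnings;

# Hypothetical: accept a line only if the text inside the FINAL "(...)"
# is a known user name. Breaks if a user name itself contains '(' or ')'.
sub user_in_final_parens {
    my ($line, $users) = @_;    # $users: hash ref of expected user names
    return 0 unless $line =~ /\(([^()]*)\)\z/;
    return exists $users->{$1} ? 1 : 0;
}

my %users = map { $_ => 1 } qw(u27 u1445);
print user_in_final_parens('Et9tppppppppppppppppgSF) (u27)', \%users), "\n";                       # 1
print user_in_final_parens('E (2015-03-04 16:52) 30666g/s aab2b..?ziypppppG"Z)', \%users), "\n";  # 0
```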
It would have to be "guesses: " (with the colon and space) at the very start of the password, but yes, you are correct: that would not be a valid string to have in the input file.
Please try after this change: 1d43403
Ok, I am finally able to see a 'few' of these things:
./jtrts.pl -passthru="-fork=8" new -q
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014. By, Jim Fougeron & others
- Testing: John the Ripper password cracker, version 1.8.0.2-jumbo-1-bleeding_omp_memdbg_asan [linux-gnu 64-bit SSE4.1-autoconf]
--------------------------------------------------------------------------------
.pot CHK:dynamic_35 guesses: 1500 -show=1500 0:00:00:00 DONE : Expected count(s) (1500) [!!!FAILED4!!!] (1500 val-pwd 1 inval-pwd)
Some tests had Errors. Performed 113 tests. 1 errors reprocessing the .POT files
Time used was 234 seconds
jimfougeron@jimfougeron-vb-14-04:/JtR/bleed/test$ ./jtrts.pl -passthru="-fork=8" new -q
./jtrts.pl -passthru="-fork=8" new -q
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014. By, Jim Fougeron & others
- Testing: John the Ripper password cracker, version 1.8.0.2-jumbo-1-bleeding_omp_memdbg_asan [linux-gnu 64-bit SSE4.1-autoconf]
--------------------------------------------------------------------------------
.pot CHK:md5crypt guesses: 1500 -show=1500 0:00:00:00 DONE : Expected count(s) (1500) [!!!FAILED4!!!] (1500 val-pwd 1 inval-pwd)
Some tests had Errors. Performed 113 tests. 1 errors reprocessing the .POT files
Time used was 231 seconds
I will now try with the "guesses: " check and see if I can run this without any of these problems. BUT this is from using -fork. I am not able to trigger it with OMP alone on my system.
Funny thing: I wasn't able to reproduce it anymore.
But then I did a
git reset --hard 35f406863db54615e2cd26e0aa88b38ad00759a8
So then I tested without your previous 3 commits:
commit 7e404b38227026d6fa88a22aff3cf093c02bbc7b
Author: jfoug <jfoug@cox.net>
Date: Wed Mar 4 08:57:28 2015 -0600
    handle minor fix to .pot -fork remove
commit e7d0b46e5c2bf2667329b099e55025377a2fca49
Author: jfoug <jfoug@cox.net>
Date: Wed Mar 4 08:55:42 2015 -0600
    handle --fork also
commit fa18818172aa47dee865fce7fd43cb2f54a4391d
Author: jfoug <jfoug@cox.net>
Date: Wed Mar 4 08:53:08 2015 -0600
    removed -fork from .pot recheck run
I verified that the problem was still there, then just applied the https://github.com/magnumripper/jtrTestSuite/commit/1d4340328a74d4693e77ca7f0e40b3dc7ebe6372 change and verified that this fix alone also solved the problem for me.
I guess it is a good idea to keep all these changes (avoiding stdout mangling problems due to --fork during .pot file re-processing, and skipping the "guesses: " lines).
But I wonder if you might skip lines that you wouldn't want to skip, because john's output doesn't contain "guesses: " lines anymore.
john prints:
20 28g 0:00:00:00 DONE (2015-03-04 18:44) 933.3g/s 4033p/s 4033c/s 6050KC/s 123..|R4a,+,F
32 26g 0:00:00:00 DONE (2015-03-04 18:44) 866.6g/s 4033p/s 4033c/s 6050KC/s af<xppppppppppppppppppppppppppppppppppppppppppppppppppp..sT3Tppp-E'@
18 31g 0:00:00:00 DONE (2015-03-04 18:44) 1033g/s 4033p/s 4033c/s 6050KC/s nGtZpeny\..[b/=I Z
19 27g 0:00:00:00 DONE (2015-03-04 18:44) 900.0g/s 4033p/s 4033c/s 6050KC/s p..2f@;ppppppppp-uYV
28 29g 0:00:00:00 DONE (2015-03-04 18:44) 966.6g/s 4033p/s 4033c/s 6050KC/s vOQZppppppppppppppppppppppppppppppppppppppppppppppppppp..6l<dpppppp,P`-
30 28g 0:00:00:00 DONE (2015-03-04 18:44) 933.3g/s 4033p/s 4033c/s 6050KC/s #1w!ppppppppppppppppppppppppppppppppppppppppppppppppppp..Y`!mpp-@0I
31 28g 0:00:00:00 DONE (2015-03-04 18:44) 933.3g/s 4033p/s 4033c/s 6050KC/s Et9tppppppppppppppppgSF)..4i^IppppppY8L>
15 30g 0:00:00:00 DONE (2015-03-04 18:44) 1000g/s 4033p/s 4033c/s 6050KC/s d9>'ppppppppppppppppppppppppppppppppppppppppppppppppppp..?-KnpppqKSn
29 30g 0:00:00:00 DONE (2015-03-04 18:44) 750.0g/s 3025p/s 3025c/s 4537KC/s SO>pppppppppppWwrn..,DJoppppppppp*Z'N
11 31g 0:00:00:00 DONE (2015-03-04 18:44) 775.0g/s 3050p/s 3050c/s 4575KC/s %6#xppppppppppppppppppppppppppppppppppppppppppppppp;X#x..aP)*Xgz
26 26g 0:00:00:00 DONE (2015-03-04 18:44) 650.0g/s 3025p/s 3025c/s 4537KC/s oIwDppppppppppppppppppppppppppppppppppppppppppppppppppp..[]/Kppppppppppp$!n{
17 30g 0:00:00:00 DONE (2015-03-04 18:44) 750.0g/s 3025p/s 3025c/s 4537KC/s ..b,w0ppxH%0
10 31g 0:00:00:00 DONE (2015-03-04 18:44) 775.0g/s 3050p/s 3050c/s 4575KC/s H#Ixpppppppppppppppppppppppppppppppppppppppppphm3D..BY]9pppppppppp6a)B
Press 'q' or Ctrl-C to abort, almost any other key for status
25 28g 0:00:00:00 DONE (2015-03-04 18:44) 700.0g/s 3025p/s 3025c/s 4537KC/s }wP\p�{fR..Q He|^!'
Or, without --fork:
920g 0:00:00:00 DONE (2015-03-04 18:45) 23000g/s 97150p/s 97150c/s 145725KC/s aab2b..ko}GpppVeLj
So, the "guesses: " lines (at least for recent bleeding-jumbo versions) appear to have been converted to match the output of some older john versions.
I guess if you want to see the problem on your system without using --fork, you need to put all the passwords ending with ')' at the end of pw-new.dic, and ignore those tests that process only a small part of pw-new.dic.
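One way to do that reordering, as a Perl sketch: a stable partition that keeps the original order within each group. paren_words_last is a hypothetical name, the words are assumed to be already chomped, and reading/writing pw-new.dic is left out. As noted earlier in the thread, reordering the file also changes what the truncated inputs contain, so jtrts.dat sizes may need adjusting.

```perl
use strict;
use warnings;

# Hypothetical helper: stable partition of a word list so every word ending
# in ')' comes after all the others (words are assumed to be chomped).
sub paren_words_last {
    my (@words) = @_;
    my @rest   = grep { !/\)\z/ } @words;
    my @parens = grep {  /\)\z/ } @words;
    return (@rest, @parens);
}

my @reordered = paren_words_last('Et9tppppppppppppppppgSF)', 'aab2b', '?ziypppppG"Z)', 'ko}GpppVeLj');
print "$_\n" for @reordered;
```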
But I wonder if you might skip lines that you wouldn't want to skip, because john's output doesn't contain "guesses: " lines anymore.
There is code that converts the new format back into the 'guesses: ' format right away. I really do not like the new format; it does not give nearly as much to search for. It is only "#+g " (# meaning a number) at the head of a line. That is much less certain to be correct than a line starting with "guesses: ", but that was solar's call, I think.
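The actual conversion code is not shown in this thread; the Perl below is only a hypothetical sketch of what such a rewrite might look like, assuming an optional fork node number before the count (as in the --fork output above). It also shows how a cracked password starting with something like '2 50g ' would be rewritten by mistake.

```perl
use strict;
use warnings;

# Hypothetical rewrite of a status line into the old "guesses: N" form.
# Assumes an optional fork node number, then the count followed by "g ".
sub to_guesses_format {
    my ($line) = @_;
    $line =~ s/^(?:\d+ )?(\d+)g /guesses: $1 /;
    return $line;
}

print to_guesses_format('920g 0:00:00:00 DONE (2015-03-04 18:45) 23000g/s'), "\n";
print to_guesses_format('20 28g 0:00:00:00 DONE (2015-03-04 18:44) 933.3g/s'), "\n";
# A cracked password that happens to start with "2 50g " gets rewritten too:
print to_guesses_format('2 50g xyz (u9)'), "\n";    # guesses: 50 xyz (u9)
```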
So, we are lucky that pw.dic and pw-new.dic don't contain passwords starting with such a pattern.
No luck involved. I made sure that was not there. It does not mean that JtR cannot crack something like that; it simply means that when I designed the input files, I knew better than to start anything with the 'guesses: ' string. Again, remember, the TS is nothing but a tool, designed to 'help' find cases where things are missed. That being the case, we do not have to look for 'certain' unique patterns. Known bad things simply do not have to be part of the tool. The same goes for removing -fork from the .pot re-run. The .pot re-run is not a 'normal' thing users do; it is just a method devised to make sure that the password written to the .pot file can actually re-crack the hash line. Whether the person running the TS wants to use -fork=x is not relevant at that point, so the easy thing was to strip it out.
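Stripping the option can be as simple as a regex over the passthru string. The Perl below is a hypothetical sketch (not the actual commit), assuming the option always appears as -fork=N or --fork=N; -opt1 and -opt2 are placeholders.

```perl
use strict;
use warnings;

# Hypothetical: drop "-fork=N" / "--fork=N" from the options passed through
# to john before the .pot re-check run.
sub strip_fork {
    my ($args) = @_;
    $args =~ s/(?:^|\s+)--?fork=\d+//g;
    $args =~ s/^\s+//;    # tidy up if the option was first
    return $args;
}

print strip_fork('-fork=8'), "|\n";               # prints just "|"
print strip_fork('-opt1 --fork=4 -opt2'), "\n";   # -opt1 -opt2
```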
My comment wasn't about "guesses: ", but about passwords starting with '25g' or '2 50g'.
But maybe this wasn't pure luck, either.
Currently, we don't have such passwords.
And if we ever do, we'll see TS errors, because the adjusted password (now starting with "guesses: ") will not match the expected password.
What happens is that #g gets converted into a 'guesses: #' type line. So what we would see is a lot more cracked passwords than there should be (say there were a 143g and a 39g in the input file). If that were the case, we would always be 182 'passwords' too many. No, a pure number followed by a 'g' was not something that was specifically coded to be avoided. It was blind luck.
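The arithmetic can be checked with a small hypothetical Perl sketch that converts "#g " lines and then sums every "guesses: #" value it sees: with the two example passwords, the total comes out 143 + 39 = 182 too high.

```perl
use strict;
use warnings;

# Hypothetical: rewrite "#g " lines to "guesses: #", then add up every
# "guesses: #" value, the way a fork-aware counter might.
sub total_guesses {
    my (@lines) = @_;
    my $total = 0;
    for my $line (@lines) {
        (my $l = $line) =~ s/^(?:\d+ )?(\d+)g /guesses: $1 /;
        $total += $1 if $l =~ /^guesses: (\d+) /;
    }
    return $total;
}

my @out = (
    '920g 0:00:00:00 DONE (2015-03-04 18:45) 23000g/s',    # the real status line
    '143g abc (u10)',                                        # made-up cracked passwords
    '39g xyz (u11)',
);
print total_guesses(@out) - 920, "\n";    # 182 'passwords' too many
```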
See also the last comments in https://github.com/magnumripper/JohnTheRipper/issues/1074. It is a real TS bug (a parsing error, not related to stdout output mangled by --fork).
Here is how to reproduce it without fork:
First, make sure that all the passwords in pw-new.dic that end with a ')' character are at the end of that file:
Then
Out of 920 passwords, there are 920 valid passwords + 1 invalid (due to the parsing error).