openwall / john-tests

Test Suite for John the Ripper
24 stars 15 forks source link

Do we need to add --restore tests? #56

Closed frank-dittrich closed 9 years ago

frank-dittrich commented 9 years ago

Idea: run a test with --max-run-time=1, restore repeatedly until finished, then check if all has been cracked as expected. I think we need just 2 tests (may be one saltless plus one fast salted format).

Or may be some more, for standalone --mask and for --wordlist [--rules] --mask.

(The --fork can be added as -passthru.)

magnumripper commented 9 years ago

Definitely. I thought I added an issue similar to this already but perhaps I didn't. This can be made with just one format (any), because we are testing a mode, not a format.

frank-dittrich commented 9 years ago

But may be we shouldn't use the fastes format, otherwise, --max-run-time=1 might complete the task without the need to restore, especially if you also use --fork. So, completing the task without the need to restore should cause a warning.

jfoug commented 9 years ago

I am building this now. I am using the pw-new.dic, with the 50 at tail appended a few times. I did it for md5-crypt. It seems to take long enough to have multiple runs. We may have to add a number to the -restore switch (it is just a boolean now), so that you can specify number of seconds per run. Right now, I do a 10s run. But there may be fast enough machines that 10s does not cause enough retries.

This will be a self contained mode, just like internal. It runs, and the function does not return, it exits the script returning proper error code.

jfoug commented 9 years ago

Can we do this test for core??? I am not sure. I am using --max-run-time which I am 99% sure is a jumbo feature. I am not sure that I could do this for core.

I guess we could thread this somehow (or even just using & to put process in background), and then signal it. Speaking of running that way, that might be a good 2nd test even for jumbo. john likely may behave differently with a signal restart vs a -max-run-time restart. I do not know the restore (or save session) code that well, so I am not 100% sure just what to test.

do you two have any thoughts here? I can probably get a POC done and checked in shortly. I guess once we have it working, we can expand upon it if we need to.

frank-dittrich commented 9 years ago

core doesn't have --max-run-time, but Iguess if jumbo passes this test we can safely assume core will pass as well. You could use ulimit -t 10 ./john ... (and ignore $? when john gets killed for exceeding ulimit), but I don't think it is necessary.

jfoug commented 9 years ago

$? already is 256 (1) when we hit the time limit. When john completes prior to time limit, return is 0. That is my 'current' check. I am also trying to get the ETA lines to output (as a how it is going), but perl is being a pisser for me.

jfoug commented 9 years ago

https://github.com/magnumripper/JohnTheRipper/issues/1859#issuecomment-152661870

For the wordlist + other modes, we need to make sure that the test is long enough, and with enough breaks, that multiple rules are tested, along with different stopping points within the word file. So we may want a longer wordlist and fewer rules, and a shorter wordlist with LOTS of rules.

Also, we may want to perform some tests using -fork and restore (with the above modes also)

magnumripper commented 9 years ago

We should probably use one of the slowest formats for this, as we need to crack every single generated candidate in order to know we didn't miss one at restore. Perhaps some PBKDF2 with 10000 iterations. We should also use at least 10 salts I guess (so we don't miss catching a bug where not all salts got tested after a stop-restore cycle). Or maybe 100 salts and fewer iterations. We could pick any custom number of iterations, targeting a convenient speed.

magnumripper commented 9 years ago

Isn't the current implementation producing 3886 candidates while only having 1994 hashes to crack? There's a significant risk we pass a test even though several candidates were missed.

magnumripper commented 9 years ago

Ah, no, it rejects input longer than 15 chars so we only have 1994 candidates too. At least for SIMD builds, it's fine.

jfoug commented 9 years ago

Added wordlist+rules. Using korelogic ^d^d^d rules. I did a shuf | head 4500, then created the wordlist with 38 byte passwords (3200 of them). Then salted-sha1 was used. Takes a couple minutes to run, and I have 10s runtimes. Should be a pretty valid test.

e313cc7

jfoug commented 9 years ago

Having issues running this on my VM (64 bit). Had to remove the 'extra' passwords added, and increase time that the exits happen at. Not sure why ....

Do we have a method that will exit john after X number of candidates are found?

magnumripper commented 9 years ago
$ ../run/john -w=pw-new.dic -rules=PrependNumNumNum -stdout=39 -v:1
7770000p 0:00:00:01 100.00% (2015-11-04 19:33) 6815Kp/s 999ko}GpppVeLj

You test 7,770,000 candidates against 3,200 hashes? If there are a few missed candidates at resume, how would you be sure they are among these 3,200?

jfoug commented 9 years ago

Well look at it this way. I have a random 'mix' of 3200 hashes created out of those 7M possible data. This is no different than any other run of the TS. There are candidates which are not part of the 'found' set (or actually ANY password attack). All that matters is when things are all said and done, that all 3200 were found.

jfoug commented 9 years ago

how would you list to 'test' this ? I guess we could do a super slow hash, with a 'packed' list of data.

jfoug commented 9 years ago

If you can come up with test data, and a set of john tests runs (well, just 1 "run', and then multiple -rest=xxx runs), I will certainly put it into the TS.

magnumripper commented 9 years ago

The only safe way to test it is to have a hash list that need all the generated candidates in order to fully crack. I believe the first (wordlist only) test is fine as-is but this last one has only a 1/2500 chance of catching a missed candidate. We have to be sure any one missed candidate is catched or this will be pretty much useless.

Maybe we could simply use the default wordlist: ../run/john -w -ru:appendNumNum - that would total about 35000 candidates. So we create salted-sha1 hashes for them, possibly with just eg. 1000 different salts (trying to keep the speed up). We can modulate speed with number of unique salts but in the end, every single candidate should crack exactly one hash. And we probably want to ensure it gets resumed at least 10 times so we may have to aim for ~15 seconds (if ran without --max-run-time) on a very fast system.

If that ends up fine, we can add an almost identical test for wordlist + mask: ../run/john -w -mask:?w?d?d, using the same hash list.

Note that all we need to test here is resume. We don't have to involve any over-length candidates or things like that.

I can experiment a little with it later unless you beat me to it. I'm currently working on truecrypt-opencl.

magnumripper commented 9 years ago

Do we have a method that will exit john after X number of candidates are found?

Using an external filter, we can abort after X number of candidates are tried. See [List.External:AutoAbort]

jfoug commented 9 years ago

I created a 35k test, and it was done in 20s (35k of salts). I then changed the rules to do this:

$[0-9]$[1-2] and ended up with 70k. This file took 80s, which is 'about' right for doing 8s max run times. It should take 10-12 restarts to get it all (depends upon how much 'replay' there is. Since it is only 20 rules, there may be a 'lot'.

frank-dittrich commented 9 years ago

I'd prefer a mask '?w?l?d' or '?w?l?s' over '?w?d?d', because it is more likely that the number of candidates generated per input word is not a multiple of max. keys per crypt (or vice versa).

magnumripper commented 9 years ago

Hmm the batch size of salted sha1 (avx) for 8 threads is 32K, that's too large for this little test. Maybe we should run it with OMP_NUM_THREADS=1 and --mkpc=4093 or some other prime number. That would mitigate @frank-dittrich's worries too.

frank-dittrich commented 9 years ago

Then you have to disallow --mkpc= for -passthru for this specific test, or replace 4093 with the value given on the command line.

jfoug commented 9 years ago

Using salted-sha1 seemed to be 'hard' to get it to work right all the time. It would get in a loop easily, where it would restart far enough back, that it would find no candidates.

So I made a wordlist of 20 'garbage' words, and used appendnumnum rule (100 rules). I then did a pass_gen creation of bitcoin (which has OMP scale of 1), and set the rounds really low (2000).

Now running on a 10s max time, I get about 3-4% gain each 10s loop (shows up using -v)

I think this is a better test. I will get it cleaned up, and checked in.

jfoug commented 9 years ago

--rules=appendNumNum and -mask=?w?d?d work fine with same input file. I will get things checked in. Wasn't mask-hybrid the problem recently? If so, can we make sure the TS fails against the broken version.

NOTE, it is safer to run in the -v mode, so that there is some screen interaction, and so you can watch to make sure the sessions do not get 'locked up'

WOOT WOOT. Shows the bug just fine.

$ ./jtrts.pl -restore
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-1562-ga7456a8+ OMP [cygwin 64-bit AVX-ac]
--------------------------------------------------------------------------------

John Jumbo build detected.
Running JTRTS in -restore mode (against a 2000 candidate bitcoin input file -rules=appendNumNum)
  ** NOTE: may take a couple minutes to run

Done with run

Results (should be 2000) : 2000

Running JTRTS in -restore mode (against a 2000 candidate bitcoin input file -mask=?w?d?d)
  ** NOTE: may take a couple minutes to run

Done with run

Results (should be 2000) : 1154
$ ./jtrts.pl -restore
-------------------------------------------------------------------------------
- JtR-TestSuite (jtrts). Version 1.13, Dec 21, 2014.  By, Jim Fougeron & others
- Testing:  John the Ripper 1.8.0.6-jumbo-1-1562-ga7456a8+ OMP [cygwin 64-bit AVX-ac]
--------------------------------------------------------------------------------

John Jumbo build detected.

Running JTRTS in -restore mode (against a 2000 candidate bitcoin input file )
  ** NOTE: may take a couple minutes to run
Done with run.  Results (should be 2000) : 2000     PASS

Running JTRTS in -restore mode (against a 2000 candidate bitcoin input file -rules=appendNumNum)
  ** NOTE: may take a couple minutes to run
Done with run.  Results (should be 2000) : 2000     PASS

Running JTRTS in -restore mode (against a 2000 candidate bitcoin input file -mask=?w?d?d)
  ** NOTE: may take a couple minutes to run
Done with run.  Results (should be 2000) : 1268     FAIL!

As we can see, the count for the mask resume fluctuates. That is exactly like we would expect.

jfoug commented 9 years ago

546139a

This now has 3 restore types> NOTE the wordlist+mask shows problems in JtR.

magnumripper commented 9 years ago

:+1: perfect!

jfoug commented 9 years ago

I have several other 'modes' done, I just have to make sure I have them right.

I have single, pure-mask, pure-rexgen and hybrid-rexgen 'done'. I only had to make one other dictionary (for single). In that dict, I kicked the iteration count up much higher for bitcoin. I did re-create the base dictionary file. The dictionary is now:

111110 ..... 111129 (20 numbers). And I use -rules=appendnumnum.

So it is easy to use pure mask, and pure rexgen, along with pure single will find them all (the prior dictionary had a ; and a space char in a couple words, and thus single was not as friendly).

Single mode is 2k hashes, with 177864 iterations (bitcoin). It is much slower, but I think the timing is about right. It might be a 'bit' slow, but that should not really hurt anything. This is just a testing tool. Once we get all the various tests in, and running fine, then we should not have to perform these checks very often. Probably would be GOOD to check before each real release.

magnumripper commented 9 years ago

I wouldn't be surprised if we found some more needs for fixes, minor or not. These new tests are very good Q.C.

jfoug commented 9 years ago

Nice thing, is that they are somewhat static. We get them fixed, and there will infrequelty be new test types added, and the amount of code change really is not that often.

magnumripper commented 9 years ago

Does -stoponerror work for this? When debugging I'm pretty sure I'll need to see exactly what was missed, and/or run the same tests manually.

jfoug commented 9 years ago

each test will abort if the proper results are not found.

NOTE, for now, use -v The timings are static, and may not be right for specific CPU and or -fork and or OMP settings -v lets you see things as they chip away, so you can make sure the thing is still finding records. If the resume goes backwards TOO far, then it will get stuck, not being able to work through a single buffer/block in the time specified. The bitcoin is pretty minimal. It is OMP_SCALE=1 so it deals with 'few' candidates.

jfoug commented 9 years ago

This is the 'outer' function:

sub doRestoreMode {
    if ($core_only == 1) {
        ScreenOut("John CORE build detected.\n The -max-run-time mode ONLY works for jumbo build of john.\n");
        exit 1;
    }
    `rm -f tst-*`;

    # grow the pw-new.dic file:
    my $cmd = "$JOHN_EXE -rules=appendNumNum --stdout --w=bitcoin_restart_rules_tst.dic > tst-pw-new.dic 2>/dev/null";
    $cmd = `$cmd`;
    doOneRestore("-w=tst-pw-new.dic", "bitcoin_restart_rules_tst.in", 2000, 20, "bitcoin", "");
    unlink("tst-pw-new.dic");

    # now test wordlist + rules.
    doOneRestore("-w=bitcoin_restart_rules_tst.dic", "bitcoin_restart_rules_tst.in", 2000, 20, "bitcoin", "-rules=appendNumNum");
    # now test wordlist + mask.
    doOneRestore("-w=bitcoin_restart_rules_tst.dic", "bitcoin_restart_rules_tst.in", 2000, 20, "bitcoin", "-mask=?w?d?d");
    # now test single mode.
    doOneRestore("", "bitcoin_restart_single_tst.in", 2000, 20, "bitcoin", "");
    # now test pure mode.
    doOneRestore("", "bitcoin_restart_rules_tst.in", 1000, 20, "bitcoin", "-mask=11111?d?d?d");
    doOneRestore("", "bitcoin_restart_rules_tst.in", 1000, 20, "bitcoin", "-mask=11112?d?d?d");
    # now test pure rexgen mode.
    doOneRestore("", "bitcoin_restart_rules_tst.in", 2000, 20, "bitcoin", "-rexgen=1111[1-2][0-9][0-9][0-9]");
    # now test wordlist + rexgen mode.
    doOneRestore("-w=bitcoin_restart_rules_tst.dic", "bitcoin_restart_rules_tst.in", 2000, 20, "bitcoin", "-rexgen=/0[0-9][0-9]");

    exit 0;
}

And here is the worker function that does each test.

sub doOneRestore {
    my ($dic, $hashes, $cnt, $runtime, $form, $exargs) = @_;

    ScreenOutSemi("\nRunning JTRTS in -restore mode (against a $cnt candidate $form input file $exargs)\n  ** NOTE: may take a couple minutes to run\n");
    my $cmd = "$JOHN_EXE -ses=tst- $pass_thru $exargs $dic $hashes -pot=tst-.pot -max-run=$runtime -form=$form 2>&1";
    ScreenOutV("Running 1st retore command, command line\n\n$cmd\n\n");
    my $results = `$cmd`;
    my $ret = $?;
    ScreenOutVV("Results of this run are: $results\n return code [".($ret>>8)."]\n\n");
    if ($verbosity > 2) { show_eta($results); }
    $cmd = "$JOHN_EXE -res=tst- 2>&1";
    while ( ($ret>>8) == 1) {
        ScreenOutV("Running continuing retore commands.  Cmd line: $cmd\n");
        $results = `$cmd`;
        $ret = $?;
        `stty echo >/dev/null 2>/dev/null`;
        ScreenOutVV("Results of this run are: $results\n return code [".($ret>>8)."]\n\n");
        if ($verbosity > 2) { show_eta($results); }
    }
    # now compute if we got them all.
    $results = `LC_ALL='C' sort tst-.pot | LC_ALL='C' uniq | LC_ALL='C' wc -l`;
    chomp $results;
    print ("Done with run.  Results (should be $cnt) : $results   ");
    cleanup();

    if ($results != $cnt) {
        print "FAIL!\n";
        exit 1;
    }
    if ($error_cnt+$error_cnt_pot+$ret_val_non_zero_cnt != 0) {
        print "Some error status.\n";
        exit 1;
    }
    print "PASS\n";
    `rm -f tst-*`;
}
magnumripper commented 9 years ago

s/retore/restore/g

jfoug commented 9 years ago

Damn nit picker ;) That was simply an easter-egg for -v mode, lol

magnumripper commented 9 years ago

wordlist and wordlist+rules passed just fine here. And hybrid mask failed. Will look into it. The fact we can separate the basewords from the digits should mean I can see pretty much in what way it fails.

magnumripper commented 9 years ago

Oh, but it deleted the pot file...

jfoug commented 9 years ago

I thopught that happened AFTER the exit. I will look.

ahh, there is a call to 'cleanup' in there.

magnumripper commented 9 years ago

I'll run it manually with a -max-run that does it all in just one resume. Should be pretty clear then.

jfoug commented 9 years ago

-single mode seems to work. -pure mask seems to work. (only first 1000 checked so far, however).

jfoug commented 9 years ago

I will have to let you get rexgen working. I no longer spend time to get that boat anchor code working. I will check in what I have. I think it is ready to roll.

jfoug commented 9 years ago

Please see if rexgen is setup right. NOTE, you will have to comment out the mask test, to be able to continue the testing (or we could move that test to the end, knowing it fails).

bd0ac12

jfoug commented 9 years ago

b7a79ed puts mask+wordlist to the last test.

magnumripper commented 9 years ago

Hmm the "single" test is actually running batch mode. We'll see how it goes. I assume it should explicitly run -single, right?

magnumripper commented 9 years ago

Ah, but -single is the first mode of batch mode. This will work fine except if it actually does miss anything. If it does, it will continue with wordlist and then incremental, and "never" end, lol.

magnumripper commented 9 years ago

Or are you limiting the number of resumes? Doesn't look so in the code above.

jfoug commented 9 years ago

No, there is no 'limit'. It runs until the return says jtr is done.

So in this case, the -single SHOULD be added. BUT if there are any missed, then it still is a problem, and will run a LONG time.

Fortunately, single 'seems' to resume properly.

magnumripper commented 9 years ago

Luckily single/batch did pass.

Not sure why you do two pure mask runs. It could be one if using -mask=1111[12]?d?d?d

And oops the rexgen test went into an infinite loop! I need to quit now, will look at this tomorrow.

magnumripper commented 9 years ago

Made some fixes. Two main problems: It's --regex and not --rexgen. Also, pure rexgen actually doesn't support resuming (it does warn about it) so that is commented out from TS now. But wordlist+rexgen should work (only saving parent mode's state). It doesn't work very good though - I think the problems is it doesn't have enough time to get ahead so it's stuck at the initial 10%. If that's it we could simply bump --max-run-time for that test to 20s but I'm ignoring this problem for now and will try to fix hybrid mask.

magnumripper commented 9 years ago

We're not looking for "capability rexgen" are we? We need to add that so we don't try to run regex mode when there is none.

frank-dittrich commented 9 years ago

You' need a special mode to test --regex, so that just a few combinations per candidate will be generated. Otherwise, you'll never proceed to the next block of input words. Plus, keep the run time slow. (For very fast hashes, --max-run-time just 1 or 2, otherwise, the memleak issues will cause trouble.