magnumripper opened this issue 5 years ago
Thanks, magnum. My thoughts on some of these:
The rule engine pre-dates the introduction of command-line --min-length usable with all cracking modes. Yes, it makes sense to me to export this value to the rules engine via some pre-defined variable.
Implicit rule rejection when a command doesn't change the word would definitely have side-effects. Many rules currently depend on some of their commands being no-ops on a given input, as long as either not all of them are or the rule is in fact meant to be a no-op (yet try the input word). Adding a rule flag to explicitly enable this new behavior is a good idea. Implementing it in the existing rule interpreter would probably have a slight performance impact, unless we duplicate the code (e.g., use a force-inline function and inline it twice with different values for the flag parameter, so that the compiler would customize the code - but I'm not sure this is worth the complexity).
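A minimal sketch of the inline-twice approach, with made-up names and a toy rule standing in for JtR's actual interpreter:

```c
#include <assert.h>
#include <string.h>

/* Sketch of the "inline twice" idea: one rule-application body with a
 * reject_noop flag, force-inlined into two wrappers so the compiler
 * specializes each copy and drops the dead branch. All names here are
 * illustrative, not JtR's actual internals. */
#if defined(__GNUC__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline
#endif

/* Toy "rule": lowercase the word in place. Returns 0 to accept the
 * (possibly unchanged) word, -1 to reject it. */
static FORCE_INLINE int apply_rule(char *word, int reject_noop)
{
    int changed = 0;
    char *p;

    for (p = word; *p; p++)
        if (*p >= 'A' && *p <= 'Z') {
            *p += 'a' - 'A';
            changed = 1;
        }

    if (reject_noop && !changed)
        return -1; /* new behavior: reject when the rule was a no-op */
    return 0;
}

/* Two entry points; in each, reject_noop is a compile-time constant. */
static int apply_rule_plain(char *word)  { return apply_rule(word, 0); }
static int apply_rule_reject(char *word) { return apply_rule(word, 1); }
```

Each wrapper gets its own specialized copy of the loop, so the flag check costs nothing in the common (plain) path.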
I don't see why adding >8 to your example NT rule would speed it up. Isn't the Q after the first T8 even better (tests the length and the character being a letter)?
I also thought of enhancing the rules optimizer, but I am not sure. For now, instead of unifying the single-character commands to string commands I optimized the single-character ones' implementations to special-case repeats of 2 or 3 such commands (they handle such repeats without falling out into the main commands loop; the performance difference is especially significant for prepends). In my testing, the resulting performance was on par or sometimes even slightly better than the string commands'. So we could even unify the other way around: short string commands to groups of 2 or 3 single character ones. I expect the string commands would reliably win for length 4+.
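The prepend special-casing could look roughly like this; do_prepend() and its calling convention are invented for illustration:

```c
#include <assert.h>
#include <string.h>

/* Sketch of folding a short run of single-character prepend commands
 * (^X in rule syntax) into one buffer shift. rule points just past the
 * first '^'. Returns the position after the consumed commands, or NULL
 * if the word would overflow the buffer. Hypothetical helper, not
 * JtR's actual implementation. */
static const char *do_prepend(const char *rule, char *word, size_t bufsz)
{
    char chars[3];
    size_t i, n = 0, len = strlen(word);

    chars[n++] = *rule++;                  /* operand of the first ^X */
    while (n < 3 && rule[0] == '^' && rule[1]) {
        chars[n++] = rule[1];              /* fold in a following ^Y */
        rule += 2;
    }

    if (len + n + 1 > bufsz)
        return NULL;

    memmove(word + n, word, len + 1);      /* one shift for the whole run */
    for (i = 0; i < n; i++)
        word[i] = chars[n - 1 - i];        /* the last ^ ends up first */

    return rule;
}
```

The win comes from doing a single memmove for the whole run instead of shifting the word once per ^X command.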
Another optimizer idea I had is to optimize into "bytecode" commands not intended for manual use - e.g., we could have some single-byte control code that represents roughly the equivalent of Az" and is followed by another byte representing the string length. We could also have a number of single-byte opcodes for appending small string lengths (e.g., byte values 1 to 15 append strings of those lengths, byte values 16 to 30 prepend strings of length 1 to 15, byte value 31 is to be followed by another byte of string length and is for append; we could also use values in the range of 128 to 255, since we don't intend to ever have normal rule commands there). This would be an internal JtR layer, and subject to change (in case we find a better way to reclaim some of those byte values). A question is how to report those in john.log (translate them back to user-accessible commands?)
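A rough sketch of decoding such opcodes, using exactly the byte ranges suggested above (the encoding is purely illustrative; the real layer would be internal and subject to change):

```c
#include <assert.h>
#include <string.h>

/* Sketch of decoding the proposed internal opcodes: 1..15 append a
 * string of that length, 16..30 prepend strings of length 1..15, and
 * 31 is a long append whose length is in the following byte. The
 * string operand immediately follows the opcode (and length byte). */
static const unsigned char *exec_op(const unsigned char *pc,
                                    char *word, size_t bufsz)
{
    unsigned op = *pc++;
    size_t n, len = strlen(word);
    int prepend = 0;

    if (op >= 1 && op <= 15) {
        n = op;                  /* short append */
    } else if (op >= 16 && op <= 30) {
        n = op - 15;             /* short prepend */
        prepend = 1;
    } else if (op == 31) {
        n = *pc++;               /* long append: length in next byte */
    } else {
        return NULL;             /* not a valid opcode in this sketch */
    }

    if (len + n + 1 > bufsz)
        return NULL;

    if (prepend) {
        memmove(word + n, word, len + 1);
        memcpy(word, pc, n);     /* string operand follows the opcode */
    } else {
        memcpy(word + len, pc, n);
        word[len + n] = 0;
    }

    return pc + n;               /* advance past the operand */
}
```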
Yet another idea is to compile to a VM that will use computed goto's, like we do in compiler.c when compiling with GCC. This removes the big switch translation layer from command characters/bytes to command handler addresses. But then we have to provide fallback code for non-GCC compilers.
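A minimal sketch of the computed-goto dispatch with a big-switch fallback for non-GCC, using toy commands rather than JtR's real set:

```c
#include <assert.h>
#include <string.h>

/* Minimal computed-goto rule VM sketch. Toy commands only:
 * 'l' lowercase, 'r' reverse, ':' no-op, NUL terminates the rule.
 * Bytes not listed in the dispatch table are not handled here. */
static void run_rule(const char *rule, char *word)
{
#if defined(__GNUC__)
    static const void *dispatch[256] = {
        [0]   = &&op_end,
        ['l'] = &&op_lower,
        ['r'] = &&op_rev,
        [':'] = &&op_nop,
    };
#define NEXT() goto *dispatch[(unsigned char)*rule++]
    NEXT();
op_lower:
    for (char *p = word; *p; p++)
        if (*p >= 'A' && *p <= 'Z')
            *p += 'a' - 'A';
    NEXT();
op_rev: {
        size_t i = 0, j = strlen(word);
        while (j > i + 1) {
            char t = word[i];
            word[i++] = word[--j];
            word[j] = t;
        }
        NEXT();
    }
op_nop:
    NEXT();
op_end:
    return;
#undef NEXT
#else
    /* Fallback for non-GCC: the "big switch" translation layer. */
    for (;;) {
        switch (*rule++) {
        case 0:
            return;
        case 'l':
            for (char *p = word; *p; p++)
                if (*p >= 'A' && *p <= 'Z')
                    *p += 'a' - 'A';
            break;
        case 'r': {
            size_t i = 0, j = strlen(word);
            while (j > i + 1) {
                char t = word[i];
                word[i++] = word[--j];
                word[j] = t;
            }
            break;
        }
        default: /* ':' and anything else: no-op in this sketch */
            break;
        }
    }
#endif
}
```

The duplicated handler bodies are the price of the fallback; in a real implementation they would be shared via macros or force-inlined helpers.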
I don't see why adding >8 to your example NT rule would speed it up. Isn't the Q after the first T8 even better (tests the length and the character being a letter)?
I thought so too, but it was definitely faster with >8. We might want to investigate why for a starter...
I guess we should analyze the --rules:all after PP and see what kind of enhanced dupe rejections would give the most payback. Like sorting several consecutive commands or flags in eg. ascii order where order doesn't matter, and ignoring trailing Q (where M is not used after some intermediate commands) for dupe rejection, eg. >3 <5 r l Q --> <5>3lrQ. Ignoring Q for rejection would get tricky though, considering that if one of the actual rules really had a Q and one didn't, we want to keep the one that did.
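A toy sketch of such canonicalization, handling only the easy part (sorting runs of >N / <N length rejections, whose mutual order never matters; commuting single-character commands like l and r would need per-command knowledge, and Q semantics more care still):

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Compare two 2-byte commands for sorting. */
static int cmp2(const void *a, const void *b)
{
    return memcmp(a, b, 2);
}

/* Toy canonicalizer for dupe-rule detection: strips whitespace NOPs,
 * then sorts each run of two-character length-rejection commands
 * (>N, <N). Illustrative only - real rules have string arguments and
 * many more commuting cases. */
static void canon_rule(const char *rule, char *out)
{
    size_t i, n = 0;

    for (; *rule; rule++)
        if (!isspace((unsigned char)*rule))
            out[n++] = *rule;
    out[n] = 0;

    for (i = 0; i < n; ) {
        size_t j = i;
        while (j + 1 < n && (out[j] == '>' || out[j] == '<'))
            j += 2;
        if (j - i >= 4)                    /* two or more rejections */
            qsort(out + i, (j - i) / 2, 2, cmp2);
        i = (j > i) ? j : i + 1;
    }
}
```

Two rules that differ only in whitespace and rejection order then hash/compare equal, so the dupe suppressor can drop one of them.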
Perhaps an optimizer step that can look at a complete rule and figure out if/how it makes a net change to word length, and add rejections accordingly. Like [ ^@ --> <+ [ ^@ and [ A0"123" --> ->3 <- [ A0"123".
I don't see why adding >8 to your example NT rule would speed it up. Isn't the Q after the first T8 even better (tests the length and the character being a letter)?
Verified this again (this time on macOS):
[List.Rules:NT]
:
-c T0Q
-c T1QT[z0]
-c T2QT[z0]T[z1]
-c T3QT[z0]T[z1]T[z2]
-c T4QT[z0]T[z1]T[z2]T[z3]
-c T5QT[z0]T[z1]T[z2]T[z3]T[z4]
-c T6QT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]
-c T7QT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]
-c T8QT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7]
-c T9QT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7]T[z8]
-c TAQT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7]T[z8]T[z9]
-c TBQT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7]T[z8]T[z9]T[zA]
-c TCQT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7]T[z8]T[z9]T[zA]T[zB]
-c TDQT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7]T[z8]T[z9]T[zA]T[zB]T[zC]
[List.Rules:NTrr]
:
-c T0Q
-c ->2 T1QT[z0]
-c ->3 T2QT[z0]T[z1]
-c ->4 T3QT[z0]T[z1]T[z2]
-c ->5 T4QT[z0]T[z1]T[z2]T[z3]
-c ->6 T5QT[z0]T[z1]T[z2]T[z3]T[z4]
-c ->7 T6QT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]
-c ->8 T7QT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]
-c ->9 T8QT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7]
-c ->A T9QT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7]T[z8]
-c ->B TAQT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7]T[z8]T[z9]
-c ->C TBQT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7]T[z8]T[z9]T[zA]
-c ->D TCQT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7]T[z8]T[z9]T[zA]T[zB]
-c ->E TDQT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7]T[z8]T[z9]T[zA]T[zB]T[zC]
[List.Rules:NTrrwr]
:
-c T0Q
-c ->2 >1 T1QT[z0]
-c ->3 >2 T2QT[z0]T[z1]
-c ->4 >3 T3QT[z0]T[z1]T[z2]
-c ->5 >4 T4QT[z0]T[z1]T[z2]T[z3]
-c ->6 >5 T5QT[z0]T[z1]T[z2]T[z3]T[z4]
-c ->7 >6 T6QT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]
-c ->8 >7 T7QT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]
-c ->9 >8 T8QT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7]
-c ->A >9 T9QT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7]T[z8]
-c ->B >A TAQT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7]T[z8]T[z9]
-c ->C >B TBQT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7]T[z8]T[z9]T[zA]
-c ->D >C TCQT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7]T[z8]T[z9]T[zA]T[zB]
-c ->E >D TDQT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7]T[z8]T[z9]T[zA]T[zB]T[zC]
$ rm -f ../run/john.pot && ../run/john scraped.sam -form:nt -w:r30k.lst -v:1 -max-len=8 -ru:nt
Using default input encoding: UTF-8
Loaded 5294 password hashes with no different salts (NT [MD4 256/256 AVX2 8x3])
Press 'q' or Ctrl-C to abort, almost any other key for status
331g 0:00:00:14 DONE (2018-11-25 15:27) 22.42g/s 144299p/s 144299c/s 755484KC/s RENTERIA..KIKUMARU
Session completed
$ rm -f ../run/john.pot && ../run/john scraped.sam -form:nt -w:r30k.lst -v:1 -max-len=8 -ru:ntrr
Using default input encoding: UTF-8
Loaded 5294 password hashes with no different salts (NT [MD4 256/256 AVX2 8x3])
Press 'q' or Ctrl-C to abort, almost any other key for status
331g 0:00:00:14 DONE (2018-11-25 15:28) 22.45g/s 144495p/s 144495c/s 755484KC/s RENTERIA..KIKUMARU
Session completed
$ rm -f ../run/john.pot && ../run/john scraped.sam -form:nt -w:r30k.lst -v:1 -max-len=8 -ru:ntrrwr
Using default input encoding: UTF-8
Loaded 5294 password hashes with no different salts (NT [MD4 256/256 AVX2 8x3])
Press 'q' or Ctrl-C to abort, almost any other key for status
331g 0:00:00:09 DONE (2018-11-25 15:28) 36.21g/s 233026p/s 233026c/s 1175MC/s RENTERIA..KIKUMARU
Session completed
$ perl <r30k.lst -ne 'print unless length()<9'|wc -l
9948
$ perl <r30k.lst -ne 'print unless length()>=9'|wc -l
20052
Here, adding a rule reject flag made no difference (I think it should) and adding a word reject rule did (it shouldn't). In my earlier tests under Linux, using slightly different (and more) test data, I think I saw some gain from the rule reject flag as well. Also, in my earlier tests only the wall clock time got much better, while the p/s figure appeared worse (which is expected).
I don't see why adding >8 to your example NT rule would speed it up. Isn't the Q after the first T8 even better (tests the length and the character being a letter)?
I looked at this more closely and it's pretty obvious really: at any length, the >8 is a simple integer compare (word length was already calculated anyway), while T8Q will re-scan the full word (all of it, even if way longer than 8). The latter is obviously slower, and to add insult to injury this difference in performance is multiplied by 256 at length 8 in the NT rule (and doubling for each length).
However, this did not explain the fact that ->N did not gain anything (it really should!), and upon closer look it turned out the --max-len=N option did not trigger them - the format's original max. length was still considered. In my defense, I'm pretty sure it did work long ago, but the functionality has evolved over time and at some point, when we got better separation between the format's length, the requested length and FMT_TRUNC, some glue got lost. I have now fixed this, and the result is that ->N makes a huge difference (a 26x boost in my test case), and while >N still makes for another 5-6% boost, it became less significant. The fix is in 821b46f3 and a later commit adds both length rejection types to the NT ruleset.
Another idea: if we process the rule reject flags once before the preprocessor (any ones not involving it) and once right after it, we may reduce memory use and gain some performance - especially for cases where we restart rules (like in -pipe mode). None of the current rejection flags' conditions may change during a run.
"to add insult to injury this difference in performance is multiplied by 256 at length 8 in the NT rule (and doubling for each length)." - I think this isn't the case.
Please elaborate. As far as I understand, the -c T8QT[z0]T[z1]T[z2]T[z3]T[z4]T[z5]T[z6]T[z7] expands to 256 rules, so the slow T8Q will be enumerated 256 times. At best, instruction and data caches help a little.
Oh, you're right. I was thinking of something else.
Right now I think the lowest-hanging fruit is in adding rejection flags/commands where possible (especially in imported rules like "best64" or the KoreLogic stuff). I'm experimenting with that but I'm hitting some obstacles.
Let's say I want to add rejection commands to these contrived example rules:
r c Az"[a-z][a-z][a-z]"
r c ] ] ] ]
I see no way to do so with the current rules engine except after the commands that add/remove characters, which is slow, eg.
r Az"[a-z][a-z][a-z]" <+ >@ c
r ] ] ] ] <+ >@ c
Or is there some way to do it before the slow stuff? Let's disregard the fact that we could hand-optimize them to A0"[a-z][a-z][a-z]" <+ >@ r c and [ [ [ [ <+ >@ r c for a tad earlier rejection - that is tedious and/or not generally possible.
vVNM could be used if it wasn't for the fact we might go negative. It's best illustrated for the rule that removes characters:
va@4 >a va04 vb*a <b r c ] ] ] ]
We set a to (min_length - 1) - 4 and do >a. But with a min_length of less than 5, we're going negative, and the a is parsed by >a as unsigned, so something 250-ish - ie. it will reject everything.
On a related note, the current implementation of @ (for min_length - 1) sets it to 0 if min_length is 0 - that is suboptimal already. Could we not have all commands recognize signed arithmetic? That is, >N should not reject anything if N is negative, <N should reject everything for a negative N, and so on.
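The signed semantics boil down to something like this (illustrative helpers, not JtR code), with the current unsigned wraparound shown for contrast:

```c
#include <assert.h>

/* Sketch of the proposed signed semantics for the length-rejection
 * commands. Returns 1 to keep the word, 0 to reject. With signed N:
 * >N rejects nothing when N is negative, <N rejects everything when
 * N is negative. */
static int keep_if_longer(int len, int n)  { return n < 0 ? 1 : len > n; }
static int keep_if_shorter(int len, int n) { return n < 0 ? 0 : len < n; }

/* For contrast, the current unsigned behavior: a negative intermediate
 * like (min_length - 1) - 4 wraps to a large value, so >a rejects
 * every word. */
static int keep_if_longer_unsigned(int len, unsigned char n)
{
    return len > n;
}
```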
Even with that, we'd fail to early-reject in some cases though. For a while I thought we might want some new rejection commands similar to <N and >N, except they'd check an arbitrary variable instead of the current length. However, adding something like the following should be more effective and would be much easier to use than all of the above:
aN reject unless word would pass min. & max. lengths after adding N characters
bN reject unless word would pass min. & max. lengths after removing N characters
With that we could simply do:
a3 r c Az"[a-z][a-z][a-z]"
b4 r c ] ] ] ]
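The proposed checks themselves are trivial integer comparisons; a sketch of their semantics with hypothetical helper names:

```c
#include <assert.h>

/* Sketch of the proposed aN / bN early-reject semantics: keep the word
 * only if it would still pass the session's min/max length limits after
 * adding (aN) or removing (bN) N characters. Hypothetical helpers. */
static int keep_after_add(int len, int n, int min_len, int max_len)
{
    return len + n >= min_len && len + n <= max_len;
}

static int keep_after_remove(int len, int n, int min_len, int max_len)
{
    return len - n >= 0 && len - n >= min_len && len - n <= max_len;
}
```

With a3 in front of r c Az"[a-z][a-z][a-z]", words that could never pass the length limits after appending three characters are rejected before any of the expensive commands run.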
...and why not:
:[ab]1 \p[$^] [a-z]
Not sure about using a and b but they seem to be free. I believe some free mnemonics are: abghjmnw !@#&"
Oh, and even with the aN and bN I believe we might benefit from having all commands recognize signed numbers.
@solardiz I need input... or I'll possibly make some crazy PR that you might end up rejecting...
@magnumripper What you suggest (two new commands, signed arithmetic) makes sense to me.
When you say <+ >@ closer to the end of a rule is slow, you seem to imply the rejections would happen often, whereas I think for most hash types and input words they'd be very rare. Perhaps use of the min/max command-line options changes that - is this the case you're optimizing for? Also, having these commands closer to the end of a rule, if/when we have to, becomes redundant with the length check JtR itself will perform after fully applying the rule.
I'm concerned there would be a performance impact from adding the >@ when there's no minimum specified. Is the plan to squeeze these commands out in the initial parsing step when there's no minimum, the same step that currently squeezes out explicit NOP commands (whitespace and :)?
Please coordinate with the Hashcat project regarding the specific command characters.
Yes, much of it is optimizations for using the --min/max-len options (eg. after having brute-forced up to a certain length) or when attacking eg. WPAPSK formats, which have an actual min_length of 8. Not to mention the new default single mode feature to decrease max_length (within reason) rather than decreasing KPC for a GPU.
Having the optimizer strip things out whenever possible makes a lot of sense; I'll keep that in mind.
@jsteube I believe the aN and bN commands are not yet used. To recap:
aN reject unless word would pass min. & max. lengths after adding N characters
bN reject unless word would pass min. & max. lengths after removing N characters
Are you OK with us using them for this, or do you have some better suggestion? aN can be a mnemonic for "Add N early-reject", but I can't find a similarly good letter for the "delete N early-reject", except that b comes right after a...
Also, I can't recall whether Hashcat has constants and/or variables at all, but we're about to switch ours from something like uint8_t to int8_t per the above.
For "b" I don't see any problem, but keep in mind that InsidePro was using "a" for toggling case. If you don't mind, feel free to use it; it's not used in hashcat. Thanks for asking!
The rules engine is a bottleneck. Can we make it faster?

- We have * for max_length (and related + / -). Could we benefit from having similar variables for min_length? I'm thinking rule rejection, for a start.
- The --rules:NT case-toggler (especially when used with a max length less than 14) can be 50-60% faster if we add ->N and >N to it.
- If eg. $1$2$3 could be optimized to Az"123" we not only gain performance but also help the dupe-rule suppression.
- Could we reject rules using TN, DN, xNM, iNX, oNX, XNMI and so on, when max-len is <N (or max-len <I for XNMI)? Would there be side effects? Theoretically we might change something beyond max length and then truncate the beginning of the word. We could however add a mnemonic that either enables or disables (depending on what we'd use as default) such optimizations per line. Or the optimizer (rejector) could possibly be clever enough to make some safe decisions.

See also #3467