openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

Review default incremental charsets provided with John #5220

Open rbsec opened 1 year ago

rbsec commented 1 year ago

The ascii.chr file that's used by default for incremental mode hasn't been changed since at least 2013 (apart from an accidental change that was subsequently reverted), and doesn't reflect many of the common patterns seen in passwords on newer systems.

In my experience, the majority of systems now enforce some kind of password complexity rules (for better or worse), and the most common way that users adapt to these rules is by sticking a capital letter and a number into their password (usually at the start and end respectively). However, based on a sample of the first 4 million candidates generated from ascii.chr, the vast majority of the generated candidates don't include any uppercase characters:

Character sets
loweralpha: 1769406 (45.97%)
loweralphanum: 1740805 (45.23%)
numeric: 311967 (8.11%)
loweralphaspecial: 11281 (0.29%)
loweralphaspecialnum: 10834 (0.28%)
specialnum: 1677 (0.04%)
mixedalphanum: 723 (0.02%)
upperalphanum: 639 (0.02%)
mixedalpha: 260 (0.01%)
mixedalphaspecial: 105 (0.0%)
upperalphaspecialnum: 61 (0.0%)
upperalpha: 27 (0.0%)
special: 23 (0.0%)
mixedalphaspecialnum: 21 (0.0%)

And the majority of the generated candidates are also less than 8 characters long, which is a common minimum length:

Password length (length ordered)
1 = 94 (0.0%)
2 = 343 (0.01%)
3 = 2324 (0.06%)
4 = 26429 (0.66%)
5 = 443440 (11.09%)
6 = 2450977 (61.27%)
7 = 823543 (20.59%)
8 = 250000 (6.25%)
9 = 2592 (0.06%)
10 = 256 (0.01%)
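
For reference, a breakdown along these lines can be reproduced roughly as follows (a sketch; the ASCII mode name matches jumbo's stock john.conf, and the simple greps below overlap rather than forming the mutually exclusive buckets above):

# dump the first 4 million candidates from the default incremental mode
john --incremental=ASCII --stdout | head -n 4000000 > sample.txt

# quick-and-dirty character-class counts (overlapping, unlike the buckets above)
LC_ALL=C grep -c '^[a-z]*$' sample.txt   # lowercase letters only
LC_ALL=C grep -c '[A-Z]'    sample.txt   # contains an uppercase letter
LC_ALL=C grep -c '[0-9]'    sample.txt   # contains a digit

# length distribution of the sample
awk '{ print length }' sample.txt | sort -n | uniq -c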

This means that the out-of-the-box incremental mode is very ineffective on any system that has password complexity.

As a test, I ran the default ascii.chr incremental mode for 30 minutes against a recent Active Directory dump containing ~56k unique hashes - it cracked 446 of them.

By comparison, generating a charset from just the ~1,100 mixed-alpha-numeric passwords in the NCSC Top 100K Common Passwords cracked 3,560 hashes in the same time. And a charset generated from ~20,000 previously broken NT hashes from other unrelated Active Directory domains cracked 7,001 hashes.
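
For anyone wanting to reproduce this kind of comparison, the rough workflow is sketched below (file and mode names are hypothetical, and --pot/--make-charset/--format=NT are jumbo options worth double-checking on your build):

# train a charset from the plaintexts of previously cracked hashes (an existing .pot file)
john --pot=previous-domains.pot --make-charset=custom.chr

# add an [Incremental:Custom] section pointing at custom.chr in john.conf, then
# run it for a fixed time against the NT hashes and check the crack count
timeout 30m john --format=NT --incremental=Custom ad-hashes.txt
john --show --format=NT ad-hashes.txt | tail -n 1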

I'm not saying that 1,100 words from a common password list is a good way to build a character set - but crude as it is, it turned out to be pretty effective.


My usage of John is heavily focused on English-speaking enterprise environments (and often Active Directory), which have different password policies and patterns from other systems, so I appreciate that this is not representative of how other people are using it. Perhaps the current incremental charsets are effective for most users, and the way I use John just makes me a bit of an outlier. And if that's the case, I'm quite happy generating my own.

But I think that it's perhaps worth revisiting the charsets included with John and considering whether they should be updated.

magnumripper commented 1 year ago

That's a reasonable request. The current charsets were made from the Rockyou dataset (with dupes) damn near as-is as that was the best set in the wild at the time.

Creating new charsets is easy as pie if we can come up with a good dataset to train from. LinkedIn is a bit newer but may be too old as well. How about the HIBP stuff? Any other suggestions?

Also, there's a caveat: if we replace the existing charsets, older jobs can't be resumed. There are ways to handle that though: for example, we can create the new charsets with different names (e.g. ascii-2022) and change the defaults to use them. Resumed older sessions will then continue to use the currently existing ones, since those are referenced in the session file.
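
For illustration, the two could coexist in john.conf along these lines (a sketch; the new section name and file are hypothetical, and the parameters are copied loosely from the stock ASCII section - check your own config):

# existing default; old sessions keep resuming against it
[Incremental:ASCII]
File = $JOHN/ascii.chr
MinLen = 0
MaxLen = 13
CharCount = 95

# hypothetical new default trained on newer data
[Incremental:ASCII-2022]
File = $JOHN/ascii-2022.chr
MinLen = 0
MaxLen = 13
CharCount = 95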

solardiz commented 1 year ago

sticking a capital letter and a number into their password (usually at the start and end respectively)

FWIW, the usual "at the start and end respectively" would not satisfy our passwdqc. https://www.openwall.com/passwdqc/

This means that the out-the-box incremental mode is very ineffective on any system that has password complexity.

That depends on your reason to run it against such a system's password hashes. A possible reason is to ensure the policy is in fact being enforced, and to detect cases where it is not. For that, you don't want to skip or postpone testing of weaker passwords.

generating a charset from just the ~1,100 mixed-alpha-numeric passwords in the NCSC Top 100K Common Passwords cracked 3,560 hashes in the same time.

That "top 100k" has a comment saying it corresponds to HIBP (not specifying which version). Those passwords are only moderately different from RockYou, which our charset files were generated from. So perhaps by applying the same filter to RockYou before generating your charset file, you'll achieve similar or better results (can be better because the full of RockYou is longer than 100k).

Overall, there's definitely room for improvement here, but I'm not sure exactly what we should do. For my own runs, I am using a charset file generated from (IIRC) 30x RockYou + HIBP v7 (giving higher weight to passwords that are in RockYou). This does perform moderately better than our released files in my testing - but with differences on the order of +10%, not many times over like you saw for the policy-enforced passwords.
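
A sketch of how such a weighted training set could be assembled (file names hypothetical; prefixing each line with ':' turns a plain wordlist into pot-style "hash:plaintext" lines, a common trick for feeding plaintexts to --make-charset, but verify it against your build):

# repeat the smaller, higher-weight list to emphasise its patterns
for i in $(seq 30); do sed 's/^/:/' rockyou.txt; done > weighted.pot
# append the larger set of cracked HIBP v7 plaintexts once
sed 's/^/:/' hibp-v7-cracked.txt >> weighted.pot
john --pot=weighted.pot --make-charset=weighted.chr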

I was also thinking of (and experimented with) possibly making improvements to the code in JtR before releasing new charset files, but we don't really have to.

rbsec commented 1 year ago

@solardiz I was actually halfway through writing a response when you posted, so that was good timing. Completely agree that the capital start/number end isn't a good way to make passwords - but it meets the Windows password complexity rules (8 chars, and at least 3 out of uppercase/lowercase/number/symbol/Unicode), and seems to be the most common way that people do that in my experience. I also take your point about the current incremental mode sometimes being useful to identify passwords that aren't following the policies.

HIBP is probably the biggest (public) dataset, although it's rather biased towards web applications. And of course there's an inherent bias in it that it comes from sites that have been compromised and generally using weaker hashing algorithms (if they hash at all). But then I suppose that hashes from public-facing web applications with weaker security is probably one of the main things that John is used for.

Testing charsets built from HIBP is a little tricky though, because obviously you can't test them against the HIBP hashes, and most existing public dumps are likely to have been incorporated into HIBP already. So I'm not really sure what the best approach would be, other than trying them out on newer dumps as they emerge and seeing if they perform well. Or people testing them privately and providing feedback.

The types of passwords in these dumps are so different from the ones I usually see in AD that testing against those doesn't seem very useful.

There might also be significant differences depending on how many hashes you take from HIBP (1 million was just an arbitrary number), and how many of those have been cracked when you make the charset.


As a test, I generated a charset from (most of) the top 1 million hashes from HIBP v8, and ran it against the same Active Directory dump as above. The ascii.chr equivalent version (--external=filter_ascii) cracked 462 (compared to 446 with the default ascii.chr), which is a marginal improvement like you saw, but not very significant.

Using --external=policy to generate a mixed-alpha-numeric charset from them cracked 7,446, which is a big improvement on the 3,560 from a charset based on the same filter and the NCSC top 100k passwords. However, when I generated an --external=policy charset from rockyou, that cracked 8,775 hashes, which is even better.

I think a lot of that probably comes down to the number of matching passwords: rockyou had ~380k passwords that match --external=policy, while the HIBP top 1 million only had ~18k. But these numbers are based on a single AD domain of ~56k hashes, so I don't want to draw too many conclusions from them. I'll certainly try it out on the next couple of domains I'm testing though.
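
As a sketch, generating such a policy-filtered charset from a plain wordlist might look like the following (file names hypothetical; the ':' prefix is the same pot-style trick as above, and the external filter is applied while the charset is built):

sed 's/^/:/' rockyou.txt > rockyou.pot
john --pot=rockyou.pot --external=Policy --make-charset=rockyou-policy.chr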

In my case, rockyou actually seems better than my sample from HIBP with --external=policy. But, if you're seeing a consistent 10% improvement with a charset based on the combination of the two, then perhaps that's a good starting point. It'd be interesting to know what other people are doing with charsets and if anyone has found something much more effective - but since it depends so much on what kind of passwords you're trying to crack it might be hard to get anything concrete.

But I think the biggest thing is that creating a new incremental mode based on the --external=policy filter seems to be very effective for systems that enforce complexity - so it might be something that's worth considering including by default. And if it's a new mode, that avoids the issue of breaking existing sessions.

It might even be nice if that incremental mode was the default for NT hashes (I know that LM has its own default filter) - but that's probably adding a lot of complexity and opening a whole can of worms about the default modes for every hash type, so perhaps better not to go there. And it's easy enough to just tell people to use --incremental=policy (or whatever it ends up being called).

solardiz commented 1 year ago

Regarding Windows password policies, maybe you can get some of your client companies to deploy our passwdqc? It can be used to implement a similar policy, but without the obvious bypass.

Yes, out-of-sample testing of charsets generated from HIBP is tricky. However, it is possible to deliberately exclude a small random portion of HIBP from the training set, to use it as a test set. It is also possible e.g. to generate using v7 and test against what's new in v8. In the latter case, however, such testing results would be biased to more complex passwords, which might not match real-world use cases.
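
A minimal sketch of such a hold-out split, assuming GNU coreutils and a file of cracked plaintexts (the sizes are arbitrary):

# shuffle the training material and set aside the last 100k lines as a test set
shuf hibp-cracked.txt -o shuffled.txt
head -n -100000 shuffled.txt > train.txt
tail -n 100000 shuffled.txt > test.txt
# a charset trained on train.txt can then be scored by how many of the first N
# candidates it generates also appear in test.txt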

1 million from HIBP may perform worse than RockYou simply because it's smaller. In my case, it was ~450M of not-too-difficult-to-crack HIBP v7 mixed with a similar number of repeats from RockYou (also favoring 3+ hits, which is a ~1.1M sub-list).

Sure, we could provide a pregenerated policy.chr, using --external=policy. We'd need to clarify somewhere that the corresponding incremental mode would not actually enforce a policy, but would merely prioritize candidate passwords differently.
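
For completeness, a user who does want strictly conforming candidates could stack the external filter on top of the incremental mode at run time, roughly like this (mode and file names hypothetical):

john --format=NT --incremental=Policy --external=Policy ad-hashes.txt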

solardiz commented 1 year ago

Here's the end of the thread on john-users where I described my experiments last year: https://www.openwall.com/lists/john-users/2021/05/05/3 Edit: and here's the start of the sub-thread showing a ~10% improvement: https://www.openwall.com/lists/john-users/2021/05/02/1

rbsec commented 1 year ago

Regarding Windows password policies, maybe you can get some of your client companies to deploy our passwdqc? It can be used to implement a similar policy, but without the obvious bypass.

I'm seeing more clients using the Azure AD Password Protection (although I think that's more blacklist based than complexity), but it's certainly one that I'll bear in mind. People tend to be pretty twitchy about installing things on their Domain Controllers though, so might be a tough sell.

Sure, we could provide a pregenerated policy.chr, using --external=policy. We'd need to clarify somewhere that the corresponding incremental mode would not actually enforce a policy, but would merely prioritize candidate passwords differently.

I wonder if naming it something like "mixedalnum" would avoid the issue (and be a little more consistent with the current naming)? It also makes it a bit more obvious what it does, because unless you know the external modes it's not immediately obvious what --incremental=policy would do.

That thread was very interesting reading, and I hadn't really thought about trying to weight the different sources when generating the charsets (although I suppose you get this automatically when you give john a hash file to generate from). The point about using it in conjunction with wordlists is also something that I'd been thinking about: incremental mode is usually fairly late in my cracking process, so having it generate a load of candidates that I've already tried with wordlists+rules has limited value. Not really sure if much can be done about that, but comparing the post-rockyou performance does seem like a useful metric when evaluating different charsets.
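
One rough way to measure that post-wordlist value (a sketch; file and mode names hypothetical): run the wordlist+rules phase first so its cracks are already in the pot, then give each candidate charset the same time budget and compare only the additional cracks.

john --format=NT --wordlist=rockyou.txt --rules ad-hashes.txt
john --show --format=NT ad-hashes.txt | tail -n 1    # baseline after wordlist+rules
timeout 20m john --format=NT --incremental=CandidateA ad-hashes.txt
john --show --format=NT ad-hashes.txt | tail -n 1    # baseline plus charset A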

solardiz commented 1 year ago

BTW, something we haven't yet tried is generating from our new password.lst, which is the overlap of top HIBP v8 with RockYou. Maybe you could try that in the same way you tried HIBP top 1 million and RockYou separately?

Azure AD Password Protection (although I think that's more blacklist based than complexity)

BTW, passwdqc 2.0+ can do both deny list and complexity (or either, depending on configuration), checking e.g. against full HIBP locally (and really quickly, and not needing too much disk space).

I wonder if naming it something like "mixedalnum" would avoid the issue (and be a little more consistent with the current naming)? It also makes it a bit more obvious what it does, because unless you know the external modes then it's not immediately obviously what --incremental=policy would do.

"mixedalnum" wouldn't be any clearer to me, but maybe.

rbsec commented 1 year ago

I just tried out charsets based on the new password.lst. Results are summarised in the tables below.

To recap, the charsets were generated from rockyou.txt ("Rockyou"), the top 1 million of HIBP v8 ("HIBP"), the new password.lst ("New"), and a combination of all three with no deduplication ("Combined").

Results for the ascii charsets on two different Active Directory dumps:

| Charset  | AD 1 (~56k unique) | AD 2 (~45k unique) |
|----------|--------------------|--------------------|
| Rockyou  | 446                | 19                 |
| HIBP     | 462                | 23                 |
| New      | 493                | 24                 |
| Combined | 427                | 13                 |

The first AD dump shows a clear win for the new (password.lst) based charset, which cracks ~10% more hashes.

The second AD dump has very few accounts that don't meet the Windows password complexity rules, so almost all of the cracked hashes are in the "lowercase-numbers-symbol" format. The numbers for it are so low that it's probably not a very useful datapoint.

And for the policy versions of the charsets, the results were:

| Charset  | AD 1 (~56k unique) | AD 2 (~45k unique) |
|----------|--------------------|--------------------|
| Rockyou  | 8775               | 3627               |
| HIBP     | 7446               | 3382               |
| New      | 7763               | 3281               |
| Combined | 6668               | 3312               |

For both datasets, the rockyou.txt based charset was the clear winner (18% better than HIBP in the first test, and 7% better in the second). I suspect that this might be down to the much larger number of words matching the policy in it (380k), compared to HIBP (18k) and the new password.lst (30k). Interestingly, the combined charset didn't perform that well, especially on the first AD.

solardiz commented 1 year ago

@rbsec Thanks. You can also try generating from more (many million) of HIBP, or e.g. from https://github.com/rarecoil/hashes.org-list

solardiz commented 1 year ago

Combined = Combination of the above three (no deduplication)

Interestingly, the combined charset didn't perform that well

Maybe it would perform better for you with deduplication.

rbsec commented 1 year ago

Right, so I've gone away and done some more (and slightly more rigorous) testing. Incremental mode was run for 50G candidates (~20 mins) with each of the charsets listed in the tables below.

Combining anything with the biggest lists (HIBP 50M, HIBP 100M and Hashes.org) didn't achieve much, because those lists are so much bigger than what they're combined with that the results are pretty much the same as using them on their own.

ASCII Charsets

Results for the --external=filter_ascii versions of the charsets on two different Active Directory dumps:

| Charset    | AD 1 (~56k unique) | AD 2 (~45k unique) |
|------------|--------------------|--------------------|
| Rockyou    | 454                | 28                 |
| HIBP 1M    | 471                | 17                 |
| HIBP 10M   | 505                | 19                 |
| HIBP 50M   | 483                | 21                 |
| HIBP 100M  | 457                | 22                 |
| Hashes.org | 360                | 26                 |
| New        | 498                | 24                 |
| Combined   | 469                | 25                 |

None of the ascii versions were a very good fit, as most of the passwords in the dumps met Windows complexity requirements. The HIBP 10M performed best on one dump; but since none of the charsets were very good here it's clearly not a good data set to draw conclusions from.

Policy Charsets

For the --external=policy versions of the charsets, the results were:

| Charset    | AD 1 (~56k unique) | AD 2 (~45k unique) |
|------------|--------------------|--------------------|
| Rockyou    | 8199               | 3896               |
| HIBP 1M    | 7545               | 3605               |
| HIBP 10M   | 9361               | 4362               |
| HIBP 50M   | 9830               | 4467               |
| HIBP 100M  | 9588               | 4405               |
| Hashes.org | 4495               | 2067               |
| New        | 7852               | 3542               |
| Combined   | 9020               | 4233               |

The Hashes.org charset performed pretty badly, and the size of the dump made it a bit awkward to deal with (generating the charset took a long time and John got up to about 20GB of RAM doing so).

The HIBP 50M was the best here, cracking ~20% more than the rockyou.txt equivalent, and was 18x more effective than the ascii equivalent on the first dump, and 200x more effective on the second.

It's worth noting that I only had ~70% of the HIBP top 50M hashes cracked, and after the --external=policy filter there were ~1.5m passwords used to generate it - so if someone has a fully cracked version then they may get different results for this.

Complex Charsets

I also tried making some "complex" versions of the charsets that were (roughly) aligned to Windows password complexity, effectively extending the policy to require uppercase letters, lowercase letters and either numbers or symbols (although the filtering was actually done with grep; a rough equivalent is sketched below):

| Charset    | AD 1 (~56k unique) | AD 2 (~45k unique) |
|------------|--------------------|--------------------|
| Rockyou    | 7952               | 4300               |
| HIBP 1M    | 7418               | 3766               |
| HIBP 10M   | 9470               | 4879               |
| HIBP 50M   | 9929               | 5087               |
| HIBP 100M  | 9793               | 5078               |
| Hashes.org | 8228               | 4455               |
| New        | 7814               | 3795               |
| Combined   | 8967               | 4455               |

The results for this were a bit of a mixed bag - some performed better than the policy versions and some performed worse. Where they did better, it was usually with the larger input sets (as a huge number of plaintexts were filtered out by the complexity requirements).
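
For reference, a grep pipeline roughly equivalent to that description might look like this (a sketch, not the exact commands used: it keeps printable-ASCII plaintexts of length 6+ containing at least one lowercase letter, one uppercase letter, and one non-letter):

LC_ALL=C grep -E '^[ -~]{6,}$' cracked-plains.txt \
    | grep '[a-z]' | grep '[A-Z]' | grep '[^a-zA-Z]' > complex-plains.txt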

Thoughts

The best results for both the policy and complex charsets were with the HIBP 50M, although results may be improved by cracking more of this than the ~70% that I did. The actual sweet spot could be anywhere between the 10M and 100M versions that I tried - but I don't think there's much value in trying to finesse this too much with only two data points.

I think there's a good argument for including some kind of complex/policy charset with John, as it's a significant improvement over the current ones for this kind of use-case. The HIBP 50M Complex set was the best from my testing - but it may be better to try and align to the existing policy external to make generation easier (or perhaps adding an updated version of that which includes ASCII symbols?).

It would be good to have some more testing of the ascii version of this charset - but it's not a very good fit for the kind of hashes that I normally crack. However, other people may have more success with it.

I've attached the HIBP 50M charsets I generated to this post - happy to share any others that people want to do their own testing on.

hibp50mcharsets.zip

solardiz commented 1 year ago

These are interesting results, @rbsec!

I just realized that our default --external=policy includes a check for the length being exactly 8, just as an example. Did you keep that check in your tests? If so, the generated candidates would be mostly of length 8, and those not of length 8 would not use any information of relative positions of characters.

This also makes me wonder what results you'd achieve by simply locking our default ascii.chr to --length=8.

For actual charsets we might generate for distribution with JtR, we should of course exclude the length 8 check from whatever filter we'd use.

Combining anything with the biggest lists (HIBP 50M, HIBP 100M and Hashes.org) didn't achieve much, because those lists are so much bigger than what they're combined with that the results are pretty much the same as using them on their own.

A workaround is to combine with many repeats of the smaller list(s). This basically gives higher weights to patterns seen in the smaller lists, but then uses the larger list to provide fallback patterns when a specific combination of preceding characters is not seen in the smaller lists.

The best results for both the policy and complex charsets were with the HIBP 50M, although results may be improved by cracking more of this than the ~70% that I did.

I'm not sure this would improve the results - it could also hurt them - but could be worth trying. You can possibly "crack" many more of HIBP by using the hashes.org list as a wordlist.
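
A sketch of that suggestion (HIBP distributes SHA-1 hashes with a prevalence count after a colon; the Raw-SHA1 format name is jumbo's, and the file names are hypothetical):

# strip the ":count" suffix, then attack the raw SHA-1 hashes with the big wordlist
cut -d: -f1 pwned-passwords-top50m.txt > hibp-sha1.txt
john --format=Raw-SHA1 --wordlist=hashes-org-found.txt hibp-sha1.txt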

I think there's a good argument for including some kind of complex/policy charset with John

Yes, it looks so.

try and align to the existing policy external to make generation easier

Yes, we should probably revise it to make it more than just an example.

solardiz commented 1 year ago

extending the policy to require uppercase letters, lowercase letters and either numbers or symbols (although actually done with grep)

You can try this - the same as your grep?

[List.External:PolicyMod]
int mask[0x100];

void init()
{
        int c;

        mask[0] = 0x100;
        c = 1;
        while (c < 0x100)
                mask[c++] = 4;

        c = 'a';
        while (c <= 'z')
                mask[c++] = 1;
        c = 'A';
        while (c <= 'Z')
                mask[c++] = 2;
}

void filter()
{
        int i, seen;

/* This loop ends when we see NUL (sets 0x100) */
        i = seen = 0;
        while ((seen |= mask[word[i++]]) < 0x100)
                continue;

/*
 * We should have seen at least one character of each type (which "add up"
 * to 7) and then a NUL (adds 0x100).
 */
        if (seen != 0x107)
                word = 0; // Does not conform to policy
}

rbsec commented 1 year ago

I just realized that our default --external=policy includes a check for the length being exactly 8, just as an example. Did you keep that check in your tests? If so, the generated candidates would be mostly of length 8, and those not of length 8 would not use any information of relative positions of characters.

Strangely, the version of [List.External:Policy] in my john.conf didn't have that limit - it just rejected any candidates that were less than six characters:

/*
 * We should have seen at least one character of each type (which "add up"
 * to 7) and then a NUL (adds 0x100), but not any other characters (would
 * add 0x200).  The length must be at least 6
 */
    if (seen != 0x107 || i < 6) 
        word = 0; // Does not conform to policy
}

The git history of the file seems to show that the 8 char limit has been there for years, so I'm not 100% sure where this came from. Maybe I edited it years ago and forgot. I think limiting to 8 chars is probably bad for a charset, although increasing that minimum length to 8 might be good for the complex one - I'll give it a go and see if it makes any difference.

In terms of the filter itself, the one you posted doesn't quite match what I was doing with grep, because I wanted to include symbols as well. I modified my existing policy to require uppercase, lowercase and either a number or an ASCII symbol (may not be very efficient):

[List.External:Policy]
int mask[0x100];

void init()
{
    int c;

    mask[0] = 0x100;
    c = 1; 
    while (c < 0x100)
        mask[c++] = 0x200;

    c = 'a'; 
    while (c <= 'z') 
        mask[c++] = 1; 
    c = 'A'; 
    while (c <= 'Z') 
        mask[c++] = 2; 
    c = ' '; 
    while (c <= '@') 
        mask[c++] = 4; 
    c = '['; 
    while (c <= '`') 
        mask[c++] = 4; 
    c = '{'; 
    while (c <= '~')
        mask[c++] = 4; 
}

void filter()
{
    int i, seen;

/*
 * This loop ends when we see NUL (sets 0x100) or a disallowed character
 * (sets 0x200).
 */
    i = -1; seen = 0; 
    while ((seen |= mask[word[++i]]) < 0x100)
        continue;

/*
 * We should have seen at least one character of each type (which "add up"
 * to 7) and then a NUL (adds 0x100), but not any other characters (would
 * add 0x200).  The length must be at least 6
 */
    if (seen != 0x107 || i < 6) 
        word = 0; // Does not conform to policy
}

It's not quite matching the Windows policy (which would allow things like 123456a!), but it seems close enough.

I'm not sure this would improve the results - it could also hurt them - but could be worth trying. You can possibly "crack" many more of HIBP by using the hashes.org list as a wordlist.

That's a good shout - I'll try that and regenerate them and see if there's much of a difference.

rbsec commented 1 year ago

Using the Hashes.org passwords as a wordlist was pretty effective at cracking more of HIBP, and (on top of the existing pot) cracked between 96% and 99% depending on the size. This gave me a new set of pots to build charsets from (the new Hashes.org-augmented ones have an "HO" suffix in the tables below; the previous ones are included for comparison).

However, the results were not great. In every case the new charsets performed worse than the previous ones (on two AD hash dumps) - so although cracking this way meant that there were a lot more plaintexts, the charsets generated from them were less effective.

ASCII Charsets

| Charset     | AD 1 (~56k unique) | AD 2 (~45k unique) |
|-------------|--------------------|--------------------|
| HIBP 1M     | 471                | 17                 |
| HIBP 1MHO   | 456                | 17                 |
| HIBP 10M    | 505                | 19                 |
| HIBP 10MHO  | 486                | 16                 |
| HIBP 50M    | 483                | 21                 |
| HIBP 50MHO  | 453                | 19                 |
| HIBP 100M   | 457                | 22                 |
| HIBP 100MHO | 419                | 21                 |

Policy Charsets

This was the same policy as before - so required uppercase, lowercase and digits, and a minimum length of 6:

| Charset     | AD 1 (~56k unique) | AD 2 (~45k unique) |
|-------------|--------------------|--------------------|
| HIBP 1M     | 7545               | 3605               |
| HIBP 1MHO   | 6846               | 3352               |
| HIBP 10M    | 9361               | 4362               |
| HIBP 10MHO  | 8853               | 4091               |
| HIBP 50M    | 9830               | 4467               |
| HIBP 50MHO  | 9181               | 4234               |
| HIBP 100M   | 9588               | 4405               |
| HIBP 100MHO | 9043               | 4190               |

Complex Charsets

This was the complex policy above - so required uppercase, lowercase and either numbers or ASCII symbols:

| Charset     | AD 1 (~56k unique) | AD 2 (~45k unique) |
|-------------|--------------------|--------------------|
| HIBP 1M     | 7418               | 3766               |
| HIBP 1MHO   | 6762               | 3484               |
| HIBP 10M    | 9470               | 4879               |
| HIBP 10MHO  | 8955               | 4534               |
| HIBP 50M    | 9929               | 5087               |
| HIBP 50MHO  | 9293               | 4718               |
| HIBP 100M   | 9793               | 5078               |
| HIBP 100MHO | 9195               | 4724               |

solardiz commented 1 year ago

increasing that minimum length to 8 might be good for the complex one - I'll give it a go and see if it makes any difference.

I don't recommend that - I think possible length limits belong to usage of a charset, not to its generation.
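
In other words, apply the limit when running the mode rather than when generating the .chr, e.g. with jumbo's length options (a sketch; mode name hypothetical, and the exact option spelling is worth checking against --help on your build):

john --format=NT --incremental=Complex --min-length=8 ad-hashes.txt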

In terms of the filter itself, the one you posted doesn't quite match what I was doing with grep, because I wanted to include symbols as well.

You probably misread my code. What it does is actually very similar to what yours does, just in a simpler way. The difference is yours rejects passwords with non-ASCII characters in them, whereas mine treats non-ASCII the same as digits or symbols. As to handling of digits and symbols, our filters are the same.

It's not quite matching the Windows policy (which would allow things like 123456a!), but it seems close enough.

We can write a filter that would match that policy more closely. We can do 3 of 4 for ASCII fairly easily. We can also try 3 of 5 treating any non-ASCII as a 5th category, although that isn't the same as what the Windows policy description says (theirs is far trickier, maybe @magnumripper would want to help there).

The password contains characters from three of the following categories:

    Uppercase letters of European languages (A through Z, with diacritic marks, Greek and Cyrillic characters)
    Lowercase letters of European languages (a through z, sharp-s, with diacritic marks, Greek and Cyrillic characters)
    Base 10 digits (0 through 9)
    Non-alphanumeric characters (special characters): (~!@#$%^&*_-+=`|\(){}[]:;"'<>,.?/) Currency symbols such as the Euro or British Pound aren't counted as special characters for this policy setting.
    Any Unicode character that's categorized as an alphabetic character but isn't uppercase or lowercase. This group includes Unicode characters from Asian languages.

although cracking this way meant that there were a lot more plaintexts, the charsets generated from them were less effective.

I'm not surprised, but this was worth trying. This also means that your previous 70% wasn't necessarily optimal - maybe the threshold is different. We could also try giving greater weight (more repeats) to passwords that were easier to crack.

solardiz commented 1 year ago

would allow things like 123456a!

These two do:

[List.External:Policy3of4]
int mask[0x100];

void init()
{
        int c;

        mask[0] = 0x100; // NUL
        c = 1;
        while (c < 0x80)
                mask[c++] = 8; // Special (overridden below for alpha-numeric)
        while (c < 0x100)
                mask[c++] = 0x200; // 8-bit is disallowed

        c = 'a';
        while (c <= 'z')
                mask[c++] = 1; // Lowercase
        c = 'A';
        while (c <= 'Z')
                mask[c++] = 2; // Uppercase
        c = '0';
        while (c <= '9')
                mask[c++] = 4; // Digits
}

void filter()
{
        int i, seen, classes;

/*
 * This loop ends when we see NUL (sets 0x100) or a disallowed character
 * (sets 0x200).
 */
        i = seen = classes = 0;
        while ((seen |= mask[word[i++]]) < 0x100)
                continue;

        if (seen < 0x200) { // No disallowed characters
                while (seen &= seen - 1) // Count character classes
                        classes++;
        }

/*
 * We should have seen at least one character of at least 3 of the 4 allowed
 * classes, but not any disallowed characters.
 */
        if (classes < 3)
                word = 0; // Does not conform to policy
}

[List.External:Policy3of5]
int mask[0x100];

void init()
{
        int c;

        mask[0] = 0x100; // NUL
        c = 1;
        while (c < 0x80)
                mask[c++] = 8; // Special (overridden below for alpha-numeric)
        while (c < 0x100)
                mask[c++] = 0x10; // 8-bit

        c = 'a';
        while (c <= 'z')
                mask[c++] = 1; // Lowercase
        c = 'A';
        while (c <= 'Z')
                mask[c++] = 2; // Uppercase
        c = '0';
        while (c <= '9')
                mask[c++] = 4; // Digits
}

void filter()
{
        int i, seen, classes;

// This loop ends when we see NUL (sets 0x100)
        i = seen = classes = 0;
        while ((seen |= mask[word[i++]]) < 0x100)
                continue;

        while (seen &= seen - 1) // Count character classes
                classes++;

// We should have seen at least one character of at least 3 of the 5 classes
        if (classes < 3)
                word = 0; // Does not conform to policy
}

I think we should actually replace the current Policy example with them. Note that I've dropped the length check example here - I think it's not needed because we have length limit options on the command line, and they can be used together with an external mode when desired.

rbsec commented 1 year ago

You probably misread my code. What it does is actually very similar to what yours does, just in a simpler way. The difference is yours rejects passwords with non-ASCII characters in them, whereas mine treats non-ASCII the same as digits or symbols. As to handling of digits and symbols, our filters are the same.

You're right, that's my mistake - apologies.

I'll give the 3of4 and 3of5 externals a test and see if there's much difference between them - but given that my hash dumps are from English-speaking places, I suspect that in reality these two externals will be doing the same thing (as very few people use non-ASCII characters in their passwords). But the results might be quite different for hashes from countries that use other alphabets.

As you say, the Windows policy is a lot more complicated, but if we have a fast external that's 99% correct then that's still a very useful thing to have. There may be some value to having a more accurate one, but if it comes at the cost of adding a lot of complexity (and performance?) then it's perhaps not a priority.

I think we should actually replace the current Policy example with them. Note that I've dropped the length check example here - I think it's not needed because we have length limit options on the command line, and they can be used together with an external mode when desired.

I'd agree with this, and that the length restriction can be stripped out (assuming there's not a significant performance hit from doing so). I don't really have a strong opinion on which one is better - because in practical terms they're largely identical for how I'd use them.

I'm not surprised, but this was worth trying. This also means that your previous 70% wasn't necessarily optimal - maybe the threshold is different. We could also try giving greater weight (more repeats) to passwords that were easier to crack.

I'm sure it's not optimal, especially as the 50m and the 70% are both pretty arbitrary. But I don't think that there's much value in trying lots of variants to get a more optimal one for the two hash dumps I'm testing on, because it would likely not be optimal for other cases. But without a large public dataset of "complex" hashes to test against, I'm not really sure how we can better test it.

The other thing I dislike about it is that it's not very reproducible - so the exact charsets generated will vary depending on exactly how that 70% is cracked. But unless new charsets are being frequently made and tested against each other I guess that's not such a problem.

solardiz commented 1 year ago

if we have a fast external that's 99% correct then that's still a very useful thing to have.

I agree. When using it for charset generation, we should keep in mind that the resulting incremental mode won't follow the same rules anyway - it will just favor such candidates over others to some extent. BTW, for that reason I am thinking of maybe calling these charset files and incremental modes favor3of4 or such (and describe in a comment that favor3of4 corresponds to policy3of4 used at generation time).

rbsec commented 1 year ago

They do say that naming things is one of the hardest problems in computing...

I can see the benefit of a name like favour3of4 as it makes it clear that it's not actually enforcing the external filter. But on the other hand, I'm not sure how clear it is to the user? It's pretty obvious what --incremental=digits or --incremental=alpha does, but I don't think favour3of4 is as clear.

In some ways something like --incremental=complex seems more obvious - but then perhaps it's also misleading, as not all the generated passwords are actually complex.


As an aside, I was curious how close the charsets would be to the externals. From a quick test with the HIBP 50M policy charset, the match rate to --external=policy was:

| Candidates | Percentage Matched |
|------------|--------------------|
| 1M         | 96%                |
| 10M        | 95%                |
| 100M       | 94%                |
| 1G         | 93%                |

So although most candidates do match, there's still a pretty significant number that don't.

magnumripper commented 1 year ago

I think the 3of4 and 3of5 as posted are good enough. I could add support for "character classes" in external mode, but it opens up the proverbial can of worms: are we talking UTF-8 here, or some legacy codepage? Or even worse, some mix of them (which is probably the most common case)? Then we're better off just treating any 8-bit character as one category.