Add extremely common word sequences?

softwarecreations commented 3 years ago

TLDR: 123456 is pretty much the most common password in the world and has essentially no entropy because it is an obvious sequence. zxcvbn-ts falls on its face with onetwothreefourfivesix, rating it as maximum strength. Let's fix that?

Just an idea; I'm not sure whether this is commonly done with passwords. But just as 123456789, 987654321, abcdefg, etc. are seen as completely lacking entropy, what about:

Months: januaryfebruarymarch, julyjunemay

Written numbers: onetwothree, nineeightseven

Seasons: springsummerautumn, winterspringsummer

Bible chapters: genesisexoduswhatever, etc.

Sizes: smallmediumlarge, largemediumsmall

Greek: whatever, alphabeta, etc.

Phonetic alphabet: alphabravocharliedelta, tangosierraromeo

zxcvbn-ts currently thinks all this sort of junk is a strong password (might need to add an extra word in some cases, but normally 3-4 words, and it thinks you're golden), when you've basically got no entropy if you're using any of the above.

Obviously there's an endless number of common sequences people could put into a password, like listing the characters of a popular TV series.

But I figured the categories I wrote above should be standard, because regardless of a person's preferences or personality, they'll deal with (or be familiar with) most, if not all, of the above. The possible exception is the Bible chapter names.

MrWook commented 3 years ago

Hey, thanks for the suggestion. I like the idea. The problem is that this would be a combination of the dictionary matcher and the sequence matcher. Basically, you need a dictionary for every language that has those sequences. For example, like this:

{
  "numbers": [
    "one",
    "two",
    "three"
...
  ],
  "seasons": [
    "spring",
    "summer",
    "autumn",
    "winter"
  ]
...
}

This would mean you need to use the dictionary matcher to identify all those different words and then you need to use some kind of sequence matcher to go through all those matches to check if they are in a row.
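As a rough illustration of that second step, here is a minimal sketch that takes word matches which have already been found and checks whether adjacent ones are neighbours in one of the sequence lists. Everything in it is an assumption made for the example: `WordMatch`, `SequenceRun`, `sampleSequences` and `findSequenceRuns` are hypothetical names, not zxcvbn-ts types or APIs, and the word lists are sample data only.

```typescript
// Hypothetical shapes for illustration; these are NOT the zxcvbn-ts match types.
interface WordMatch {
  token: string; // the dictionary word that was found
  i: number; // start index in the password (inclusive)
  j: number; // end index in the password (inclusive)
}

interface SequenceRun {
  sequenceName: string;
  tokens: string[];
  i: number;
  j: number;
  ascending: boolean;
}

type SequenceDictionary = Record<string, string[]>;

// Illustrative sample data in the shape of the JSON above.
const sampleSequences: SequenceDictionary = {
  numbers: ["one", "two", "three", "four", "five", "six"],
  seasons: ["spring", "summer", "autumn", "winter"],
};

function findSequenceRuns(
  matches: WordMatch[],
  dictionary: SequenceDictionary,
  minWords = 2,
): SequenceRun[] {
  // Index matches by start position so "the word right after this one" is a cheap lookup.
  const byStart = new Map<number, WordMatch[]>();
  for (const match of matches) {
    const bucket = byStart.get(match.i) ?? [];
    bucket.push(match);
    byStart.set(match.i, bucket);
  }

  const runs: SequenceRun[] = [];
  for (const sequenceName of Object.keys(dictionary)) {
    const words = dictionary[sequenceName] ?? [];
    for (const first of matches) {
      const startIndex = words.indexOf(first.token);
      if (startIndex === -1) continue;
      // Try reading the sequence forwards (+1) and backwards (-1).
      for (const step of [1, -1]) {
        const tokens = [first.token];
        let end = first.j;
        let index = startIndex + step;
        while (index >= 0 && index < words.length) {
          const wanted = words[index];
          const next = (byStart.get(end + 1) ?? []).find((m) => m.token === wanted);
          if (next === undefined) break;
          tokens.push(next.token);
          end = next.j;
          index += step;
        }
        if (tokens.length >= minWords) {
          runs.push({ sequenceName, tokens, i: first.i, j: end, ascending: step === 1 });
        }
      }
    }
  }
  return runs;
}
```

Because every match is tried as a starting point, shorter sub-runs (for example twothree inside onetwothree) are reported as well. The sketch leaves it to a later scoring step to pick the best cover, which is an assumption about how such a matcher would be wired in.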

I like the general idea of this, but I don't see the solution right now. If you have an idea, feel free to open a PR or create your own package. Custom matchers have been possible since 1.0.0-beta-0, but I think it would be easier to add it to the repo so it can reuse the dictionary and add a custom DictionarySequence matcher.

softwarecreations commented 3 years ago

You're welcome. This is what immediately comes to mind. I haven't given it any deep thought, so there may be dragons.

One idea that comes to mind: if these words already exist in an unordered dictionary, they can be moved out into an array of sequence arrays (as in your example). That way we aren't loading duplicate words into the browser, and the bundle stays small. The regular word matcher can then be tweaked to look for words in the sequence arrays, just as it currently looks for words in the unordered array.

Once that's done, it's a simple matter of writing an algorithm that looks for words from the sequence arrays within the password. If a match is found, check whether the following or previous word in the sequence comes next in the password; if it does, check the next one, and so on (a while loop). Then remove that string from the password and mark it as score 1 or zero or whatever. Repeat the process with whatever's left of the password.
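A minimal sketch of that scan, working directly on the password string, might look like the following. The function names `longestRunAt` and `extractSequences` are made up for this example, a run is assumed to need at least two consecutive words, matching is assumed to be case-insensitive on ASCII input, and nothing here reflects how zxcvbn-ts actually scores matches.

```typescript
// Length (in characters) of the longest run of two or more consecutive
// sequence words that starts at `pos`, read forwards or backwards; 0 if none.
function longestRunAt(password: string, pos: number, words: string[]): number {
  let best = 0;
  for (let start = 0; start < words.length; start++) {
    for (const step of [1, -1]) {
      let cursor = pos;
      let count = 0;
      let index = start;
      // Keep consuming the next (or previous) word while it matches.
      while (index >= 0 && index < words.length) {
        const word = words[index];
        if (word === undefined || !password.startsWith(word, cursor)) break;
        cursor += word.length;
        count += 1;
        index += step;
      }
      if (count >= 2 && cursor - pos > best) best = cursor - pos;
    }
  }
  return best;
}

// Strips every sequence run out of the password and returns what was found
// plus whatever is left over for the other matchers to look at.
function extractSequences(
  password: string,
  sequenceLists: string[][],
): { found: string[]; remainder: string } {
  const lower = password.toLowerCase(); // assumes ASCII, so indices line up
  const found: string[] = [];
  let remainder = "";
  let pos = 0;
  while (pos < lower.length) {
    let best = 0;
    for (const words of sequenceLists) {
      best = Math.max(best, longestRunAt(lower, pos, words));
    }
    if (best > 0) {
      found.push(password.slice(pos, pos + best));
      pos += best;
    } else {
      remainder += password.charAt(pos);
      pos += 1;
    }
  }
  return { found, remainder };
}

// Example call with illustrative word lists:
const demoResult = extractSequences("Xonetwothree9", [
  ["one", "two", "three", "four"],
  ["spring", "summer", "autumn", "winter"],
]);
console.log(demoResult); // { found: [ 'onetwothree' ], remainder: 'X9' }
```

A single sequence word on its own (for example just summer) is left in the remainder, on the assumption that the ordinary dictionary matcher already handles it.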

It seems relatively trivial to add this functionality, but of course time is precious and it would take a little while. Currently I have no free time, so I'm just contributing the idea at this point. No pressure. Thanks for the library :)

modest commented 3 years ago

I had a similar, broader idea:

Since Dropbox kicked off this project, there have been some public leaks of unhashed password lists that should be game-changing data sources for a project like this. Instead of assuming that passwords use common words in the same frequency as written text ("you, to, it, that, ..."), we can rank them based on their actual usage in passwords.

Based on actual leaked password lists, we can improve entropy scoring based on (1) the popularity of the password structure (set of patterns; e.g. (word)(number)(symbol) > (symbol)(word)(symbol)) and (2) the rank/weight of each particular pattern within those sets (e.g. onetwothreefour > correcthorsebatterystaple). That first exercise – determining the entropy of the password structure itself – was waived by the original project due to lack of data.
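To make point (1) concrete, here is a small sketch of how structure popularity could be counted from a list of passwords. It is only an illustration under simplifying assumptions: `structureOf` and `rankStructures` are hypothetical helpers, any run of ASCII letters is treated as a "word", any run of digits as a "number", and everything else as a "symbol". A real implementation would presumably use the matchers' own pattern types instead.

```typescript
// Collapses a password into its structure, e.g. "(word)(number)(symbol)".
// Simplification: character classes stand in for real pattern matches.
function structureOf(password: string): string {
  const parts: string[] = [];
  for (const ch of password.split("")) {
    const kind = /[a-z]/i.test(ch) ? "word" : /[0-9]/.test(ch) ? "number" : "symbol";
    // Only start a new part when the character class changes.
    if (parts[parts.length - 1] !== kind) parts.push(kind);
  }
  return parts.map((part) => `(${part})`).join("");
}

// Returns a map from structure to its popularity rank (1 = most common)
// within the given password list.
function rankStructures(passwords: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const password of passwords) {
    const structure = structureOf(password);
    counts.set(structure, (counts.get(structure) ?? 0) + 1);
  }
  const entries: Array<[string, number]> = [];
  counts.forEach((count, structure) => entries.push([structure, count]));
  entries.sort((a, b) => b[1] - a[1]); // most frequent first

  const ranks = new Map<string, number>();
  entries.forEach(([structure], index) => ranks.set(structure, index + 1));
  return ranks;
}
```

With ranks like these in hand, a scorer could penalise passwords whose structure is very common. How much weight that should carry is exactly the open question raised above, so the sketch deliberately stops at counting.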

Of course, this exercise is the same as improving the efficiency of a password cracker. But that was essentially the point of zxcvbn to begin with – to help password strength meters "catch up" to password cracking libraries.

(I understand that this fork is focused on cleanup, tech debt, and other higher priority things :) Hopefully it is flattering and not annoying that the suggestions are coming here now.)

MrWook commented 3 years ago

@modest this fork isn't just a cleanup. I wanted to revive the project and the idea behind it, because I think those password policies are plain stupid. I would love to see more matchers and contributions. Your idea could be an extended version of the password dictionary, as a separate matcher with a new password list. Feel free to open a separate issue for your idea, and if you have the time you can even make a PR :)

Tostino commented 3 years ago

@modest and @MrWook, I did a little bit of thinking on this today, and I agree keeping those as separate matchers (or at least different match passes) seems like the right way to go. As said, using leaked password dictionaries and ranking by frequency is one attack vector that should have a set of scores associated with it (what we do today), and word matching by frequency is a totally different attack vector that needs to be scored an entirely different way to work properly.

I'd be interested in implementing this in Nbvcxz as well, if there seems to be a consensus on how the algorithm should work and on appropriate scoring values.

zxcvbn-ts / zxcvbn

Add extremely common word sequences? #63