trichards57 / zxcvbn-cs

C#/.NET port of Dan Wheeler/DropBox's Zxcvbn JS password strength estimation library
MIT License

New word lists not in use? #25

Open cearprm opened 3 years ago

cearprm commented 3 years ago

Good to see that the word lists have been updated to match zxcvbn-ts. But I think it's still the old ones in the Dictionaries directory that are being loaded and used?

I tested this by scoring a password that appears in Data\surnames.lst but not in Dictionaries\surnames.lst, e.g. hoffstetter
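For anyone wanting to repeat this kind of check, here is a rough sketch of how candidate test words can be found. It assumes both lists have already been read into memory as one word per entry, which may not match the real file layout; `only_in_raw` and the sample words other than hoffstetter are made up for illustration.

```python
from typing import Iterable, List


def only_in_raw(raw_words: Iterable[str], processed_words: Iterable[str]) -> List[str]:
    """Return words present in the raw list but missing from the processed one.

    Comparison is case-insensitive and ignores surrounding whitespace.
    The order of the raw list is preserved.
    """
    processed = {word.strip().lower() for word in processed_words}
    missing = []
    for word in raw_words:
        token = word.strip().lower()
        if token and token not in processed:
            missing.append(token)
    return missing


# Illustrative data only: stand-ins for Data\surnames.lst and Dictionaries\surnames.lst.
raw_surnames = ["smith", "jones", "hoffstetter"]
processed_surnames = ["smith", "jones"]

candidates = only_in_raw(raw_surnames, processed_surnames)
print(candidates)
```

Any word this returns should not be picked up as a surname dictionary match when scored, if the library really loads only the processed list.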

trichards57 commented 3 years ago

So I think you are seeing the effect of the frequency list pre-processing. The Data lists are the raw files, and the lists in the Dictionaries folder are the processed ones. For surnames it only takes the top 10,000 items from the raw list. The TS library does the same thing: if you look in https://github.com/trichards57/zxcvbn/blob/master/src/frequency_lists.ts you will see that hoffstetter isn't in there either.
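As a simplified illustration of the cutoff idea only (not the actual pre-processing code, which may apply other filtering too), something like the following would produce the effect described. It assumes the raw list is ordered from most to least frequent with one word per entry; `top_entries` and the sample data are hypothetical.

```python
from typing import Iterable, List, Optional


def top_entries(raw_words: Iterable[str], limit: Optional[int]) -> List[str]:
    """Keep the first `limit` distinct words, lowercased, in their original order.

    A `limit` of None keeps every distinct word.
    """
    seen = set()
    kept = []
    for word in raw_words:
        token = word.strip().lower()
        if not token or token in seen:
            continue  # skip blank entries and repeats
        if limit is not None and len(kept) >= limit:
            break  # cutoff reached: everything further down the list is dropped
        seen.add(token)
        kept.append(token)
    return kept


# Illustrative data only: a tiny "raw" list with a cutoff of 3 instead of 10,000.
raw = ["smith", "johnson", "williams", "hoffstetter"]
processed = top_entries(raw, limit=3)
print("hoffstetter" in processed)
```

A word that sits below the cutoff in the raw list never reaches the processed list, so it cannot match as a dictionary word when a password is scored.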

This was a design decision I inherited from DropBox, I suspect because including absolutely everything would hurt performance with steadily diminishing benefit.

I'm open to being persuaded that the performance impact isn't a real problem, but I'd probably want to see the effect it had on both libraries before I changed the behaviour.

Hope this clarifies for you :)

cearprm commented 3 years ago

Ok, interesting. I was not aware that only a subset of the items from the word lists was actually used (checked against).

How frequently might the lists be updated in future, and how frequently is the Dictionaries folder regenerated with the 'top' items, please?