Not recognizing dictionary words when multiple dictionaries are used

If you use the common, en and de dictionary in the options, words like "eigentlich" are not seen as a dictionary word and considered brute-force. If only the de dictionary is used in the options, it is recognized as a dictionary word.

Hey, thank you for the report. Never thought about it and i wrote the whole documentation about it 🤦 The dictionary names are all the same so if we merge it in the options only the last entry will be used.

I think i will rename all dictionaries with a language suffix so this will not happen anymore. 🤔

As a workaround you could either merge the dictionaries by yourself or rename them.

I think i will rename all dictionaries with a language suffix so this will not happen anymore. 🤔

Thanks for #249 👍 Using multiple languages packages in dictionaries seems like a common use case.

@MrWook would it be possible to get it published? The last published version of the English dictionary and language package is @zxcvbn-ts/language-en@3.0.2, published 9 months ago, which does not contain the changes:

https://unpkg.com/browse/@zxcvbn-ts/language-en@3.0.2/dist/index.esm.js

As a workaround you could either merge the dictionaries by yourself or rename them.

I was looking for this in the docs actually - recommendations for how to safely merge dictionaries.

Even after #249 is released, I would suggest the following 2 docs topics:

How to manually merge dictionaries safely (currently seems pretty hazardous because of the nested config data shape and potential for object key naming collisions)
How to add custom words to dictionaries easily (and where in the object structure to put them). For example, probably every website should include their website name, URL, brand and variations thereof, because common password patterns include adding the name of the service to the password, which is highly insecure.

Hey @karlhorky,

this feature is a breaking change so it will come with the next major release. For that i still need to finish https://github.com/zxcvbn-ts/zxcvbn/issues/223 and give it try for https://github.com/zxcvbn-ts/zxcvbn/issues/254. I'm not so sure about the second issue, maybe i do it for version 5.

The dictionaries options is just a plain Record of arrays. So everybody could do what they want and customize it like they seem fit. For example too merge two language dictionaries:
```
import * as zxcvbnDePackage from '@zxcvbn-ts/language-de'
import * as zxcvbnEnPackage from '@zxcvbn-ts/language-en'
```

const options = { dictionary: { commonWords: [ ...zxcvbnDePackage.dictionary.commonWords, ...zxcvbnEnPackage.dictionary.commonWords, ] }, }

Or you could use a deep-merge library and just do this:

```ts
import * as zxcvbnDePackage from '@zxcvbn-ts/language-de'
import * as zxcvbnEnPackage from '@zxcvbn-ts/language-en'
import deepMerge from 'deepmerge'

const options = {
  dictionary: deepMerge(zxcvbnDePackage.dictionary, zxcvbnEnPackage.dictionary),
}

Here is a simple function that iterates over all entries of a language dictionary. Its a little bit dirty but its from chatGPT and works just fine

function combineObjects(...objects: { [key: string]: any[] }[]): {
  [key: string]: any[]
} {
  return objects.reduce(
    (combined, current) => {
      Object.keys(current).forEach((key) => {
        if (!combined[key]) {
          combined[key] = []
        }
        combined[key] = combined[key].concat(current[key])
      })
      return combined
    },
    {} as { [key: string]: any[] },
  )
}

You can use it like the deepmerge library:

import * as zxcvbnDePackage from '@zxcvbn-ts/language-de'
import * as zxcvbnEnPackage from '@zxcvbn-ts/language-en'
import * as zxcvbnJaPackage from '@zxcvbn-ts/language-ja'
import deepMerge from 'deepmerge'

const options = {
  dictionary: combineObjects(zxcvbnDePackage.dictionary, zxcvbnEnPackage.dictionary, , zxcvbnJaPackage.dictionary),
}

Maybe i could add the combineObjects function as a helper function. Probably i need to add helper function for https://github.com/zxcvbn-ts/zxcvbn/issues/254 anyway.

Your described use case is already implemented https://zxcvbn-ts.github.io/zxcvbn/guide/user-input/ You could just do zxcvbn('myPassword', ['www.my-cool-site.com', 'MrWook', 'github'])

As to why i'm not doing anything. I'm pretty busy with my personal life at the moment and i'm maintaining this project by my self but i'm always happy for some helping hands. This timing issue should get better until the end of the year.

As to why i'm not doing anything. I'm pretty busy with my personal life at the moment and i'm maintaining this project by my self but i'm always happy for some helping hands. This timing issue should get better until the end of the year.

Of course! It's open source, you don't owe me or anyone else anything. Thanks for maintaining it ❤️ And hope things are ok with you.

this feature is a breaking change so it will come with the next major release.

Oh, wasn't aware, thanks for the tip!

Would you be ok with reopening this issue until the major release is published to npm? I would suggest this so that it's visible to users that it's an open issue in the current published version. Currently it's hard to find this problem, and this topic is missing from the docs.

The dictionaries options is just a plain Record of arrays. So everybody could do what they want and customize it like they seem fit.

Yeah, after analyzing the data structures and looking at the content of node_modules for a while, I figured this out too.

But there are sharp edges with combining Records, as we see in this issue (combining multiple dictionaries with property name collisions).

Another path of questioning that I would have as a user would be:

"what should I do if I have an array that overlaps heavily with another array?"
"should I pre-process my overlapping arrays (eg. remove duplicates as in 1) before I add them to dictionaries?"
"what are the performance or other implications of adding a large dataset to dictionaries? is there a better way to add data to be more performant?"
"will I receive runtime errors (other than type errors) if there is some kind of collision or I add data that is incompatible in some way?"

Ideally I would hope for / expect a bit simpler of an API for this library, but I think with some documentation (and maybe some error handling / throwing) the current API could also work.

Maybe i could add the combineObjects function as a helper function. Probably i need to add helper function for #254 anyway.

Yeah either some kind of helper (and ideally also docs, front and center in "Getting Started") would be good, to avoid the problems that users are running into.

2. Your described use case is already implemented

That sounds pretty nice, yes! But it seems semantically different (eg. dynamic user input vs fixed website information).

The usage example of zxcvbnOptions.setOptions() seems closer - I could imagine something like this:

const options = {
  dictionary: {
    websiteAndBrandWords: ['mycoolsite', 'my cool site', 'my.cool.site'],
  },
}
zxcvbnOptions.setOptions(options)

But again, it is in a section titled "UserInput", so that is semantically a different use case. My docs suggestion in 2 above is this:

a different section in the docs, entitled something like "Filtering custom words"
I would recommend top-level, because it seems like one of the first things that a user would want to do
- maybe below "Getting Started", or below "Languages"
this new section could actually have "Filtering user input" (the current "UserInput" page) as a sub-heading

On an unrelated "docs improvements" note: the top-level menu sections could be optimized further for most users - the positions of "Migration" and "Examples" could be swapped, since examples are much more interesting for most users than Migration.

I can open an umbrella issue for improving the docs experience for users, if this would be accepted.

Would you be ok with reopening this issue until the major release is published to npm? I would suggest this so that it's visible to users that it's an open issue in the current published version

Unfortunately thats not how github works. I link the issues with the merge requests and if the MR is merged the issue is closed. I think other libraries are not keeping the issues around too. But i can pin it so that everybody can see this issue as it is a major flaw in the system.

Currently it's hard to find this problem, and this topic is missing from the docs.

Documentation isn't one of my best points :D But if its not too much work for you feel free to add it as the documentation is just a bunch of .md files here

"what should I do if I have an array that overlaps heavily with another array?" Like all duplicated things the performance would degrade a little bit but it's not really an issue but i recommend to merge them as with [...new Set(dictionary_1), ...net Set(dictionary_2)] to one dictionary. This would remove all duplicates but only the later ones so it will be in a nice order. As the order of the dictionary will define the scoring.
"should I pre-process my overlapping arrays (eg. remove duplicates as in 1) before I add them to dictionaries?" Same answer as in the first
"what are the performance or other implications of adding a large dataset to dictionaries? is there a better way to add data to be more performant?" Of course the performance will suffer as a result but its not really crucial and you will get better results. This is explained by the original author of the library here. I recommend too check it out if you want to know more about those things. The real performance killers in this library are actually the levenshtein, pwned and leet matcher which are all combined and go over the dictionaries. But the levenshtein and pwned matcher are not enabled by default. I once tried a few things but they either frezzed the site or weren't really better but if someone has an idea to improve this i'm happy to include it.
"will I receive runtime errors (other than type errors) if there is some kind of collision or I add data that is incompatible in some way?" There should only be a check if you add custom translations so you don't miss any keys 🤔 Other than that there should be only type checks.

Ideally I would hope for / expect a bit simpler of an API for this library, but I think with some documentation (and maybe some error handling / throwing) the current API could also work.

Actually the new major version will have a new API as described in the current pinned issue. It will be fully classed based. If you have more ideas feel free to tell me :) As this library is not aimed only to people who are using typescript or other type sources i think it would be a good idea to throw some runtime error for wrong options.

The usage example of zxcvbnOptions.setOptions() seems closer - I could imagine something like this:

This idea of yours is already working exactly as you typed it.

My doc suggestion 2 above is this:

Like i wrote previously feel free to adjust the documentation. As the maintainer of this library it's a bit hard to see what a consumer of the library really needs as i have everything in my head and "think" thats already known so i don't really "think" about writing it down. It would be a great help if a consumer of the library like yourself is going over the documentation and improves it. Like i already said i'm really not good in writing documentation :D But if you don't have the time an umbrella ticket is awesome too!

Unfortunately thats not how github works. I link the issues with the merge requests and if the MR is merged the issue is closed. I think other libraries are not keeping the issues around too. But i can pin it so that everybody can see this issue as it is a major flaw in the system.

Projects can choose how they want to use issues / PRs on GitHub. I've seen a number of projects that will keep issues open until the work has been released (or even some greater requirement, like if the work has been integrated and tested in some external system, after release).

But I understand if you and the other maintainers of zxcvbn-ts would like to run things in a certain way / workflow.

Thanks for pinning!

i recommend to merge them

Ok nice, thanks for the tip. These types of best practices things would be good to have in docs (how to optimally work with dictionaries, add new entries, stay performant, etc).

Actually the new major version will have a new API as described in the current pinned issue. It will be fully classed based

Ah ok, "class based" sounds like potentially more complicated of an API (personally preferring more functional APIs as much as possible) - hope that usage stays simple for users!

As this library is not aimed only to people who are using typescript or other type sources i think it would be a good idea to throw some runtime error for wrong options.

Nice, that would be good! I think also runtime errors are also useful for TypeScript users (multiple channels of messaging that something is going wrong are good).

This idea of yours is already working exactly as you typed it.

Yeah, I was expecting the code to work - it was more for demonstrating how it wasn't related to the current section "UserInput".

Like i wrote previously feel free to adjust the documentation. As the maintainer of this library it's a bit hard to see what a consumer of the library really needs as i have everything in my head and "think" thats already known so i don't really "think" about writing it down. It would be a great help if a consumer of the library like yourself is going over the documentation and improves it. Like i already said i'm really not good in writing documentation :D But if you don't have the time an umbrella ticket is awesome too!

Nice, I'll try to find some time to open an issue describing the changes I would recommend to the docs to improve the flow and clarity. And maybe already add some of those "best practices" things from how to use and add to dictionaries.

@karlhorky

Projects can choose how they want to use issues / PRs on GitHub. I've seen a number of projects that will keep issues open until the work has been released

Like i said i use the normal github workflow i would like to keep the issues open too but there is no configuration for something like that and i don't want to go against the tools that i'm using.

Ah ok, "class based" sounds like potentially more complicated of an API (personally preferring more functional APIs as much as possible) - hope that usage stays simple for users!

Probably but otherwise i need to expose a bunch of heavy scripts to compute the correct dictionaries and other options. So a class that will handle this is a lot easier. You could say a factory function would work too but i think this would be personal preferences and the classes approach made the overall codebase a lot easier.

I added your ideas to issues to track the progress on them https://github.com/zxcvbn-ts/zxcvbn/issues/269 https://github.com/zxcvbn-ts/zxcvbn/issues/268

Additionally i published a beta version of the next major release so that i have more time to implement more breaking changes but have the other breaking changes out there

zxcvbn-ts / zxcvbn

Not recognizing dictionary words when multiple dictionaries are used #241