zxcvbn-ts / zxcvbn

Low-Budget Password Strength Estimation
https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/wheeler
MIT License

Add Arabic Language Package #229

Closed itsmohmans closed 1 year ago

itsmohmans commented 1 year ago

Note that I removed the common words list because Arabic words (written in Arabic characters) are very rarely used in passwords. I do think that a password containing Arabic characters could be more secure than a regular password with Latin characters. I know other factors are involved, but I think it makes the password less predictable.

MrWook commented 1 year ago

Hey, thank you for your contribution!

The name lists look like the Japanese name list, where the characters are switched to Roman letters. Wouldn't the same apply to how Arabic is used in passwords? I would assume that, as in Japanese, people would write common Arabic words in Roman letters instead of Arabic letters 🤔 This would mean that it would be best to have some kind of "Arabic -> Roman" letter parser and run the common words list and the Wikipedia entries through it.

itsmohmans commented 1 year ago

> I would assume that, as in Japanese, people would write common Arabic words in Roman letters instead of Arabic letters

This is true. People often write their name in its English transliteration, which is why I included a list of Arabic names in Latin letters.

> This would mean that it would be best to have some kind of "Arabic -> Roman" letter parser and run the common words list and the Wikipedia entries through it.

Yes! This is a great idea. Unfortunately, I couldn't find ready-made lists of the most popular Arabic names or words in their English form. The source I found for first and last names didn't contain that many names (I added some myself) and had no female names at all. Aggregating name translations from Wikipedia is a good idea, but we might run into two problems:

  1. I think Wikipedia tends to translate a name rather than transliterate it. For example, Arabs often transliterate the name "يعقوب" as "Yacoub" or "Yaqoub", but its English Wikipedia page shows the translated English version, "Jacob".
  2. Many Arabic names have several spellings in their English transliteration. Take Mohamed as an example: some people write it as Mohammad, Mohammed, or Muhammad. If we add a script that aggregates names, I don't know how we would handle different spellings of the same name.
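To make the second problem concrete, here is a rough sketch of one way an aggregation script could group spelling variants: reduce each name to a "consonant skeleton" (keep the first letter, drop the remaining vowels, collapse doubled letters). The helpers `consonantSkeleton` and `groupBySkeleton` are hypothetical names made up for this illustration and are not part of zxcvbn-ts. The heuristic is crude, and it will also merge unrelated names that happen to share a skeleton.

```typescript
// Hypothetical helpers for illustration only; not part of zxcvbn-ts.
const VOWELS = /[aeiou]/g

// Reduce a Latin-script name to a rough grouping key.
function consonantSkeleton(name: string): string {
  const lower = name.toLowerCase()
  // Keep the first character untouched so names that differ only
  // in their initial vowel do not collapse into the same key.
  const head = lower.slice(0, 1)
  const tail = lower.slice(1).replace(VOWELS, '')
  // Collapse runs of the same letter, e.g. "mm" -> "m".
  return (head + tail).replace(/(.)\1+/g, '$1')
}

// Bucket names by their skeleton so variants end up together.
function groupBySkeleton(names: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>()
  for (const name of names) {
    const key = consonantSkeleton(name)
    const bucket = groups.get(key) || []
    bucket.push(name)
    groups.set(key, bucket)
  }
  return groups
}

const grouped = groupBySkeleton(['Mohamed', 'Mohammad', 'Mohammed', 'Muhammad', 'Yacoub'])
// grouped.get('mhmd') holds all four spellings of Mohamed;
// 'Yacoub' lands in its own bucket under 'ycb'.
```

Note that "Yacoub" and "Yaqoub" still get different keys ("ycb" vs. "yqb"), so a skeleton like this only covers vowel and doubling differences, not consonant substitutions.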

Edit: regarding the most common words, to be honest I'm not sure whether people use transliterated Arabic words in passwords. Even if some do, we would also have to consider different spellings of words, and some people use numbers in place of certain letters, for example '7abibi' instead of 'Habibi' (Arabic chat alphabet).
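As a minimal sketch of the chat-alphabet point, the digits could be mapped back to Latin letters before a dictionary lookup. Only the `7` -> `h` mapping comes from the example above. The entries for `3` and `5` are assumptions based on common chat-alphabet usage, and real usage varies by region. `normalizeChatAlphabet` is a hypothetical helper, not a zxcvbn-ts function.

```typescript
// Hypothetical mapping for illustration. Only '7' -> 'h' is taken from
// the '7abibi' example; '3' and '5' are assumed common conventions.
const CHAT_DIGITS: Record<string, string> = {
  '7': 'h', // ح
  '3': 'a', // ع (assumption)
  '5': 'kh', // خ (assumption)
}

// Replace chat-alphabet digits with Latin letters, lowercasing the token.
function normalizeChatAlphabet(token: string): string {
  return token
    .toLowerCase()
    .replace(/[357]/g, (digit) => CHAT_DIGITS[digit] ?? digit)
}

normalizeChatAlphabet('7abibi') // 'habibi'
```

A blanket replacement like this also rewrites digits that really are digits (a trailing "2023", say), so it would only make sense as one extra candidate reading of a token, not as a replacement for the original.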

MrWook commented 1 year ago

@itsmohmans I looked into the Arabic language a little, and it seems there isn't really a suitable way of converting it to Roman letters. As you already said, there are too many different ways of converting it, and each of the tools that do it supports exactly one of them. So let's stick with this MR, and maybe in the future we will have an idea about it.