ulif / diceware

Passphrases to remember
GNU General Public License v3.0

Remove the n.. word from wordlists #85

Closed: kmille closed this issue 2 years ago

kmille commented 2 years ago

The n.. word is discriminatory. Please remove it from the wordlists.

ulif commented 2 years ago

Hi @kmille, thanks for your hints.

While you're generally right, three things need to be considered.

First, please do not remove terms from the lists without (carefully) providing replacements. The length of a wordlist determines its cryptographic strength. You should actually have noticed that when running the tests with your changes.
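To illustrate why list length matters, here is a minimal sketch of the usual entropy estimate for a passphrase whose words are drawn uniformly and independently from a list. The function name `passphrase_entropy_bits` and the example lengths are illustrative assumptions, not part of diceware's code.

```python
import math


def passphrase_entropy_bits(wordlist_length: int, num_words: int) -> float:
    """Entropy (in bits) of num_words words drawn uniformly from a list.

    Hypothetical helper for illustration; it assumes each word is chosen
    independently and with equal probability.
    """
    if wordlist_length < 1 or num_words < 0:
        raise ValueError("wordlist_length must be >= 1 and num_words >= 0")
    return num_words * math.log2(wordlist_length)


# Example: a list with 8192 (2**13) entries versus the same list minus one word.
full = passphrase_entropy_bits(8192, 6)
shortened = passphrase_entropy_bits(8191, 6)
print(full, shortened)
```

The loss from a single missing word is tiny in bits, but a dice-based list also needs a fixed length (7776 = 6^5 entries for five dice rolls per word), so dropping a word without a replacement leaves one roll combination with no word.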

Please run the tests before committing PRs.

Ideally, replacements

Second, the *-orig lists cannot be changed without removing them completely. That's because I am not the source of these lists and they are provided for reference. Instead, you should ask A G Reinhold to remove the n-word from the main source, which can be found here: https://theworld.com/%7Ereinhold/diceware.wordlist.asc

Third, the n-word is, to the best of my knowledge, not a racist word in European or Brazilian Portuguese. I will talk to some native speakers about that. If you have other sources, I would be happy to learn.

Please tell me which of the first two tasks you are willing to take on.

kmille commented 2 years ago

Hey @ulif, I really appreciate your constructive feedback! Please give me some time to think about it. I'm just doing too many small things (like this) that are exploding right now.

ulif commented 2 years ago

Hi @kmille, good to know. I can take care of the problem if you agree. I just did not want to exclude you from handling "your" PR/issue. If you do not object, I would try to fix the issue myself. Generating proper wordlists automatically was on my todo list anyway :)

kmille commented 2 years ago

Feel free to handle this PR :) Thanks for your effort!

BradKML commented 2 years ago

For future reference, please check the list of repos that catalog other curse words at https://github.com/words/cuss/issues/40

ulif commented 2 years ago

Thank you, @BrandonKMLee, really helpful! I am also looking for "bad-word" lists in German. Any hints are much appreciated.

BradKML commented 2 years ago

ulif commented 2 years ago

Very interesting! Thanks again, @BrandonKMLee! It looks like lots of people work with relatively small blacklists in which they collect swear words they have seen somewhere some time ago :) Wordlists based on sentiment analysis of large text corpora might be more helpful. I found something like that for German words: SentiWS, available at https://wortschatz.uni-leipzig.de/download/. It is also quite interesting, although it is still a short list and does not contain the typical swear words.
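Combining the blocklist idea with the earlier requirement that removed words get replacements, a rough sketch could look like the following. The function `replace_blocked_words` and the sample words are hypothetical and do not reflect how diceware generates its lists.

```python
def replace_blocked_words(words, blocked, replacements):
    """Swap blocked entries for unused replacements, keeping the list length.

    Hypothetical helper for illustration. Raises ValueError if there are
    not enough usable replacement words.
    """
    blocked_keys = {w.lower() for w in blocked}
    seen = {w.lower() for w in words}
    # Keep only replacements that are not blocked, not already in the
    # list, and not repeated among the replacements themselves.
    spare = []
    for candidate in replacements:
        key = candidate.lower()
        if key not in blocked_keys and key not in seen:
            spare.append(candidate)
            seen.add(key)
    spare_iter = iter(spare)
    result = []
    for word in words:
        if word.lower() in blocked_keys:
            try:
                result.append(next(spare_iter))
            except StopIteration:
                raise ValueError("not enough replacement words") from None
        else:
            result.append(word)
    return result


words = ["apple", "badword", "cherry"]
cleaned = replace_blocked_words(words, {"badword"}, ["apple", "delta"])
print(cleaned)
```

Each replacement takes the position of the word it replaces, so a list that is expected to be sorted would need to be re-sorted afterwards.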