lk-geimfari / mimesis

Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.
https://mimesis.name
MIT License

NSFW words generated by `Text.words` in `en` #1320

Closed by mhauru 1 year ago

mhauru commented 1 year ago

The English text.json file (https://github.com/lk-geimfari/mimesis/blob/master/mimesis/data/en/text.json) has some words in the "normal" section that should probably be in the "bad" section. Some candidates for moving to "bad" that I spotted while browsing the list: "cock", "cocks", "pussy", "blowjob", "blowjobs", "booty", "busty", "butt". I only browsed the file quickly, so that list is probably very incomplete.

There are also words that appear in both "bad" and "normal", although I'm not sure why some of them are in "bad" at all, e.g. "camel".

I haven't checked any localisations other than en.
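
A rough sketch of how the two checks described above (words listed in both sections, and suspect words still sitting in "normal") could be automated. It assumes the parsed file has the shape `{"words": {"normal": [...], "bad": [...]}}`, which I have not verified against the real text.json. `audit_words` and the sample data are made up for illustration and are not part of mimesis.

```python
def audit_words(data, suspects):
    """Report overlap between sections and suspect words left in "normal".

    `data` is assumed to look like {"words": {"normal": [...], "bad": [...]}}.
    """
    words = data["words"]
    normal = set(words["normal"])
    bad = set(words["bad"])
    return {
        # Words that appear in both sections.
        "in_both": sorted(normal & bad),
        # Hand-picked suspect words that are still classified as normal.
        "suspects_in_normal": sorted(normal & set(suspects)),
    }


# Tiny stand-in for the real file contents (hypothetical data).
sample = {"words": {"normal": ["camel", "butt", "table"], "bad": ["camel", "damn"]}}
report = audit_words(sample, suspects=["butt", "booty"])
print(report)
```

To run it against the real file, `data` could be loaded with `json.load` from a local checkout, provided the assumed layout holds.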

lk-geimfari commented 1 year ago

This is pretty bad. Can you please create a PR with fixes?

mhauru commented 1 year ago

I'll see if I can find the time; it might take a while. It seems worthwhile to go through the whole text.json. Other localisations may be affected too, I have no idea.

jordiortegon commented 1 year ago

Can confirm that there are some "bad" words under the "normal" category in Spanish too. I'll try to look at them and move them.
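
For the "move them" step, something along these lines might work as a starting point. It is only a sketch under the same assumption about the file layout (`{"words": {"normal": [...], "bad": [...]}}`). `move_to_bad` is a hypothetical helper rather than anything in mimesis, and the sample words are placeholders.

```python
def move_to_bad(data, to_move):
    """Move the given words from "normal" to "bad", modifying `data` in place.

    Only words actually present in "normal" are moved, and "bad" never
    gains a duplicate entry.
    """
    words = data["words"]
    moving = set(to_move)
    # Words that really are in "normal" and were requested to move.
    moved = [w for w in words["normal"] if w in moving]
    words["normal"] = [w for w in words["normal"] if w not in moving]
    seen = set(words["bad"])
    for w in moved:
        if w not in seen:
            words["bad"].append(w)
            seen.add(w)
    return data


# Hypothetical sample standing in for one locale's text.json.
sample_locale = {"words": {"normal": ["table", "butt", "camel", "butt"], "bad": ["camel"]}}
move_to_bad(sample_locale, ["butt", "camel"])
print(sample_locale)
```

Whether a word like "camel" belongs in "bad" at all is a separate editorial question. The helper only relocates whatever list it is given.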