Closed zcchew1202 closed 4 months ago
Thank you for raising this and offering to submit a PR; I appreciate it. That said, I admit I am rather hesitant to add 'official' support for languages other than English to Obscenity for two primary reasons:
1. The library was designed with English in mind, and I am not sure how nicely some of its foundations generalize to other languages. In particular, I am skeptical as to whether the current system (character-based transformations, plus a carefully curated set of patterns) for detecting variants of terms will remain effective. For your request in particular, this is less of an issue because English and French are somewhat closely related. But the more pressing issue is:
2. I only personally speak English fluently, which means I can only attest to the quality of patterns for profanity in English. So while I could in theory accept an initial PR for a new French preset based on an existing dataset, it would be difficult to maintain that myself going forward. If, for instance, I later receive a bug report that phrase X is being erroneously marked as profanity by pattern Y in French, it would be exceedingly difficult to evaluate this report and to release a satisfactory fix on my own.
If you are a native French speaker and are able to both confirm that Obscenity works well for French and maintain the relevant code in the long-term, I would be willing to accept a PR. If you are not, though -- which is perfectly reasonable -- I would prefer to leave support for other languages out of the official project. You can still of course develop such support in your own project, perhaps even released separately on npm if you think it is something others would find useful.
Closing for now, but happy to re-open if you (or anyone else) address my previous comment.
Description
I'm working on a project that requires some French support. I saw https://github.com/darwiin/french-badwords-list/tree/master being adapted for https://github.com/jojoee/leo-profanity and was thinking of doing the same thing here. I like how extensible this library is.
Solution
Similar to english.ts, the idea is to import the word array from https://github.com/darwiin/french-badwords-list/tree/master and build a dataset from it. I can work on a PR for it, but can someone point me in the right direction for writing a test for this?
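To make the test question concrete, here is a rough, self-contained sketch of a table-driven check for a word list. It deliberately does not use Obscenity's API. `frenchWords`, `normalize`, `buildMatcher`, and `runCases` are hypothetical names, and the three-entry list is only a stand-in for the array exported by french-badwords-list. A real test would presumably go through the library's own dataset and matcher instead, but the shape of the cases (expected hits, accent and case variants, and clean words that merely contain a listed term) should carry over.

```typescript
// Hypothetical stand-in for the array exported by french-badwords-list;
// the real package's contents and export shape are not reproduced here.
const frenchWords: string[] = ["merde", "crétin", "con"];

// Fold case and strip combining diacritics so "Crétin" and "cretin" compare equal.
function normalize(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Escape regex metacharacters so list entries are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a whole-word matcher over the normalized word list (assumed design,
// not Obscenity's actual matching strategy).
function buildMatcher(words: string[]): (input: string) => boolean {
  if (words.length === 0) return () => false;
  const alternatives = words.map((w) => escapeRegExp(normalize(w))).join("|");
  // Require non-alphanumeric neighbours so "con" does not fire inside "content".
  const re = new RegExp(`(?<![a-z0-9])(?:${alternatives})(?![a-z0-9])`);
  return (input: string) => re.test(normalize(input));
}

// Table of [input, shouldMatch] pairs: true positives, variants, and a
// known false-positive candidate.
const cases: Array<[string, boolean]> = [
  ["Quelle merde !", true],
  ["CRETIN", true], // upper case, accent dropped
  ["Je suis content", false], // contains "con" only as a substring
  ["Bonjour tout le monde", false],
];

// Return the inputs whose result disagrees with the expectation.
function runCases(
  matches: (input: string) => boolean,
  table: Array<[string, boolean]>
): string[] {
  return table.filter(([input, expected]) => matches(input) !== expected).map(([input]) => input);
}

const failures = runCases(buildMatcher(frenchWords), cases);
console.log(failures.length === 0 ? "all cases pass" : `failing inputs: ${failures.join(", ")}`);
```

The false-positive rows are probably the most valuable part for French, since short entries like "con" appear inside many ordinary words.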