Implement a better profanity detection solution

monkeytypegame / monkeytype

The most customizable typing website with a minimalistic design and a ton of features. Test yourself in various modes, track your progress and improve your speed.

https://monkeytype.com/

GNU General Public License v3.0


Implement a better profanity detection solution #3207

Closed: monkeytypegeorge closed this issue 1 year ago

monkeytypegeorge commented 2 years ago

Our current naive solution marks a lot of false positives. We need to consider existing technologies that might handle this better.
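Later in the thread, the false positives are described as names that merely *contain* a listed letter combination getting flagged. Assuming the current check works roughly like plain substring matching, this minimal sketch shows why. The list contents and the `containsProfanityNaive` helper are hypothetical, not the actual monkeytype code.

```typescript
// Hypothetical sample list; the real list lives in backend/src/constants/profanities.ts.
const sampleProfanities: string[] = ["ass", "hell"];

// Naive check (assumed behavior): flag the name if any list entry
// appears anywhere inside it, ignoring case.
function containsProfanityNaive(name: string, list: string[]): boolean {
  const lower = name.toLowerCase();
  return list.some((word) => lower.includes(word.toLowerCase()));
}

// "classic_typer" is flagged because "ass" occurs inside "classic".
console.log(containsProfanityNaive("classic_typer", sampleProfanities)); // true
// "shellfish" is flagged because "hell" occurs inside it.
console.log(containsProfanityNaive("shellfish", sampleProfanities)); // true
// No list entry occurs inside "speedster".
console.log(containsProfanityNaive("speedster", sampleProfanities)); // false
```

Both `true` results are false positives, and short entries like `"ass"` are the worst offenders because they occur inside many ordinary words.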

angusjoshi commented 2 years ago

Hey! How about using this package for this? https://www.npmjs.com/package/bad-words . It works in much the same way as the current profanity detection, but contains a few extra useful functions and perhaps a more comprehensive list?

rustom commented 2 years ago

What are some examples of the false positives? I could implement a library-based solution quickly, @Miodec.

Miodec commented 2 years ago

The main problem is with short words in those lists (either ours or the one that @angusjoshi1 suggested): if your nickname happens to contain one of those letter combinations, it will be considered profane. I think the best solution for now is to manually refactor the list (perhaps comparing it to https://github.com/web-mech/badwords/blob/master/lib/lang.json), adding anything we have missed and removing anything that is not very explicit or could easily be misinterpreted.
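Comparing our list with the badwords `lang.json` can be partly scripted so that the manual review only has to look at the differences. This is a minimal sketch with placeholder words; loading the real files, and the `compareLists` helper itself, are assumptions rather than anything in the repo.

```typescript
// Placeholder inputs: in practice these would be loaded from profanities.ts
// and from the badwords lang.json file linked above.
const ourList: string[] = ["foo", "bar", "qux"];
const referenceList: string[] = ["bar", "Foo", "baz"];

interface ListDiff {
  missingFromOurs: string[]; // in the reference list but not in ours: candidates to add
  onlyInOurs: string[]; // in our list but not in the reference: candidates to re-check
}

// Compare two word lists case-insensitively to guide a manual review.
function compareLists(ours: string[], reference: string[]): ListDiff {
  const normalize = (w: string): string => w.trim().toLowerCase();
  const oursSet = new Set(ours.map(normalize));
  const refSet = new Set(reference.map(normalize));
  return {
    missingFromOurs: Array.from(refSet).filter((w) => !oursSet.has(w)).sort(),
    onlyInOurs: Array.from(oursSet).filter((w) => !refSet.has(w)).sort(),
  };
}

const diff = compareLists(ourList, referenceList);
console.log(diff.missingFromOurs); // ["baz"]
console.log(diff.onlyInOurs); // ["qux"]
```

The script only narrows the review down. Whether a missing word should be added, or an extra one dropped, is still the judgment call described above.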

scpdavis1 commented 1 year ago

Hi, I am interested in attempting this. Is anyone assigned to it yet?

scpdavis1 commented 1 year ago

Hi! I'm new to contributing to open source projects, but I would like to try this out. Can someone point me to where I can get started on understanding the related code?

Miodec commented 1 year ago


No code to understand, really. Just expand this list a bit and make sure it's not prone to false positives: https://github.com/monkeytypegame/monkeytype/blob/master/backend/src/constants/profanities.ts
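One way to check a list for false positives is to run every entry against a sample of harmless names and words and see which ones it would wrongly flag. This is a minimal sketch assuming substring-style matching; the sample names and the `findFalsePositives` helper are hypothetical.

```typescript
// Hypothetical inputs: a candidate list and a sample of harmless names/words.
const candidateList: string[] = ["ass", "hell", "damn"];
const knownCleanNames: string[] = ["classic_typer", "shellfish", "passport", "speedster"];

// For each list entry, collect the clean names it would wrongly flag
// if names are checked with case-insensitive substring matching.
function findFalsePositives(list: string[], clean: string[]): Map<string, string[]> {
  const hits = new Map<string, string[]>();
  for (const entry of list) {
    const needle = entry.toLowerCase();
    const flagged = clean.filter((name) => name.toLowerCase().includes(needle));
    if (flagged.length > 0) {
      hits.set(entry, flagged);
    }
  }
  return hits;
}

findFalsePositives(candidateList, knownCleanNames).forEach((names, entry) => {
  console.log(`${entry} -> ${names.join(", ")}`);
});
// ass -> classic_typer, passport
// hell -> shellfish
```

The more realistic the clean sample is (for example, a dictionary word list), the more useful the report. Entries that show up here are the ones worth removing or reconsidering.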

scpdavis1 commented 1 year ago

Hello again! What methods are you currently using to test for false positives? Or, if there is documentation on that topic, could someone point me to it?

Miodec commented 1 year ago

We don't really have any process set up for that. When I went through the list last time, I just removed very short words and letter combinations that seemed too generic.
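The first part of that manual pass, finding very short entries, is easy to script so that they can be reviewed in one go. This is a minimal sketch. The length threshold and the `splitByLength` helper are hypothetical, and deciding what counts as "too generic" is still left to a human.

```typescript
// Hypothetical threshold: entries shorter than this are set aside for manual review.
const MIN_LENGTH = 4;

interface SplitResult {
  kept: string[]; // long enough to keep without review
  flaggedForReview: string[]; // short entries most likely to cause false positives
}

// Split a word list by entry length, preserving the original order.
function splitByLength(list: string[], minLength: number): SplitResult {
  const kept: string[] = [];
  const flaggedForReview: string[] = [];
  for (const entry of list) {
    if (entry.length >= minLength) {
      kept.push(entry);
    } else {
      flaggedForReview.push(entry);
    }
  }
  return { kept, flaggedForReview };
}

const result = splitByLength(["ass", "hell", "damn"], MIN_LENGTH);
console.log(result.kept); // ["hell", "damn"]
console.log(result.flaggedForReview); // ["ass"]
```

Flagging entries instead of deleting them keeps the final decision manual, which matches how the list was cleaned up last time.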