Closed KevinTyrrell closed 5 years ago
Hey, you're absolutely right! There's a lot that profanity-check won't catch because clever reformats of profanity won't appear in training datasets :(. This isn't what profanity-check is designed for, though - its main focus is being smarter and more robust than traditional wordlists while also being more performant and having a lower footprint than more complex ML solutions.
Most word lists are not good enough to stop profanity because people can format the words differently from what the list expects.
https://github.com/vzhou842/profanity-check/releases/tag/v1.0.2 fails to stop this behavior.
e.g.
!@#$%^ <-- 6 letter offensive word, caught by profanity-check
! @ # $ % ^ <-- same word, spaced, 12.1% certainty of profanity.
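
One possible workaround for the spaced-out case is to normalize the text before it reaches the classifier. The sketch below is not part of profanity-check; `collapse_spaced_chars` is a hypothetical helper name, and the threshold of three or more characters is an assumption chosen to avoid joining ordinary pairs like "a b".

```python
import re

# Assumption: a "spaced-out word" is a run of 3+ single non-space characters,
# each separated by exactly one space, e.g. "! @ # $ % ^" or "h e l l o".
_SPACED_RUN = re.compile(r"(?<!\S)(?:\S ){2,}\S(?!\S)")


def collapse_spaced_chars(text: str) -> str:
    """Join runs of space-separated single characters into one token."""
    return _SPACED_RUN.sub(lambda m: m.group(0).replace(" ", ""), text)


# The spaced example from above collapses back to its unspaced form.
print(collapse_spaced_chars("! @ # $ % ^"))
```

The normalized string could then be passed to profanity-check as usual (for example in the list given to `predict_prob`). This only handles single-space separation; it would also join legitimate runs of one-letter words such as "I a m", so it trades some false positives for coverage.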