Closed HatScripts closed 10 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (ad51d19) 100.00% compared to head (8c77c76) 100.00%.
Also, I noticed that `sh|t` isn't being censored. Not sure why. Terms like `f@g`, `fu(k`, `c0ck`, etc. are being correctly censored.
Edit: This is happening because `|` (vertical bar) is being replaced by `l` (lowercase L) in `resolve-confusables`: https://github.com/jo3-l/obscenity/blob/ad51d193ef23a52b685ee5e5c603b456ed9130b2/src/transformer/resolve-confusables/confusables.ts#L70
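To illustrate the ordering issue described above, here is a minimal sketch. It does not use the library's actual code: `applyMap`, `confusables`, and `leetspeak` are hypothetical stand-ins, and the `'|'` to `'i'` leetspeak entry is an assumption made for the example.

```typescript
// Hypothetical, heavily simplified stand-ins for the two transformers.
type CharMap = ReadonlyMap<string, string>;

// Replace each character by its mapped value, if one exists.
function applyMap(text: string, map: CharMap): string {
  return text
    .split('')
    .map((ch) => map.get(ch) ?? ch)
    .join('');
}

// The confusables entry linked above: vertical bar becomes lowercase L.
const confusables: CharMap = new Map([['|', 'l']]);

// Assumed leetspeak entries; '|' -> 'i' is an assumption for this sketch.
const leetspeak: CharMap = new Map([
  ['|', 'i'],
  ['@', 'a'],
  ['(', 'c'],
  ['0', 'o'],
]);

// If confusables are resolved first, the '|' is gone before leetspeak sees it.
const confusablesFirst = applyMap(applyMap('sh|t', confusables), leetspeak);

// With only the leetspeak mapping, '|' is read as 'i'.
const leetspeakOnly = applyMap('sh|t', leetspeak);

console.log(confusablesFirst); // "shlt"
console.log(leetspeakOnly); // "shit"
```

Under these assumptions, whichever transformer rewrites `|` first decides whether the term can still be matched.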
Sorry, but I'm not convinced these substitutions are clearly valuable, with the exception of perhaps `! -> i`. When we add back whitespace stripping, for instance, `6 itch` will be flagged as containing `bitch`, which seems a rather egregious false positive. I'm sure there are many more examples in this vein, so I would prefer to err on the side of caution here unless you have a compelling reason otherwise.
> `6 itch` will be flagged as containing `bitch`

I could be wrong, but to me, this seems like an argument for including `.addWhitelistedTerm('b itch')`, not for excluding the leetspeak.
However, I'm having trouble even finding a word ending in `b` that would appear before the word `itch`. The only thing I can come up with for now is `scab itch`.
To be clear, my point is that the phrase `6 itch` -- not including the `b` originally -- would be flagged as containing profanity, because `6` would be remapped to `b`. That seems a clear mistake to me.
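The false positive can be reproduced with a small sketch. The `proposedLeetspeak` map, `normalize`, and `containsBlacklisted` are hypothetical names for this example only; real matching in the library is pattern-based, and substring search is used here purely for illustration.

```typescript
// Hypothetical subset of the mappings proposed in this PR.
const proposedLeetspeak = new Map<string, string>([
  ['6', 'b'],
  ['5', 's'],
  ['7', 't'],
]);

// Lowercase, remap leetspeak characters, then strip whitespace.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .split('')
    .map((ch) => proposedLeetspeak.get(ch) ?? ch)
    .filter((ch) => !/\s/.test(ch))
    .join('');
}

// Simplified stand-in for blacklisted term matching.
function containsBlacklisted(text: string, terms: string[]): boolean {
  const normalized = normalize(text);
  return terms.some((term) => normalized.includes(term));
}

const terms = ['bitch'];

// "6 itch" normalizes to "bitch", although no 'b' was ever typed.
const falsePositive = containsBlacklisted('6 itch', terms);

// The same happens inside an innocuous sentence.
const inSentence = containsBlacklisted('I have 6 itchy bites', terms);

// Without a digit that maps to 'b', nothing is flagged.
const harmless = containsBlacklisted('an itch', terms);

console.log(falsePositive, inSentence, harmless); // true true false
```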
Oh okay, that makes sense. Forgive my confusion.
Just to clarify, assuming `resolve-leetspeak` is run before the check for whitelisted terms, wouldn't `.addWhitelistedTerm('b itch')` still be enough in the case of `6 itch` to ignore the false positive?
> assuming `resolve-leetspeak` is run before the check for whitelisted terms

This is not the case; whitelisted term matching runs essentially on the original text by default (with only lowercasing applied). I am not sure this is something we want to change.
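A rough sketch of why the whitelist would not help here, assuming the behaviour described above: blacklisted terms are matched against transformed text, whitelisted terms against the lowercased original. All names below (`forBlacklist`, `forWhitelist`, `isFlagged`) are hypothetical, and the logic is far simpler than the library's position-based matching.

```typescript
// Hypothetical leetspeak entry from this PR.
const leet = new Map<string, string>([['6', 'b']]);

// Text as seen by blacklisted term matching in this sketch:
// lowercased, leetspeak resolved, whitespace stripped.
function forBlacklist(text: string): string {
  return text
    .toLowerCase()
    .split('')
    .map((ch) => leet.get(ch) ?? ch)
    .filter((ch) => !/\s/.test(ch))
    .join('');
}

// Text as seen by whitelisted term matching: only lowercased.
function forWhitelist(text: string): string {
  return text.toLowerCase();
}

// Simplification: any whitelist hit suppresses the blacklist hit.
function isFlagged(text: string, blacklist: string[], whitelist: string[]): boolean {
  const blacklistHit = blacklist.some((t) => forBlacklist(text).includes(t));
  const whitelistHit = whitelist.some((t) => forWhitelist(text).includes(t));
  return blacklistHit && !whitelistHit;
}

const blacklist = ['bitch'];
const whitelist = ['b itch'];

// The whitelist sees "6 itch", which does not contain "b itch".
const sixItch = isFlagged('6 itch', blacklist, whitelist);

// The whitelist only helps when the literal text "b itch" is present.
const literalBItch = isFlagged('scab itch', blacklist, whitelist);

console.log(sixItch, literalBItch); // true false
```

In this model, the whitelisted term would have to be `6 itch` itself to suppress the match, which does not scale to every digit and word combination.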
Per my previous comments I'm not sure this is worth doing, so closing for now. Happy to discuss if you disagree, as always.
Please describe the changes this PR makes and why it should be merged:
Updated the leetspeak dictionary to include:
- `'6'` → `'b'`
- `'8'` → `'b'`
- `'9'` → `'g'`
- `'!'` → `'i'`
- `'5'` → `'s'`
- `'7'` → `'t'`
- `'2'` → `'z'`
Some of these replacements might be overkill, so please let me know what you think. Further reading: https://en.wikipedia.org/wiki/Leet#Table_of_leet-speak_substitutes_for_normal_letters