Closed priyankagv1 closed 5 years ago
Thank you for the report. The problem was that the spaCy tokenizer split `sh1t` into the tokens `sh1` and `t`. I fixed this by adding all profane words to the tokenizer's special cases. Please use the latest version from PyPI.
Thank you so much. Will try and let you know!
Forgot to mention: with my improvements `sh1t` is detected fine (because it's in the profane word dictionary), but `sh5t` is still split into two words and therefore is not detected. I don't know how to fix this yet.
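One way to catch unlisted variants like `sh5t` is to compare each token against the dictionary by Levenshtein distance and flag near-matches. This is only a sketch of that approach, not necessarily what the library ended up shipping (and, as noted above, it only helps once tokenization stops splitting the word apart):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def is_profane_variant(word, dictionary, max_distance=1):
    """Flag words within max_distance edits of any dictionary entry."""
    return any(levenshtein(word, entry) <= max_distance
               for entry in dictionary)
```

Here `levenshtein("sh5t", "shit")` is 1 (one substitution), so `is_profane_variant("sh5t", {"shit"})` returns `True` even though `sh5t` itself is not in the dictionary.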
I've got an idea. Will try it tomorrow.
Thank you!
I've got a better idea and I need more time.
Blocked by #14.
@priyankagv1, finally solved it. Please, try the latest version from PyPI.
Sure..thank you!
When trying to identify profane words, `sh1t` is not getting identified as profane. The Levenshtein approach should have identified this variation of the original profane word. Also, I see that `sh1t` is listed in the profane word dictionary. Could you please see where the problem is?