rominf / profanity-filter

A Python library for detecting and filtering profanity

GNU General Public License v3.0

158 stars, 74 forks

Failed to detect number substitutions #5

Closed: priyankagv1 closed this issue 5 years ago

priyankagv1 commented 5 years ago

When trying to identify profane words, sh1t is not identified as profane. The Levenshtein approach should have matched this variation to the original profane word. Also, I see that sh1t is listed in the profane word dictionary. Could you please look into where the problem is?
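The Levenshtein approach mentioned above is based on edit distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A minimal dynamic-programming sketch follows; the `levenshtein` function is illustrative and is not part of profanity-filter's API.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep a match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("sh1t", "shit"))  # 1
print(levenshtein("sh1", "shit"))   # 2
print(levenshtein("t", "shit"))     # 3
```

A matcher that accepts words within distance 1 of a dictionary entry would catch `sh1t`, but only if the whole word reaches it as a single token: the fragments `sh1` and `t` are 2 and 3 edits away from `shit`.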

rominf commented 5 years ago

Thank you for the report. The problem was that the spaCy tokenizer split sh1t into the tokens sh1 and t. I fixed this by adding all profane words to the tokenizer's special cases. Please use the latest version from PyPI.
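Tokenizer special cases are strings the tokenizer emits as-is instead of applying its splitting rules; in spaCy they are registered with `Tokenizer.add_special_case`. The sketch below does not use spaCy. It is a hypothetical toy tokenizer whose only splitting rule breaks a word wherever a digit is followed by a letter, which reproduces the `sh1` + `t` split from the report, and the `special_cases` set stands in for the profane word dictionary.

```python
import re

# Hypothetical splitting rule: break between a digit and a following letter.
# This is NOT spaCy's rule set; it only mimics the sh1t -> sh1 + t split.
_DIGIT_LETTER_BOUNDARY = re.compile(r"(?<=\d)(?=[A-Za-z])")


def tokenize(text: str, special_cases: frozenset = frozenset()) -> list:
    """Split on whitespace, then apply the splitting rule unless the word is a special case."""
    tokens = []
    for word in text.split():
        if word.lower() in special_cases:
            tokens.append(word)  # exact (lowercased) match: keep the word whole
        else:
            tokens.extend(_DIGIT_LETTER_BOUNDARY.split(word))
    return tokens


profane_words = frozenset({"shit", "sh1t"})  # illustrative dictionary entries

print(tokenize("what sh1t"))                 # ['what', 'sh1', 't']
print(tokenize("what sh1t", profane_words))  # ['what', 'sh1t']
print(tokenize("what sh5t", profane_words))  # ['what', 'sh5', 't']
```

Because special cases are exact lookups, they only protect words that are literally in the dictionary; a variant such as `sh5t` still falls through to the splitting rule.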

priyankagv1 commented 5 years ago

Thank you so much. Will try it and let you know!

rominf commented 5 years ago

Forgot to mention: with my improvements, sh1t is detected correctly (because it's in the profane word dictionary), but sh5t is still split into two tokens and therefore is not detected. I don't know how to fix this yet.
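The thread does not say how sh5t was eventually handled, so the following is only one possible approach, not profanity-filter's fix: before applying splitting rules, keep a word whole if it is within a small edit distance of a dictionary entry. The splitting rule (break between a digit and a following letter), the function names, and the `max_distance` threshold are all hypothetical.

```python
import re

# Hypothetical splitting rule (not spaCy's): break between a digit and a following letter.
_DIGIT_LETTER_BOUNDARY = re.compile(r"(?<=\d)(?=[A-Za-z])")


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            # delete, insert, substitute (a bool adds as 0 or 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def is_near_special_case(word: str, dictionary: frozenset, max_distance: int = 1) -> bool:
    """True if the lowercased word is within max_distance edits of some dictionary entry."""
    w = word.lower()
    return any(levenshtein(w, entry) <= max_distance for entry in dictionary)


def tokenize_fuzzy(text: str, dictionary: frozenset, max_distance: int = 1) -> list:
    """Keep a word whole if it is close to a dictionary word; otherwise apply the split rule."""
    tokens = []
    for word in text.split():
        if is_near_special_case(word, dictionary, max_distance):
            tokens.append(word)
        else:
            tokens.extend(_DIGIT_LETTER_BOUNDARY.split(word))
    return tokens


profane_words = frozenset({"shit"})

print(tokenize_fuzzy("what sh5t", profane_words))  # ['what', 'sh5t']
print(tokenize_fuzzy("room 4b", profane_words))    # ['room', '4', 'b']
```

Keeping a word whole does not by itself flag it; it only lets the word reach the matcher intact. The trade-off is cost and false positives: every word is compared against every dictionary entry, and a threshold of 1 would also keep ordinary words such as `shot` whole, so the flagging step still has to decide what counts as profane.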

rominf commented 5 years ago

I've got an idea. Will try it tomorrow.

priyankagv1 commented 5 years ago

Thank you!

rominf commented 5 years ago

I've got a better idea and I need more time.

rominf commented 5 years ago

Blocked by #14.

rominf commented 5 years ago

@priyankagv1, finally solved it. Please try the latest version from PyPI.

priyankagv1 commented 5 years ago

Sure, thank you!