vzhou842 / profanity-check

A fast, robust Python library to check for offensive language in strings.
https://pypi.org/project/profanity-check
MIT License
612 stars 114 forks

Doesn't understand context #29

Open hwsamuel opened 3 years ago

hwsamuel commented 3 years ago

The library seems to work more like a dictionary lookup for swear words. For example, it correctly tags "fucking idiot" as negative, but it also tags "fucking awesome!" as negative. Maybe the training set's features were unigrams?
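To make the unigram question concrete, here is a minimal sketch of how the two example phrases look under unigram versus bigram features. The `ngrams` helper is hypothetical and written only for this illustration; it is not part of profanity-check and says nothing about how the library actually builds its features.

```python
def ngrams(text, n):
    """Return the word n-grams of `text` (lowercased, '!' removed, split on whitespace)."""
    tokens = text.lower().replace("!", "").split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

a = "fucking idiot"
b = "fucking awesome!"

# With unigrams, both phrases share the single feature that a word-level model keys on.
shared_unigrams = set(ngrams(a, 1)) & set(ngrams(b, 1))

# With bigrams, the two phrases have no feature in common, so a model could
# in principle learn different weights for each of them.
shared_bigrams = set(ngrams(a, 2)) & set(ngrams(b, 2))

print(shared_unigrams)
print(shared_bigrams)
```

If the features really are unigrams, the model has no feature that could tell the two phrases apart beyond "idiot" versus "awesome".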

menkotoglou commented 3 years ago

From my point of view, that happens because of the learning algorithm the library uses. Because the text is tokenized into individual words, "fucking" is assigned a very high probability of being profane, since it is profane in any context. For example, you cannot say "fucking awesome!" in a professional environment. If you were to add "fucking awesome!" to clean_data.csv, you would label it 1 (profane), not 0 (not profane).
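As a rough illustration of the point above, the sketch below trains a toy per-token scorer on a handful of made-up examples. Everything here is an assumption for demonstration: the training sentences are hypothetical, and the add-one smoothed log-odds weighting is a simple stand-in, not the library's trained model or its data.

```python
import math
from collections import Counter

# Hypothetical labeled examples (not from the library's data): 1 = profane, 0 = not profane.
train = [
    ("fucking idiot", 1),
    ("you fucking moron", 1),
    ("shut the fucking door", 1),
    ("awesome work", 0),
    ("that was awesome", 0),
    ("you are great", 0),
]

# Count how often each token appears in profane and in clean examples.
profane, clean = Counter(), Counter()
for text, label in train:
    (profane if label == 1 else clean).update(text.split())

def weight(token):
    """Add-one smoothed log-odds that `token` comes from a profane example."""
    return math.log((profane[token] + 1) / (clean[token] + 1))

def score(text):
    """Sum of per-token weights; a positive sum means 'profane'. Word order is ignored."""
    tokens = text.lower().replace("!", "").split()
    return sum(weight(t) for t in tokens)

for phrase in ("fucking idiot", "fucking awesome!", "awesome"):
    print(phrase, "->", "profane" if score(phrase) > 0 else "clean")
```

In this toy setup the weight learned for "fucking" outweighs the negative weight of "awesome", so the phrase as a whole lands on the profane side regardless of the neighboring word.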