vzhou842 / profanity-check

A fast, robust Python library to check for offensive language in strings.
https://pypi.org/project/profanity-check
MIT License
612 stars, 113 forks

Model must be upgraded #26

Open Cafeepy opened 3 years ago

Cafeepy commented 3 years ago

The sklearn model used in this package was trained and pickled with sklearn 0.20.2. The latest stable version of sklearn is 0.23.2, and it is not compatible with the model from 0.20.2. When you try to run this package on sklearn 0.23 or later, loading the model fails with an unpickling error that can't be worked around. Yes, it is possible to install an earlier version of sklearn, but versions before 0.22 are not compatible with Python 3.8. As others have noted, sklearn 0.22.2 also brings a significant performance decrease, and it emits several runtime warnings about the lack of backward compatibility.
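A minimal sketch, assuming you would rather fail early with a clear message than hit the unpickling error: it compares the installed scikit-learn version against the 0.20.2 version the model was pickled with. The names `TRAINED_WITH`, `parse_version` and `check_model_compatibility` are hypothetical and not part of profanity-check. It treats any mismatch as risky, which is stricter than necessary (0.22.2 still loads, with warnings), but scikit-learn itself does not guarantee that pickled models load across versions.

```python
import re
import warnings
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Version the bundled model was pickled with, as stated in this issue.
TRAINED_WITH = "0.20.2"


def parse_version(v: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version string, e.g. '1.3.0rc1' -> (1, 3, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def installed_sklearn_version() -> Optional[str]:
    """Return the installed scikit-learn version, or None if it is not installed."""
    try:
        return version("scikit-learn")
    except PackageNotFoundError:
        return None


def check_model_compatibility(installed: Optional[str], trained_with: str = TRAINED_WITH) -> bool:
    """Return True only if `installed` matches the version the model was pickled with.

    Any mismatch produces a warning up front instead of an opaque unpickling error later.
    """
    if installed is None:
        warnings.warn("scikit-learn is not installed; the model cannot be loaded")
        return False
    if parse_version(installed) != parse_version(trained_with):
        warnings.warn(
            f"model was pickled with scikit-learn {trained_with}, "
            f"but {installed} is installed; unpickling may fail or misbehave"
        )
        return False
    return True


if __name__ == "__main__":
    print(check_model_compatibility(installed_sklearn_version()))
```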

If this library is to be maintained, all I ask is that the model be upgraded/retrained to work with sklearn 0.23.2 and Python 3.8, or at least that the code and data used to train the original model be published so others can retrain it themselves. Unless this happens, the library will unfortunately become unusable on Python 3.9 and later.

I really admire this library; honestly, all it needs now is a bit of polishing. Thanks for your time!

AlexYurkin commented 3 years ago

I found an article the author wrote. It contains information about the datasets he used: https://towardsdatascience.com/building-a-better-profanity-detection-library-with-scikit-learn-3638b2f2c4c2

yaskh commented 3 years ago

@Cafeepy, this article contains the script that was used to train the model. I found the link on the profanity-check PyPI page.
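The project's approach, as described in the linked article, is roughly a bag-of-words representation fed to a linear SVM. The sketch below is a minimal retraining outline under that assumption, not the author's script: `train_profanity_model`, `dump_model`, `read_saved_version`, the toy data and the hyperparameters are all illustrative. It wraps `LinearSVC` in `CalibratedClassifierCV` because `LinearSVC` has no `predict_proba`, and it writes the scikit-learn version as a separate pickle ahead of the model so a future loader can check it before unpickling the model itself.

```python
import os
import pickle
import tempfile

import sklearn
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data: "badword" stands in for real offensive terms (1 = offensive).
TOY_TEXTS = [
    "badword you", "you badword", "badword badword", "what a badword", "badword again",
    "have a nice day", "lovely weather today", "thanks for the help",
    "see you tomorrow", "great library",
]
TOY_LABELS = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]


def train_profanity_model(texts, labels, min_df=1):
    """Fit a bag-of-words vectorizer plus a calibrated linear SVM on raw strings."""
    vectorizer = CountVectorizer(stop_words="english", min_df=min_df)
    svm = LinearSVC(class_weight="balanced", dual=False, tol=1e-2, max_iter=100_000)
    # LinearSVC has no predict_proba; calibration adds probability estimates.
    # The estimator is passed positionally because its keyword name changed
    # from base_estimator to estimator across scikit-learn releases.
    calibrated = CalibratedClassifierCV(svm, cv=3)
    model = make_pipeline(vectorizer, calibrated)
    model.fit(texts, labels)
    return model


def dump_model(model, path):
    """Write the scikit-learn version first, then the model, as two pickles in one file."""
    with open(path, "wb") as f:
        pickle.dump(sklearn.__version__, f)
        pickle.dump(model, f)


def read_saved_version(path):
    """Read only the leading version record, without unpickling the model."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    model = train_profanity_model(TOY_TEXTS, TOY_LABELS)
    # Probability of the "offensive" class for each input string.
    print(model.predict_proba(["badword badword", "lovely weather today"])[:, 1])
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pkl")
        dump_model(model, path)
        print(read_saved_version(path))
```

With a real dataset, a larger `min_df` would drop very rare tokens; a float such as 0.0001 is interpreted by `CountVectorizer` as a proportion of documents rather than a count.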