vzhou842 / profanity-check

A fast, robust Python library to check for offensive language in strings.
https://pypi.org/project/profanity-check
MIT License
612 stars, 113 forks

Added train_model.py and made the necessary modifications #19

Open menkotoglou opened 4 years ago

menkotoglou commented 4 years ago
  1. Reverse-engineered the training script (train_model.py)
  2. Updated the scikit-learn version and added it to setup.py and requirements.txt
  3. Retrained the model on the training data with the updated scikit-learn version from (2)
  4. Dropped support for Python 3.5 (see the Travis configuration), because it was no longer compatible with the current scikit-learn version
  5. Fixed the tests, as one test case now falls below the threshold

We tested these changes on a private dataset with the following results:

Before:

Actual \ Predicted | Not Profane(0) | Profane(1)
Not Profane(0) | 703 | 14
Profane(1) | 93 | 39

Accuracy Score: 87.4%

After:

Actual \ Predicted | Not Profane(0) | Profane(1)
Not Profane(0) | 697 | 20
Profane(1) | 87 | 45

Accuracy Score: 87.4%
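
Since both runs report the same accuracy, it may help to break the numbers down per class. The following is a minimal sketch, not part of the pull request, that recomputes accuracy, precision, and recall from the counts quoted above. It assumes that rows are actual labels, columns are predicted labels, and Profane(1) is the positive class; the names `MATRICES` and `summarize` are made up for illustration.

```python
# Confusion-matrix counts copied from the comment above.
# Assumption: rows are actual labels, columns are predicted labels,
# and class 1 ("Profane") is treated as the positive class.
MATRICES = {
    "before": {"tn": 703, "fp": 14, "fn": 93, "tp": 39},
    "after": {"tn": 697, "fp": 20, "fn": 87, "tp": 45},
}


def summarize(m):
    """Return accuracy, precision, and recall for one set of counts."""
    total = m["tn"] + m["fp"] + m["fn"] + m["tp"]
    return {
        "accuracy": (m["tn"] + m["tp"]) / total,
        "precision": m["tp"] / (m["tp"] + m["fp"]),
        "recall": m["tp"] / (m["tp"] + m["fn"]),
    }


for name, counts in MATRICES.items():
    s = summarize(counts)
    print(
        f"{name}: accuracy={s['accuracy']:.1%} "
        f"precision={s['precision']:.1%} recall={s['recall']:.1%}"
    )
```

Under that reading, both models classify 742 of 849 samples correctly (87.4%), while recall on the profane class rises from about 29.5% to 34.1% and precision falls from about 73.6% to 69.2%.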

Used in production as committed here: https://github.com/dimitrismistriotis/profanity-check

ieshaan12 commented 3 years ago

@koti How do I use this build?

dimitrismistriotis commented 3 years ago

> @koti How do I use this build?

By referencing the other repository. With pip and a "requirements.txt" file, use the following line instead of "profanity-check":

-e git+https://github.com/dimitrismistriotis/profanity-check.git#egg=profanity-check

Also keep an eye on this issue: if @vzhou842 accepts it, you can switch back to profanity-check.

ieshaan12 commented 3 years ago

@dimitrismistriotis Thanks! I was wondering if we could implement a function which censors content like the profanity package?

dimitrismistriotis commented 3 years ago

> @dimitrismistriotis Thanks! I was wondering if we could implement a function which censors content like the profanity package?

Censoring is a very broad concept. I also didn't get the "like the profanity package" part: profanity detects