unitaryai / detoxify

Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ Pytorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai.
https://www.unitary.ai/
Apache License 2.0

Toxicity scores, same as Perspective API? #63

Open · SallyBean opened 1 year ago

SallyBean commented 1 year ago

Great repo!

I have a question I hope someone can help with.

Are the toxicity scores returned by the Unitary models probability scores, in the same sense that the Perspective API returns them?

"The only score type currently offered is a probability score. It indicates how likely it is that a reader would perceive the comment provided in the request as containing the given attribute. For each attribute, the scores provided represent a probability, with a value between 0 and 1. A higher score indicates a greater likelihood that a reader would perceive the comment as containing the given attribute. For example, a comment like “You are an idiot” may receive a probability score of 0.8 for attribute TOXICITY, indicating that 8 out of 10 people would perceive that comment as toxic. "

Or do they represent the extent of the toxicity?

Thanks so much!

SallyBean commented 1 year ago

I'm particularly interested in the thresholds you recommend for deciding whether something is 'toxic'. Do you recommend anything above 0, or something more in line with the Perspective API, i.e. anything above 0.70 for social scientists?

laurahanu commented 1 year ago

Hello!

The scores are probability scores, similar to the Perspective API ones. 0.7 sounds like a good starting point for a threshold, although this will vary depending on the use case and your tolerance for false positives versus false negatives.
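As a minimal sketch of applying such a cutoff (the 0.7 threshold and the example comments below are just illustrative; 'original' is one of the released checkpoints):

```python
from detoxify import Detoxify

# Load a released checkpoint; weights are downloaded on first use.
model = Detoxify('original')

THRESHOLD = 0.7  # starting point suggested above; tune for your use case

comments = [
    "You are an idiot",
    "Have a nice day!",
]

for text in comments:
    scores = model.predict(text)  # dict mapping attribute -> probability in [0, 1]
    is_toxic = scores["toxicity"] >= THRESHOLD
    print(f"{text!r}: toxicity={scores['toxicity']:.2f} -> toxic={is_toxic}")
```

Raising the threshold trades recall for precision: fewer borderline comments get flagged (fewer false positives), at the cost of missing some genuinely toxic ones (more false negatives).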

Hope this helps!