unitaryai / detoxify

Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ Pytorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai.
https://www.unitary.ai/
Apache License 2.0

Pinpoint the parts of the speech that trigger high values #104

Open nicobao opened 7 months ago

nicobao commented 7 months ago

Hi,

Thanks for the work on this library, it's quite accurate!

It'd be awesome if the model could pinpoint which parts of the input text triggered a high score (for toxicity or any other predicted label).

Is there any easy way to do it already, maybe not for all cases, but for the obvious ones?

voarsh2 commented 2 months ago

Given that this is SENTENCE classification, you can't really "highlight" the part of a text that makes it "toxic". The only thing I can remotely think of is to run the classifier on each word of a submission individually to find a "toxic" word, but this is really inefficient and not what the model is suited for; it isn't just looking at a single word or phrase.
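For illustration, the per-word scan described above could be sketched like this. The scoring function is a toy stand-in here so the snippet is self-contained; with Detoxify the scorer would presumably be something like `lambda t: Detoxify('original').predict(t)['toxicity']` (an assumption about wiring, not code from this repo):

```python
from typing import Callable, List, Tuple

def score_each_word(text: str, score_fn: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Run the classifier on every word in isolation: one forward pass
    per word, which is the inefficiency noted above."""
    return [(word, score_fn(word)) for word in text.split()]

# Toy scorer standing in for the real model, just to show the mechanics.
toy = lambda t: 0.9 if t == "stupid" else 0.1
print(score_each_word("you are stupid", toy))
# → [('you', 0.1), ('are', 0.1), ('stupid', 0.9)]
```

As noted, words scored out of context this way can miss toxicity that only emerges from the full sentence.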

bfelbo commented 2 months ago

You can do what I originally did with the DeepMoji model (also sentence classification, for emotion/sentiment): run the sentence prediction with each word removed in turn and look at the difference in predicted probabilities. See more details here: https://huggingface.co/spaces/Pendrokar/DeepMoji/discussions/1#65eb375cdf813b9c15308c3c
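A minimal sketch of that leave-one-word-out occlusion idea, written against a generic scoring callable so it runs standalone. The `toy` scorer is hypothetical; plugging in Detoxify would presumably look like `lambda t: Detoxify('original').predict(t)['toxicity']` (an assumption, not code from this repo):

```python
from typing import Callable, List, Tuple

def word_attributions(text: str, score_fn: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Leave-one-word-out occlusion: score the full text once, then
    re-score it with each word removed; the drop in score is that
    word's attribution."""
    words = text.split()
    base = score_fn(text)
    attributions = []
    for i in range(len(words)):
        ablated = " ".join(words[:i] + words[i + 1:])
        attributions.append((words[i], base - score_fn(ablated)))
    return attributions

# Toy scorer standing in for the real model, just to show the mechanics.
toy = lambda t: 0.9 if "stupid" in t.split() else 0.1
print(word_attributions("you are stupid", toy))
```

Removing "stupid" drops the toy score from 0.9 to 0.1, so it gets a large positive attribution while the other words get ~0. This costs one extra forward pass per word, but unlike scoring words in isolation it keeps each word's surrounding context intact.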