Open nicobao opened 7 months ago
Given that it is sentence classification, you can't really "highlight" the part that makes a piece of text "toxic". The only thing I can think of is to score each word in a submission individually to find a "toxic" word, but this is really inefficient and not what the model is suited for; it isn't just looking at a word or phrase.
You can do what I originally did with the DeepMoji model (also sentence classification, for emotion/sentiment): run the prediction on the sentence with each word removed in turn and compare the differences in predicted probabilities. See more details here: https://huggingface.co/spaces/Pendrokar/DeepMoji/discussions/1#65eb375cdf813b9c15308c3c
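That leave-one-out comparison can be sketched in a few lines. Note that `toy_score` below is only a stand-in for a real sentence classifier's predict call (the actual model API isn't shown in this thread): it flags a single hard-coded word so the sketch is runnable on its own.

```python
def occlusion_scores(text, score_fn):
    """Leave-one-out attribution: drop each word in turn, re-score the
    sentence, and record how much the probability falls without it.
    Words with a large drop are the ones driving the prediction."""
    words = text.split()
    full = score_fn(text)  # probability for the complete sentence
    deltas = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        deltas.append((words[i], full - score_fn(reduced)))
    return deltas

# Hypothetical stand-in for a real toxicity model's predict function.
def toy_score(text):
    return 0.9 if "stupid" in text.split() else 0.1

scores = occlusion_scores("you are stupid", toy_score)
top_word = max(scores, key=lambda t: t[1])[0]  # -> "stupid"
```

This calls the model once per word, so it is O(n) extra predictions per sentence, which matches the "really inefficient" caveat above, but it works for the obvious cases without retraining anything.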
Hi,
Thanks for the work on this library, it's quite accurate!
It'd be awesome if the model could pinpoint the part of the input text that triggered a high score (for toxicity or any other measured field).
Is there any easy way to do this already, maybe not for all cases, but for the obvious ones?