utterworks / fast-bert

Super easy library for BERT based NLP models
Apache License 2.0

High confidence for False Positive results #39

Open 008karan opened 5 years ago

008karan commented 5 years ago

I have trained a multi-class text classifier using BERT and am getting around 90% accuracy. The only issue is that the model classifies out-of-domain sentences with a very high confidence score (e.g. 0.9954564). I have seen other supervised models (e.g. spaCy's text classifier) assign very low confidence to out-of-domain sentences, which helps to detect them. Is there any method to solve this problem?
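A common first mitigation is to reject predictions whose top softmax probability falls below a threshold. The sketch below (label names, logits, and the threshold are illustrative assumptions, not from the issue) shows the idea; note that, as described above, BERT's softmax is often still highly confident on out-of-domain inputs, so a threshold alone may not be enough.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def predict_with_rejection(logits, labels, threshold=0.9):
    """Return (label, confidence), or (None, confidence) when the top
    softmax probability is below the threshold."""
    probs = softmax(np.asarray(logits, dtype=float))
    best = int(np.argmax(probs))
    if probs[best] < threshold:
        return None, float(probs[best])
    return labels[best], float(probs[best])

labels = ["billing", "support", "sales"]  # hypothetical label set

# A sharply peaked logit vector is accepted:
print(predict_with_rejection([6.0, 1.0, 0.5], labels))
# A flat logit vector is rejected as possibly out of domain:
print(predict_with_rejection([1.2, 1.0, 0.9], labels))
```

Entropy of the full probability vector can be used the same way and is sometimes a slightly better out-of-domain signal than the top probability alone.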

kaushaltrivedi commented 5 years ago

Interesting. It does depend on your data size. You could create a dummy label called None and add out-of-domain data under it. I will soon add RoBERTa to the library; it would be interesting to see how it performs with that one.
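The suggestion above amounts to augmenting the training set with an explicit out-of-domain class. A minimal sketch, assuming a pandas DataFrame with hypothetical `text`/`label` columns (fast-bert's own data loading is not shown here):

```python
import pandas as pd

# Hypothetical in-domain training data.
train = pd.DataFrame({
    "text": ["my invoice is wrong", "reset my password"],
    "label": ["billing", "support"],
})

# Out-of-domain sentences collected from logs, generic corpora, etc.,
# all mapped to a dummy "None" label.
ood = pd.DataFrame({
    "text": ["what a lovely sunset", "the match ended in a draw"],
    "label": ["None", "None"],
})

# Train the classifier on the combined data so out-of-domain inputs
# have a class to fall into instead of a forced in-domain guess.
train_with_none = pd.concat([train, ood], ignore_index=True)
print(sorted(train_with_none["label"].unique()))
```

The quality of this approach depends heavily on how well the collected None examples cover the out-of-domain inputs seen at inference time.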