wietsedv / bertje

BERTje is a Dutch pre-trained BERT model developed at the University of Groningen. (EMNLP Findings 2020) "What’s so special about BERT’s layers? A closer look at the NLP pipeline in monolingual and multilingual models"
https://aclanthology.org/2020.findings-emnlp.389/
Apache License 2.0

Using BERTje for sentiment classification #26

Closed jrnkng closed 2 years ago

jrnkng commented 2 years ago

Hi Wietse!

I am trying to classify given texts (usually about 100 words) as either positive or negative. How would I go about doing that with BERTje?

I tried the following, based off of the fill-mask example shared in the README and on Hugging Face.

from transformers import pipeline

model = pipeline("sentiment-analysis", model='GroNLP/bert-base-dutch-cased')
negative_dutch_text = 'Dat is heel vervelend om te horen! Ik ben ook heel boos hierover. Wat een rotzooi.'
model(negative_dutch_text)

For every sentence this outputs LABEL_0 with a score of around 0.55, whereas I would expect this example to score strongly negative. In what way are my expectations off? How would I go about using the model to classify text as positive or negative?

Thanks a lot!

wietsedv commented 2 years ago

Sorry for my late comment. To use pre-trained models like BERTje for specific tasks, you need to fine-tune the models or use already fine-tuned models. You could try using wietsedv/bert-base-dutch-cased-finetuned-sentiment instead of GroNLP/bert-base-dutch-cased. This model is fine-tuned on Dutch book reviews, so out-of-domain performance may still be a bit unreliable. I hope this helps!
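For reference, a minimal sketch of what that could look like with the fine-tuned checkpoint named above (this assumes the Hugging Face transformers pipeline API; the exact label names depend on the model's config):

```python
from transformers import pipeline

# Load the sentiment model fine-tuned on Dutch book reviews
# (checkpoint name taken from the comment above).
classifier = pipeline(
    "sentiment-analysis",
    model="wietsedv/bert-base-dutch-cased-finetuned-sentiment",
)

negative_dutch_text = (
    "Dat is heel vervelend om te horen! "
    "Ik ben ook heel boos hierover. Wat een rotzooi."
)

# Returns a list with one dict per input, each containing a
# predicted label and a confidence score.
result = classifier(negative_dutch_text)
print(result)
```

The original LABEL_0 output with a score near 0.5 came from an untrained classification head randomly initialized on top of the pre-trained encoder, which is why fine-tuning (or a fine-tuned checkpoint) is needed before the scores mean anything.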