Closed shaily99 closed 3 years ago
This was a while ago, but I think these are the hyperparameters I specified (the rest were all Hugging Face defaults):
--model_type roberta --model_name_or_path roberta-base --max_seq_length 128 --learning_rate 2e-5 --num_train_epochs 3.0
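For reference, flags like these would typically be passed to the transformers GLUE fine-tuning script. The snippet below is just a sketch of how that invocation might look; the script version, task name, and data/output paths are assumptions, not something stated above:

```shell
# Sketch only: older-style transformers run_glue.py invocation.
# --model_type suggests a pre-v3 script; paths and task name are placeholders.
python run_glue.py \
  --model_type roberta \
  --model_name_or_path roberta-base \
  --task_name SST-2 \
  --do_train --do_eval \
  --data_dir /path/to/SST-2 \
  --max_seq_length 128 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir /tmp/sst2-roberta
```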
In any case, I wouldn't worry too much about it. These models are not well calibrated, so it's normal for most predictions to be super confident. Getting neutral predictions by thresholding a probability range is a hack, since these models are trained on binary rather than three-way classification. We did it so we could compare research models to commercial models, but it's not what I would do if I actually wanted a sentiment model that predicted 'neutral'.
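The "neutral as a probability band" hack described above can be sketched like this. The function name and default thresholds are illustrative (the 0.33/0.66 band comes from the question below), not from the original code:

```python
# Sketch of mapping a binary sentiment model's positive-class probability
# to a three-way label. Thresholds are the 0.33/0.66 band mentioned here;
# a poorly calibrated model will rarely land inside the neutral band.
def to_three_way(p_positive, low=0.33, high=0.66):
    """Map P(positive) from a binary classifier to pos/neg/neutral."""
    if p_positive < low:
        return "negative"
    if p_positive > high:
        return "positive"
    return "neutral"

print(to_three_way(0.95))  # -> positive
print(to_three_way(0.50))  # -> neutral
print(to_three_way(0.05))  # -> negative
```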
I am trying to use an XLM-R model fine-tuned on SST-2 for CheckList testing. The parameters I am currently using never seem to give neutral predictions (positive probability in the range 0.33-0.66). Can you please share the exact parameters (learning rate, epochs, weight decay, dropout, etc.) that you used to fine-tune RoBERTa for the CheckList experiments?