JingqiWei opened 2 years ago
Hi! We didn't use the default version (RoBERTa-based) of BERTScore; instead we used a BERT-based model, mainly because of the different tokenizations. It has been a while, so I can't recall the exact setting, but I think we used bert-large-uncased. It should be fine as long as you use the same setting when comparing different models. Please let me know if you have more questions.
Thanks for your reply!
Hi, I have a question about the BS and MS metrics. For BERTScore, I get a score of 0.88 with `rescale_with_baseline=True`, but with `rescale_with_baseline=False` the score drops to around 0.46. Both results differ from yours. I'd appreciate it if you could tell me more, thanks!
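For context on why the two settings differ so much: BERTScore's `rescale_with_baseline` applies a linear rescaling `(raw - baseline) / (1 - baseline)`, where the baseline is an empirical score for unrelated sentence pairs. Since raw BERTScore values cluster in a narrow high range, rescaling normally *lowers* the number, which may be worth double-checking against which setting produced which score. A minimal sketch of the rescaling arithmetic, using a purely hypothetical baseline value of 0.78 (the real baseline depends on the model and language):

```python
def rescale_with_baseline(raw_score: float, baseline: float) -> float:
    """Apply BERTScore's linear baseline rescaling:
    rescaled = (raw - baseline) / (1 - baseline)."""
    return (raw_score - baseline) / (1 - baseline)

# With a hypothetical baseline of 0.78, a raw score of 0.88
# rescales down to roughly 0.45 -- the same kind of gap as the
# 0.88 vs 0.46 numbers discussed above.
print(rescale_with_baseline(0.88, 0.78))  # ≈ 0.4545
```

This is only the rescaling step, not a full BERTScore run; in the `bert_score` package the equivalent toggle is the `rescale_with_baseline` argument, and the baseline itself is loaded per `model_type` and language.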