neulab / BARTScore

BARTScore: Evaluating Generated Text as Text Generation
Apache License 2.0

Using BARTScore to Compare 2 summaries without Human Evaluation #39

Open pranamyapatil opened 1 year ago

pranamyapatil commented 1 year ago

I went through the analysis script for comparing two evaluation metrics with respect to human evaluation (i.e., meta-evaluating evaluation metrics).

I wanted to know if there is a way to compare two summaries using BARTScore alone.

E.g., the higher the ROUGE score, the better the summary. Similarly, can we compute the BARTScore for two summaries and conclude that the one with the higher BARTScore is better?

yyy-Apple commented 1 year ago

Yes, as with ROUGE, the higher the BARTScore, the better the summary.
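For anyone landing here, a minimal sketch of such a pairwise comparison. It assumes the repo's `BARTScorer` class from `bart_score.py` (commented out here since it requires downloading a checkpoint); the numeric scores below are hypothetical placeholders, not outputs from the model:

```python
# Assumed usage of the repo's BARTScorer (requires transformers + a checkpoint):
# from bart_score import BARTScorer
# bart_scorer = BARTScorer(device='cuda:0', checkpoint='facebook/bart-large-cnn')

def pick_better(score_a: float, score_b: float) -> str:
    """Return which summary wins. BARTScore is an average log-likelihood,
    so values are typically negative and higher (less negative) is better."""
    return "A" if score_a > score_b else "B"

# Hypothetical scores; in practice they would come from, e.g.,
# score_a = bart_scorer.score([source], [summary_a])[0]
# score_b = bart_scorer.score([source], [summary_b])[0]
score_a = -2.1
score_b = -3.4
print(pick_better(score_a, score_b))  # prints "A"
```

Note that because the scores are log-likelihoods, only their relative order within the same scorer and setup is meaningful; the absolute values depend on the checkpoint and scoring direction.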

anutammewar commented 1 year ago

> Yes, similar to Rouge score, the higher the BARTScore, the better the summary.

I had a similar doubt about interpreting the scores. I understand that higher = better, but I'm still confused about how to interpret the absolute values (i.e., how high a score has to be to count as good). What were the absolute scores reported in the paper for the REALSumm and SummEval datasets? That would give a useful reference point.