maszhongming / UniEval

Repository for EMNLP 2022 Paper: Towards a Unified Multi-Dimensional Evaluator for Text Generation
MIT License

Why are the tau and rho tables for SummEval different from those in the original SummEval paper? #8

Open carlesoctav opened 3 hours ago

carlesoctav commented 3 hours ago

Tau and rho from the SummEval paper: [image]

Tau and rho from the UniEval paper: [image]

I believe the difference lies in how the scores are computed. Instead of calculating the ROUGE score against the annotated reference, you compute it directly against the source text. Don't you think this is unfair to scoring functions that have a limited input length, or to those that operate at the set level, like ROUGE? Thank you.
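
For concreteness, here is a minimal sketch of the two computations being contrasted (reference-based vs. source-based ROUGE). This is not the repository's code; it assumes the `rouge_score` package, and the example strings are made up:

```python
# Sketch of the two ways ROUGE could be computed: against the annotated
# reference vs. against the source document. Requires `rouge_score`.
from rouge_score import rouge_scorer

source = "The city council met on Tuesday to debate the new transit budget ..."
reference = "Council debates transit budget."  # human-written reference (made up)
system_summary = "The council discussed the transit budget on Tuesday."

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

# (a) reference-based ROUGE, as in the original SummEval setup
ref_based = scorer.score(reference, system_summary)

# (b) source-based ROUGE, which the question suggests is being used instead
src_based = scorer.score(source, system_summary)

print({k: v.fmeasure for k, v in ref_based.items()})
print({k: v.fmeasure for k, v in src_based.items()})
```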

maszhongming commented 2 hours ago

I’d like to clarify a few points regarding your questions:

  1. The ROUGE correlations are taken from the BARTScore paper, but I believe all ROUGE scores there are calculated against the annotated references.
  2. The Table 2 you provided reports system-level correlations, whereas Table 3 reports summary-level correlations (see the sketch after this list). Please refer to this paper for the distinction between the two.
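
To make the distinction in point 2 concrete, here is a minimal sketch of one common way system-level and summary-level Kendall's tau are computed. It is not code from either paper; it assumes numpy/scipy, and the `human`/`metric` score matrices are synthetic:

```python
# Sketch of system-level vs. summary-level correlation. `human` and `metric`
# hold one score per system per source document, shape (n_systems, n_documents).
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
n_systems, n_documents = 16, 100
human = rng.random((n_systems, n_documents))
metric = human + 0.3 * rng.standard_normal((n_systems, n_documents))

# System-level: average each system's scores over all documents first,
# then compute a single correlation across the n_systems averaged pairs.
system_tau, _ = kendalltau(human.mean(axis=1), metric.mean(axis=1))

# Summary-level: correlate across systems separately for each document,
# then average the per-document correlations.
per_doc = [kendalltau(human[:, d], metric[:, d])[0] for d in range(n_documents)]
summary_tau = float(np.mean(per_doc))

print(f"system-level tau:  {system_tau:.3f}")
print(f"summary-level tau: {summary_tau:.3f}")
```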