neulab / BARTScore

BARTScore: Evaluating Generated Text as Text Generation
Apache License 2.0

question about usage of different correlation coefficients for various datasets #36

Open xuyifan-0731 opened 1 year ago

xuyifan-0731 commented 1 year ago

I noticed that in the paper you report Kendall's Tau for the metrics on the WMT19 dataset, Spearman correlation for the text summarization datasets, and Pearson correlation for the Q-CNN and Q-XSUM datasets. Why did you use three different coefficients to assess correlation with human judgments? Is the choice tied to how each dataset is constructed, or are there other reasons behind it?
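
To make the question concrete, here is a minimal sketch of how the three coefficients can disagree on the same pairs of scores. It is not taken from the BARTScore code. The function names and the `human` / `metric` lists are hypothetical, the scores are made up for illustration, and the rank-based functions assume there are no tied values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson r: strength of the linear relationship between raw values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank each value from 1 to n (assumption: no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman rho: Pearson r computed on the ranks instead of raw values."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall tau (no ties): (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Made-up scores: one adjacent swap in the ordering plus one outlier value.
human = [1, 2, 3, 4, 5]
metric = [0.1, 0.2, 0.4, 0.3, 5.0]

print("Pearson :", round(pearson(human, metric), 3))
print("Spearman:", round(spearman(human, metric), 3))
print("Kendall :", round(kendall_tau(human, metric), 3))
```

On this toy input the single swapped pair lowers the two rank-based coefficients by different amounts, while the outlier affects only Pearson, which is computed on raw values. That is why I would like to understand which property of each dataset motivated each choice.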