I noticed that in your work, you reported Kendall's Tau coefficients for different metrics on the WMT19 dataset, Spearman coefficients for the text summarization dataset, and Pearson coefficients for the Q-CNN and Q-XSUM datasets. Why did you choose to use three different coefficients for assessing correlation with human judgments? Is this related to the composition of the datasets, or are there other reasons behind this choice?