Describe the bug
The METEOR metric returns the mean score rather than a list of individual scores, one per reference/prediction pair. This is inconsistent with the default behavior of other metrics, including rouge, bleu, and bertscore, and is a problem when, for example, computing per-example correlations.
See metrics/meteor/meteor.py, line 168, in _compute:
return {"meteor": np.mean(scores)}
Steps to reproduce the bug
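A minimal, self-contained sketch of the discrepancy (using numpy directly rather than calling the metric itself; the `scores` list stands in for hypothetical per-pair METEOR scores):

```python
import numpy as np

scores = [0.25, 0.25]  # hypothetical per-pair METEOR scores

# Current behavior in _compute: collapses all pairs to a single mean
current = {"meteor": float(np.mean(scores))}

# Behavior consistent with rouge/bleu/bertscore: one score per pair
consistent = {"meteor": scores}

print(current)     # {'meteor': 0.25}
print(consistent)  # {'meteor': [0.25, 0.25]}
```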
Expected results
{"meteor": [0.25, 0.25]}
Actual results
{"meteor": 0.25}
Environment info