huggingface / evaluate

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
https://huggingface.co/docs/evaluate
Apache License 2.0

METEOR has no option to return unaggregated results #572

Open ashtonomy opened 3 months ago

ashtonomy commented 3 months ago

Describe the bug

The METEOR metric returns only the mean score rather than a list of individual scores per reference/prediction pair. This is inconsistent with other default metric behavior, including rouge, bleu, and bertscore, which can return per-pair results. This can be a problem when trying to calculate per-example correlations, for example.

See metrics/meteor/meteor.py, line 168, in _compute: return {"meteor": np.mean(scores)}
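
One way to fix this would be to mirror rouge's use_aggregator flag. Below is a minimal sketch of what _compute could look like, assuming the internals of meteor.py otherwise stay as they are today; the use_aggregator name is borrowed from rouge and is an assumption, not an existing METEOR parameter:

import numpy as np
from nltk import word_tokenize
from nltk.translate import meteor_score

def _compute(self, predictions, references, alpha=0.9, beta=3, gamma=0.5, use_aggregator=True):
    # Per-pair scores, as meteor.py already computes them internally
    scores = [
        meteor_score.single_meteor_score(
            word_tokenize(ref), word_tokenize(pred),
            alpha=alpha, beta=beta, gamma=gamma,
        )
        for ref, pred in zip(references, predictions)
    ]
    if use_aggregator:
        # Current behavior: collapse to a single mean
        return {"meteor": np.mean(scores)}
    # Proposed behavior: return the list of per-pair scores
    return {"meteor": scores}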

Steps to reproduce the bug

from evaluate import load

meteor = load("meteor")
meteor.compute(references=["reference one", "reference two"], predictions=["prediction one", "prediction two"])

Expected results

{"meteor": [0.25, 0.25]}

Actual results

{"meteor": 0.25}

Environment info

evaluate version: 0.4.1
Platform: Rocky Linux 8.9 
Python version: 3.9.18