mjpost / sacrebleu

Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
Apache License 2.0

Difference between cli and corpus_score #259

Open VarunGumma opened 4 months ago

VarunGumma commented 4 months ago

I am trying to compute chrF++ for a set of predictions and references. With the sacrebleu CLI (sacrebleu ref.eng_Latn.tok < pred.eng_Latn.tok -m bleu chrf --chrf-word-order 2) I get a score that differs significantly from the one returned by CHRF(word_order=2).corpus_score(preds, refs). I have double-checked the data, and it is correct and identical in both cases. Is there a reason for this? The BLEU scores from BLEU().corpus_score(preds, refs) also differ significantly from the CLI's. Are there default parameters that I am missing?

nkrasner commented 1 month ago

I think this is related to #220. I was having the same issue, and transposing the references as described there fixed it.
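
For anyone landing here, a minimal sketch of what "transposing the references" can look like. It assumes the references are held as one list per sentence, while corpus_score expects one list per reference stream, each as long as preds. The helper names to_reference_streams and score_like_cli are hypothetical, not part of sacrebleu, and this may not be the exact fix discussed in #220.

```python
from typing import List, Sequence


def to_reference_streams(per_sentence_refs: Sequence[Sequence[str]]) -> List[List[str]]:
    """Transpose [sentence][reference] into [reference][sentence].

    Hypothetical helper (not part of sacrebleu). Every sentence must
    carry the same number of references.
    """
    if not per_sentence_refs:
        return []
    n_refs = len(per_sentence_refs[0])
    if any(len(refs) != n_refs for refs in per_sentence_refs):
        raise ValueError("every sentence needs the same number of references")
    return [list(stream) for stream in zip(*per_sentence_refs)]


def score_like_cli(preds: Sequence[str], per_sentence_refs: Sequence[Sequence[str]]):
    """Score with BLEU and chrF++ after transposing the references."""
    # Imported lazily so the helper above works without sacrebleu installed.
    from sacrebleu.metrics import BLEU, CHRF

    streams = to_reference_streams(per_sentence_refs)
    bleu = BLEU().corpus_score(preds, streams)
    chrf = CHRF(word_order=2).corpus_score(preds, streams)  # word_order=2 gives chrF++
    return bleu, chrf


# One reference per sentence becomes a single stream of two sentences.
print(to_reference_streams([["the cat sat"], ["hello world"]]))
```

With a single reference file, as in the CLI call above, the equivalent is to wrap the flat list of reference lines once, i.e. pass [refs] rather than refs or [[r] for r in refs].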