Difference between cli and corpus_score

I am trying to compute chrF++ for a set of predictions and references. When I use the sacrebleu CLI (sacrebleu ref.eng_Latn.tok < pred.eng_Latn.tok -m bleu chrf --chrf-word-order 2), the score differs significantly from what I get in Python with CHRF(word_order=2).corpus_score(preds, refs). I have double-checked that the data is correct and identical in both cases, so that is not the issue. The BLEU scores (from BLEU().corpus_score(preds, refs)) also differ significantly from the CLI output. Is there a reason for this? Are there default parameters that I am missing?
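For comparison, here is a minimal sketch of how the Python call might be set up to mirror the CLI command above. sacrebleu's corpus_score takes the references as a list of reference streams (a list of lists of strings), so a single reference file would be passed as [refs] rather than refs. Whether that explains the gap here is only an assumption, since the post does not show how refs is built. The helper names as_reference_streams and score are hypothetical and not part of sacrebleu.

```python
from typing import List, Sequence, Union


def as_reference_streams(
    refs: Union[Sequence[str], Sequence[Sequence[str]]], n_hyps: int
) -> List[List[str]]:
    """Hypothetical helper: return refs as a list of reference streams.

    A flat list of strings (one reference per prediction) is wrapped
    into a single stream; a list of lists is passed through unchanged.
    """
    if refs and isinstance(refs[0], str):
        refs = [list(refs)]
    streams = [list(stream) for stream in refs]
    # Every stream must contain exactly one reference per prediction.
    for stream in streams:
        if len(stream) != n_hyps:
            raise ValueError(
                f"expected {n_hyps} references per stream, got {len(stream)}"
            )
    return streams


def score(preds: Sequence[str], refs):
    """Hypothetical wrapper mirroring `-m bleu chrf --chrf-word-order 2`."""
    # Imported here so the helper above can be used without sacrebleu installed.
    from sacrebleu.metrics import BLEU, CHRF

    streams = as_reference_streams(refs, len(preds))
    bleu = BLEU().corpus_score(preds, streams)
    chrf = CHRF(word_order=2).corpus_score(preds, streams)
    return bleu, chrf
```

Calling score(preds, refs) with preds and refs read line by line from pred.eng_Latn.tok and ref.eng_Latn.tok should then be directly comparable to the CLI output, provided the remaining options are left at their defaults on both sides.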

mjpost / sacrebleu

Difference between cli and corpus_score #259