mjpost / sacrebleu

Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
Apache License 2.0

Empty lines in WMT21/dev Icelandic-English #225

Open ZJaume opened 1 year ago

ZJaume commented 1 year ago

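Half of the English sentences are empty. Is this expected? The counts below show the empty lines on the reference side for both directions, while the source side has none: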

$ sacrebleu -t wmt21/dev -l is-en --echo ref | grep -c '^[[:blank:]]*$' 
1004
$ sacrebleu -t wmt21/dev -l is-en --echo src | grep -c '^[[:blank:]]*$'
0
$ sacrebleu -t wmt21/dev -l en-is --echo ref | grep -c '^[[:blank:]]*$' 
1000
$ sacrebleu -t wmt21/dev -l en-is --echo src | grep -c '^[[:blank:]]*$'
0
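
To see where the empty lines fall (for instance, whether they form one contiguous block), the `grep -c` check above can be extended to report line numbers. The following is a minimal sketch in Python; the function name `blank_line_numbers` is hypothetical and not part of sacrebleu.

```python
import re
from typing import Iterable, List

# Same notion of "blank" as grep '^[[:blank:]]*$': only spaces and tabs, or nothing.
BLANK = re.compile(r"[ \t]*")


def blank_line_numbers(lines: Iterable[str]) -> List[int]:
    """Return the 1-based numbers of lines that are empty or whitespace-only.

    Hypothetical helper for illustration; it is not a sacrebleu function.
    """
    return [
        number
        for number, line in enumerate(lines, start=1)
        if BLANK.fullmatch(line.rstrip("\n"))
    ]
```

One way to use it would be to pipe the output of `sacrebleu -t wmt21/dev -l is-en --echo ref` into a small script that calls `blank_line_numbers(sys.stdin)` and prints the first and last entries of the result.
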
BrightXiaoHan commented 1 year ago

This is probably a bug in the XML parsing.
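
As an illustration only, here is one way an XML extraction step could emit empty reference lines: if some documents carry source segments but no matching reference segments, and the extractor pads the gaps with empty strings. This is an assumption, not a confirmed diagnosis; the toy XML layout, `TOY_XML`, and `extract_refs` are all hypothetical and not taken from sacrebleu's code or the actual WMT21 files.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical toy data: the second document has no <ref> element at all.
TOY_XML = """
<dataset>
  <doc id="d1">
    <src lang="is"><p><seg id="1">a</seg><seg id="2">b</seg></p></src>
    <ref lang="en"><p><seg id="1">A</seg><seg id="2">B</seg></p></ref>
  </doc>
  <doc id="d2">
    <src lang="is"><p><seg id="1">c</seg></p></src>
  </doc>
</dataset>
"""


def extract_refs(xml_text: str) -> List[str]:
    """Emit one reference line per source segment, padding gaps with ''."""
    root = ET.fromstring(xml_text.strip())
    lines: List[str] = []
    for doc in root.iter("doc"):
        src_ids = [seg.get("id") for seg in doc.findall("./src//seg")]
        refs = {seg.get("id"): seg.text or "" for seg in doc.findall("./ref//seg")}
        for seg_id in src_ids:
            # A missing reference silently becomes an empty line.
            lines.append(refs.get(seg_id, ""))
    return lines


ref_lines = extract_refs(TOY_XML)
empty_count = sum(1 for line in ref_lines if not line.strip())
```

With this toy input, the source side has no empty lines while the reference side has one, which matches the shape of the counts reported above. Whether sacrebleu's parser behaves this way would need to be checked against its source.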