mjpost / sacrebleu

Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
Apache License 2.0
1.07k stars 164 forks

No references found for test set wmt23/* #261

Closed kellymarchisio closed 7 months ago

kellymarchisio commented 8 months ago

Running the command below results in `sacreBLEU: No references found for test set wmt23/en-de.`:

    cat $OUTFILE | sacrebleu -t wmt23 -l en-de

The same occurs for de-en and en-ja (I did not try others).

martinpopel commented 8 months ago

I found this bug last week and fixed it in #260, which I have now merged. (Thanks for reporting. I had forgotten about this issue over the week.)

kellymarchisio commented 8 months ago

Nice, thanks! Do you plan to push the change to PyPI for easy install? (Of course, installation is also easily done with `python setup.py install`, but it may be more user-friendly out of the box to have it on PyPI.)

kellymarchisio commented 7 months ago

Hi @martinpopel - I'm just coming back to this, and it looks like I still have the issue.

  1. I cleared out the cache with `rm -r /home/kelly/.sacrebleu/wmt23`
  2. I reinstalled sacrebleu from source with `python setup.py install`
  3. I ran `sacrebleu -i test.out -t wmt23 -l en-de`, which results in:

    sacreBLEU: No references found for test set wmt23/en-de.
    sacreBLEU: System and reference streams have different lengths.
    sacreBLEU: This could be an issue with your system output or with sacreBLEU's reference database if -t is given.
    sacreBLEU: For the latter, try cleaning out the cache by typing:
    sacreBLEU:     rm -r /home/kelly/.sacrebleu/wmt23
    sacreBLEU: The test sets will be re-downloaded the next time you run sacreBLEU.

My test.out has 557 lines, as it should. When I run `wc` on the .sacrebleu cache, I see:

    557 27933 186081 wmt23.en-de.AIRC
    557 34155 233250 wmt23.en-de.GPT4-5shot
    557 33524 227284 wmt23.en-de.Lan-BridgeMT
    557 28501 192144 wmt23.en-de.NLLB_Greedy
    557 28117 188344 wmt23.en-de.NLLB_MBR_BLEU
    557 34736 234666 wmt23.en-de.ONLINE-A
    557 34410 236635 wmt23.en-de.ONLINE-B
    557 34060 231497 wmt23.en-de.ONLINE-G
    557 33365 226666 wmt23.en-de.ONLINE-M
    557 34749 234086 wmt23.en-de.ONLINE-W
    557 34450 235075 wmt23.en-de.ONLINE-Y
    557 37148 249078 wmt23.en-de.ZengHuiMT
    557   557  13278 wmt23.en-de.docid
    557   557   1671 wmt23.en-de.origlang
    557 34625 234302 wmt23.en-de.ref-refA
    557 34711 197749 wmt23.en-de.src
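
The same check can be scripted. Below is a minimal sketch, assuming the cache layout shown in the listing above (files named `wmt23.en-de.<suffix>`, with reference files carrying a suffix that starts with `ref`). `count_lines` and `check_cache` are hypothetical helpers written for illustration; they are not part of sacreBLEU and do not reproduce how sacreBLEU itself locates references.

```python
from pathlib import Path
import tempfile


def count_lines(path):
    """Count newline characters in a file, like `wc -l`."""
    with open(path, "rb") as handle:
        return sum(chunk.count(b"\n") for chunk in iter(lambda: handle.read(1 << 16), b""))


def check_cache(cache_dir, prefix, expected_lines):
    """Inspect files named `<prefix>.<suffix>` in cache_dir.

    Returns (references, mismatched):
      references: file names whose suffix starts with "ref"
                  (an assumption based on the listing, e.g. ref-refA).
      mismatched: {file name: line count} for files whose line count
                  differs from expected_lines.
    """
    references, mismatched = [], {}
    for path in sorted(Path(cache_dir).glob(prefix + ".*")):
        suffix = path.name[len(prefix) + 1:]
        if suffix.startswith("ref"):
            references.append(path.name)
        n = count_lines(path)
        if n != expected_lines:
            mismatched[path.name] = n
    return references, mismatched


if __name__ == "__main__":
    # Tiny stand-in for ~/.sacrebleu/wmt23 so the sketch runs without any download.
    with tempfile.TemporaryDirectory() as tmp:
        for suffix, text in [("src", "a\nb\n"), ("ref-refA", "x\ny\n"), ("docid", "d1\n")]:
            Path(tmp, "wmt23.en-de." + suffix).write_text(text, encoding="utf-8")
        refs, bad = check_cache(tmp, "wmt23.en-de", expected_lines=2)
        print("references:", refs)
        print("mismatched:", bad)
```

Against the real cache, the call would be something like `check_cache(Path.home() / ".sacrebleu" / "wmt23", "wmt23.en-de", 557)`. A non-empty `references` list together with an empty `mismatched` dict would suggest the cached files are intact and that the problem lies in how the test set is looked up.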


Do you know my next steps for fixing this?

I can of course get around this by running `cat test.out | sacrebleu --tokenize 13a ~/.sacrebleu/wmt23/wmt23.en-de.ref-refA`, but I want to match the intended implementation exactly to reduce chances of error.

mjpost commented 7 months ago

I just released v2.4.2, which includes this bugfix, and also adds a domain field (available with --echo) for WMT22 and WMT23.