mjpost / sacrebleu

Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
Apache License 2.0
1.03k stars 162 forks source link

Use more linters, better #250

Closed dustalov closed 9 months ago

dustalov commented 9 months ago

This is a follow-up for mjpost/sacrebleu#249 that streamlines the build process, fixes type checking, and addresses linting issues.

dustalov commented 9 months ago

@martinpopel @mjpost please have a look.

dustalov commented 9 months ago

There was a network error that led to a failed build after my last commit. Could you please restart it?

martinpopel commented 9 months ago

I've restarted it and one of the checks fails with "error: invalid command 'bdist_wheel'"

dustalov commented 9 months ago

I guess it was caused by the missing package wheel, but the tests failed again due to the network errors. Could you restart them, please?

dustalov commented 9 months ago

It seems happened again. Could you please restart the tests?

dustalov commented 9 months ago

@martinpopel kind reminder

martinpopel commented 9 months ago

This is at least the second time when the check-build (macos-latest, 3.11) test fails with the same error:

sacreBLEU: Downloading http://data.statmt.org/wmt19/translation-task/test.tgz to /Users/runner/work/sacrebleu/sacrebleu/.sacrebleu/wmt19/raw/wmt19.test.tgz
sacreBLEU: Extracting /Users/runner/work/sacrebleu/sacrebleu/.sacrebleu/wmt19/raw/wmt19.test.tgz to /Users/runner/work/sacrebleu/sacrebleu/.sacrebleu/wmt19/raw
sacreBLEU: System and reference streams have different lengths.

This is just a warning, but the followup test fails because obtained is an empty string not matching the expected output, which is surely a result of the corrupted wmt19 test set. This test on other platforms is OK. I have no idea why this happens and what should we try except for omitting (macos-latest, 3.11) from the tests (other tests are cancelled once the first platform fails). Maybe changing the http to https? http://data.statmt.org/wmt19/translation-task/test.tgz returns HTTP 301 Moved Permanently Location: https://data.statmt.org/wmt19/translation-task/test.tgz

I don't think it makes any sense to rerun the tests again.

dustalov commented 9 months ago

The build fails on one specific example that works well on my machine (macOS 14.0, Python 3.10.13).

Testing python3 -m sacrebleu -t wmt18,wmt19 -l en-de --echo=src | python3 -m sacrebleu -t wmt18,wmt19 -l en-de -b --detail
sacreBLEU: Downloading http://data.statmt.org/wmt18/translation-task/test.tgz to /Users/<REDACTED>/sacrebleu/.sacrebleu/wmt18/raw/wmt18.test.tgz
sacreBLEU: Extracting /Users/<REDACTED>/sacrebleu/.sacrebleu/wmt18/raw/wmt18.test.tgz to /Users/<REDACTED>/sacrebleu/.sacrebleu/wmt18/raw
sacreBLEU: Downloading http://data.statmt.org/wmt19/translation-task/test.tgz to /Users/<REDACTED>/sacrebleu/.sacrebleu/wmt19/raw/wmt19.test.tgz
sacreBLEU: Extracting /Users/<REDACTED>/sacrebleu/.sacrebleu/wmt19/raw/wmt19.test.tgz to /Users/<REDACTED>/sacrebleu/.sacrebleu/wmt19/raw
PASS

Let's change it to HTTPS and add two more tests triggers: for push and for workflow dispatch.

dustalov commented 9 months ago

Let's see if it works with the code that is already in master. I submitted another PR, mjpost/sacrebleu#252, that simplifies the troubleshooting.

dustalov commented 9 months ago

This run was successful just because we were lucky. As long as we rely on a third party that serves the data, we cannot be sure it would work 100% of the time. Now we can merge this PR.

martinpopel commented 9 months ago

Thanks for all the work.

dustalov commented 9 months ago

Thank you for your patience! Would you mind rolling out a new release so I can update the dependency string in my code?

mjpost commented 9 months ago

Thanks for this, and sorry to be absent from the conversation.

I can do a new release. Would 2.3.3 be appropriate?

Can I ask you to create the release PR, and also to update the CHANGELOG to summarize the changes you've made?

dustalov commented 9 months ago

Yes.