mjpost / sacrebleu

Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
Apache License 2.0

TER asian support #229

Closed: esalesky closed this issue 1 year ago

esalesky commented 1 year ago

I am seeing scores that suggest --ter-asian-support is only applied if --ter-normalize is also present. This isn't what I expected from the README, so I'm posting this issue to be safe. It can be verified with the wmt21 systems and references packaged in sacrebleu.
The behavior is replicated below with sacrebleu version 2.3.1.

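Summary: TER is 101.3 with neither flag, 101.3 with --ter-asian-support alone, 101.7 with --ter-normalize alone, and 54.8 with both flags.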

To replicate with the wmt21 system in the README (shortened output):

sacrebleu -t wmt21/systems -l en-zh --echo NiuTrans | sacrebleu -t wmt21/systems -l en-zh -m ter
"name": "TER", "score": 101.3, |norm:no|asian:no|

sacrebleu -t wmt21/systems -l en-zh --echo NiuTrans | sacrebleu -t wmt21/systems -l en-zh -m ter --ter-asian-support
"name": "TER", "score": 101.3, |norm:no|asian:yes|

sacrebleu -t wmt21/systems -l en-zh --echo NiuTrans | sacrebleu -t wmt21/systems -l en-zh -m ter --ter-normalize
"name": "TER", "score": 101.7, |norm:yes|asian:no|

sacrebleu -t wmt21/systems -l en-zh --echo NiuTrans | sacrebleu -t wmt21/systems -l en-zh -m ter --ter-asian-support --ter-normalize
"name": "TER", "score": 54.8, |norm:yes|asian:yes|

ozancaglayan commented 1 year ago

Hi,

Yes, it seems that the README is wrong. Looking at the code, normalize is what actually enables tokenization, and asian support (with its own tokenization) only kicks in if normalize is enabled: https://github.com/mjpost/sacrebleu/blob/4f4124642c4eb0b7120e50119c669f0570a326a7/sacrebleu/tokenizers/tokenizer_ter.py#L150
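
To make the gating concrete, here is a minimal, self-contained sketch of the control flow described above. It is not the sacrebleu source: the function names `tokenize`, `normalize_western` and `split_cjk` are hypothetical stand-ins, and the regexes are deliberately simplified. The only point is that the Asian-specific step sits inside the `normalized` branch, so `asian_support=True` on its own changes nothing.

```python
import re


def normalize_western(sent: str) -> str:
    # Hypothetical stand-in for general normalization:
    # split a few common punctuation marks off from words.
    return re.sub(r"([.,!?;:])", r" \1 ", sent)


def split_cjk(sent: str) -> str:
    # Hypothetical stand-in for the Asian-specific step:
    # put spaces around each CJK unified ideograph.
    return re.sub(r"([\u4e00-\u9fff])", r" \1 ", sent)


def tokenize(sent: str, normalized: bool = False, asian_support: bool = False) -> str:
    if normalized:
        sent = normalize_western(sent)
        # Nested on purpose: this branch is unreachable unless
        # normalized is True, mirroring the behavior reported here.
        if asian_support:
            sent = split_cjk(sent)
    # Collapse runs of whitespace into single spaces.
    return " ".join(sent.split())


for norm in (False, True):
    for asian in (False, True):
        print(f"norm={norm} asian={asian}:", tokenize("你好世界", norm, asian))
```

In this sketch only the `norm=True asian=True` combination splits the sentence into separate characters; the other three leave it as a single token. That pattern is consistent with the four scores above, where only the run with both flags differs sharply from the rest.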