afaji closed this 6 years ago
Maybe sacreBLEU failed; remove ~/.sacrebleu and try again.
Also, I think the example downloads sacreBLEU by itself. If you do things manually, you are on your own :)
The files data/valid.{de,en} are downloaded automatically by sacreBLEU in the run-me.sh script, and they should be 3k lines long. I tested that and the example works for me.
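One way to check the "3k lines" claim locally is to count the lines on both sides of the validation set. This is only a minimal sketch and not part of run-me.sh: the helper names `count_lines` and `check_parallel` are made up for illustration, and the default of 3000 is my reading of "3k" above.

```python
from pathlib import Path


def count_lines(path):
    """Count lines in a file, reading bytes so locale settings do not matter."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def check_parallel(src_path, tgt_path, expected=3000):
    """Return (src_lines, tgt_lines, ok); ok means both sides equal `expected`.

    `expected=3000` is an assumption taken from the "3k lines" remark.
    """
    src_lines = count_lines(src_path)
    tgt_lines = count_lines(tgt_path)
    return src_lines, tgt_lines, src_lines == tgt_lines == expected


if __name__ == "__main__":
    src, tgt = Path("data/valid.en"), Path("data/valid.de")
    if src.exists() and tgt.exists():
        print(check_parallel(src, tgt))
    else:
        print("validation files not found; run run-me.sh first")
```

If the two counts differ, or either is far below the expected value, the download most likely aborted partway through.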
@afaji what about your data/test201?.en files? Those are also downloaded by sacreBLEU.
I redid everything: I removed ~/.sacrebleu and used the automatically downloaded sacreBLEU as well.
It seems that I get a UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 30: ordinal not in range(128) error while executing:
LC_ALL=C.UTF-8 ../tools/sacreBLEU/sacrebleu.py -t wmt13 -l en-de --echo src > data/valid.en
I'm running this on the wilkes cluster. It apparently worked fine on valhalla.
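For anyone who wants to see the failure in isolation, here is a small sketch that does not involve sacreBLEU at all. It writes a string containing 'é' (the '\xe9' from the traceback) to a text stream whose encoding is ASCII, which is what Python's stdout falls back to under a plain C locale on older versions. The string and the helper `write_with` are invented for illustration.

```python
import io

# 'é' placed at index 30, mirroring the position reported in the traceback.
text = "x" * 30 + "\xe9"


def write_with(encoding):
    """Write `text` through a text stream with the given encoding; return the bytes."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


try:
    write_with("ascii")
except UnicodeEncodeError as exc:
    print("ascii stream failed:", exc)

# The same write succeeds once the stream encoding is UTF-8.
print(write_with("utf-8"))
```

The point is that the error comes from the output stream's encoding and not from the data itself, which is why it shows up on one cluster and not on another.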
LC_ALL=C.UTF-8 is supposed to fix that. Matt says that the accent which caused these issues has been removed in the newest version of sacreBLEU.
Angus recommends PYTHONIOENCODING=utf-8 as working on shudder CentOS as well as Ubuntu.
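The effect of PYTHONIOENCODING can be checked with a short sketch that starts a child Python process and prints an accented character. Setting the variable to ascii is my way of simulating the broken environment (newer Python versions may already switch a C locale to UTF-8 on their own, so the bare locale may not reproduce it); `run_child` and the test string are hypothetical.

```python
import os
import subprocess
import sys

# Child program kept ASCII-only on the command line; it prints 'café'.
CHILD = "print('caf\\xe9')"


def run_child(io_encoding):
    """Run the child with PYTHONIOENCODING set to `io_encoding`."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


if __name__ == "__main__":
    broken = run_child("ascii")   # simulates the failing setup
    fixed = run_child("utf-8")    # the recommended setting
    print("ascii exit code:", broken.returncode)
    print("utf-8 exit code:", fixed.returncode)
    print("utf-8 output bytes:", fixed.stdout)
```

Since the variable is read by the interpreter itself, it does not depend on the C.UTF-8 locale being installed on the machine.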
What's the status of this?
I'm running ./run-me.sh in the transformer example and the validation set seems to be far too small.
I'm using this sacreBLEU: https://github.com/mjpost/sacreBLEU/tree/master