hplt-project / sacremoses

Python port of Moses tokenizer, truecaser and normalizer
MIT License

Truecaser Test dependency on norvig.com/big.txt #49

DavidHarrison closed this issue 5 years ago

DavidHarrison commented 5 years ago

norvig.com is currently down, which causes the tests in sacremoses/test/test_truecaser.py to fail if big.txt has not already been downloaded. Is there a more reliable source for the file, or could it be included in the repository (e.g. with Git LFS)?

alvations commented 5 years ago

Thanks @DavidHarrison for the suggestion! I wanted to keep the testing light so I left out the file rather than putting it in the repo with LFS.

I'll look for a better way to test file-in, file-out functions. That will help with #37 too.

alvations commented 5 years ago

I've created a "mirror" of https://norvig.com/big.txt at https://gist.githubusercontent.com/alvations/6e878bab0eda2624167aa7ec13fc3e94/raw/4fb3bac1da1ba7a172ff1936e96bee3bc8892931/big.txt

That should work as a backup =)
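
For readers who want to use the mirror as a fallback, here is a minimal sketch of a test setup that tries the original host first and the gist second. The helper name `fetch_with_fallback`, its parameters, and the injectable `opener` are hypothetical and not part of sacremoses, and this is not necessarily how the fix referenced in #59 was implemented.

```python
import os
import shutil
import tempfile
import urllib.request

# URLs taken from this thread: the original source first, the gist mirror second.
BIG_TXT_URLS = [
    "https://norvig.com/big.txt",
    "https://gist.githubusercontent.com/alvations/6e878bab0eda2624167aa7ec13fc3e94"
    "/raw/4fb3bac1da1ba7a172ff1936e96bee3bc8892931/big.txt",
]


def fetch_with_fallback(urls, dest, opener=urllib.request.urlopen, timeout=30):
    """Hypothetical helper: download the first reachable URL to `dest`.

    Returns the URL that succeeded, or None if `dest` already exists.
    Raises RuntimeError if every URL fails.
    """
    if os.path.exists(dest):
        return None  # already downloaded, nothing to do

    dest_dir = os.path.dirname(os.path.abspath(dest))
    errors = []
    for url in urls:
        tmp_path = None
        try:
            with opener(url, timeout=timeout) as response:
                # Write to a temporary file first so that a failed download
                # never leaves a truncated file at `dest`.
                with tempfile.NamedTemporaryFile(dir=dest_dir, delete=False) as tmp:
                    tmp_path = tmp.name
                    shutil.copyfileobj(response, tmp)
            os.replace(tmp_path, dest)
            return url
        except OSError as exc:  # URLError, HTTPError and timeouts are OSError subclasses
            errors.append((url, exc))
            if tmp_path and os.path.exists(tmp_path):
                os.remove(tmp_path)
    raise RuntimeError("all sources failed: %r" % errors)
```

A test module could call `fetch_with_fallback(BIG_TXT_URLS, "big.txt")` once before the truecaser tests run. Passing a stub as `opener` lets the fallback logic itself be tested without network access.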

alvations commented 5 years ago

Resolved, cf. #59