N-gram model does not exist

iotayo / aivivn-tone

Submission for AIviVN Vietnamese diacritics restoration contest https://www.aivivn.com/contests/3

MIT License


Closed: demdecuong closed this issue 4 years ago

demdecuong commented 4 years ago

iotayo commented 4 years ago

Hi demdecuong,

The 4-gram language model was too large to keep in my personal Google Drive, so I removed it 3 months after my submission.

Please consider building one yourself, using the CSV V2 data from this repo https://github.com/binhvq/news-corpus and following the estimation instructions on KenLM's website https://kheafield.com/code/kenlm/estimation/
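For reference, here is a minimal sketch of that build in Python. It assumes KenLM's `lmplz` and `build_binary` binaries are compiled and on your PATH, and that you have already extracted the article text from the CSV into a plain-text file with one sentence per line. The file names, the memory limit and the temp directory are placeholders I made up, so adjust them for your machine. The CSV extraction step is not shown because it depends on the layout of the data.

```python
import shutil
import subprocess


def lmplz_command(order=4, memory="50%", tmp_dir="/tmp"):
    """Argument list for KenLM's estimator (reads text on stdin, writes ARPA on stdout).

    -o is the n-gram order, -S caps memory use, -T sets the temp directory.
    The default memory and tmp_dir values are assumptions, not recommendations.
    """
    return ["lmplz", "-o", str(order), "-S", memory, "-T", tmp_dir]


def build_binary_command(arpa_path, binary_path):
    """Argument list that converts an ARPA file into KenLM's binary format."""
    return ["build_binary", str(arpa_path), str(binary_path)]


def train(corpus_path, arpa_path, binary_path, order=4):
    """Estimate an n-gram model from a plain-text corpus, then binarize it."""
    for tool in ("lmplz", "build_binary"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH; build KenLM first")

    # Stream the corpus through lmplz and capture the ARPA output in a file.
    with open(corpus_path, "rb") as src, open(arpa_path, "wb") as dst:
        subprocess.run(lmplz_command(order), stdin=src, stdout=dst, check=True)

    # The binary format loads much faster than ARPA at query time.
    subprocess.run(build_binary_command(arpa_path, binary_path), check=True)


if __name__ == "__main__":
    # Hypothetical file names; replace with your own paths.
    print(" ".join(lmplz_command()), "< corpus.txt > 4gram.arpa")
    print(" ".join(build_binary_command("4gram.arpa", "4gram.binary")))
```

Running the script as-is only prints the two commands. Call `train("corpus.txt", "4gram.arpa", "4gram.binary")` once the corpus file exists. Expect the estimation step to need a lot of disk space and time on a corpus of this size.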

Thank you for your understanding.