iotayo / aivivn-tone

Submission for AIviVN Vietnamese diacritics restoration contest https://www.aivivn.com/contests/3
MIT License

N-gram model does not exist #2

Closed (demdecuong closed this issue 4 years ago)

demdecuong commented 4 years ago

https://drive.google.com/file/d/14RmQSYgijeSVzZNZ2mPGL0lCLg_guXGE/view?usp=sharing

iotayo commented 4 years ago

Hi demdecuong,

The 4-gram language model is too large to keep in my personal Google Drive, so I removed it 3 months after my submission.

Please consider building one yourself from the CSV V2 data in this repo, https://github.com/binhvq/news-corpus, by following the instructions on the KenLM website: https://kheafield.com/code/kenlm/estimation/
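As a rough sketch of that workflow (not the exact pipeline used for the submission): the Python below flattens a CSV into a one-text-per-line corpus and then pipes it through KenLM's `lmplz` with order 4. The column name `content`, the file names, and the absence of any tokenization or lowercasing are all assumptions; check the real CSV header and the preprocessing this repo expects before relying on it. It also assumes `lmplz` is already compiled and on your `PATH`.

```python
import csv
import re
import subprocess


def csv_to_corpus(csv_path, corpus_path, text_column="content"):
    """Write one whitespace-normalized line of text per non-empty CSV row.

    `text_column` is a hypothetical column name; check the real CSV header.
    Returns the number of lines written.
    """
    # Articles can be long; raise the csv module's default per-field limit.
    csv.field_size_limit(10**9)
    written = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(corpus_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            # Collapse newlines and repeated spaces so each row is one line.
            text = re.sub(r"\s+", " ", row.get(text_column) or "").strip()
            if text:
                dst.write(text + "\n")
                written += 1
    return written


def lmplz_command(order=4, lmplz="lmplz"):
    """Build the KenLM estimation command; `-o` sets the n-gram order."""
    return [lmplz, "-o", str(order)]


def build_arpa(corpus_path, arpa_path, order=4, lmplz="lmplz"):
    """Run lmplz, which reads the corpus on stdin and writes ARPA to stdout."""
    with open(corpus_path, "rb") as stdin, open(arpa_path, "wb") as stdout:
        subprocess.run(lmplz_command(order, lmplz),
                       stdin=stdin, stdout=stdout, check=True)
```

With hypothetical file names, usage would be `csv_to_corpus("news.csv", "corpus.txt")` followed by `build_arpa("corpus.txt", "4gram.arpa")`. The resulting ARPA file can then be converted with KenLM's `build_binary` for faster loading. On a large corpus `lmplz` may need its memory and temp-directory options tuned, as described on the estimation page linked above.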

Thank you for your understanding.