jerinphilip / ilmulti

Tooling to play around with multilingual machine translation for Indian Languages.
http://preon.iiit.ac.in/~jerin/bhasha
MIT License

Punkt integration #4

Closed: jerinphilip closed this 4 years ago

jerinphilip commented 4 years ago

This integrates a first implementation of a Punkt-based sentence tokenizer. The models are not great yet, but I expect they can be improved over time. It is still clearly an improvement over the crude rule-based tokenizer that existed before.
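The sketch below is heavily simplified and is not the code in this PR. It shows why a Punkt-style tokenizer does better than a purely rule-based one. Punkt learns abbreviations from raw, unannotated text, and it then declines to end a sentence at a period that closes one of them. Everything here is an illustrative assumption: the hypothetical helpers `train_abbreviations` and `split_sentences`, the thresholds, the toy corpus, and the choice to treat the Devanagari danda (।) as sentence-final. Real Punkt scores word types with a log-likelihood ratio instead of the simple ratio test used here.

```python
from collections import Counter

# Sentence-final characters: Latin punctuation plus the Devanagari danda
# (U+0964), used as a full stop in Hindi and Marathi. This set is an
# illustrative assumption, not taken from ilmulti.
SENT_END = (".", "?", "!", "\u0964")


def train_abbreviations(text, min_count=2, min_ratio=0.9, max_len=4):
    """Learn abbreviation types from raw, unannotated text.

    Simplified Punkt idea: a short, recurring word type that almost always
    carries a trailing period is treated as an abbreviation. Real Punkt uses
    a log-likelihood ratio plus length and internal-period factors.
    """
    total, with_period = Counter(), Counter()
    for tok in text.split():
        word = tok.rstrip(".").lower()
        if not word:
            continue  # skip tokens made only of periods, e.g. "..."
        total[word] += 1
        if tok.endswith("."):
            with_period[word] += 1
    return {
        w
        for w, n in total.items()
        if n >= min_count and len(w) <= max_len and with_period[w] / n >= min_ratio
    }


def split_sentences(text, abbrevs):
    """Split after sentence-final characters, except after learned abbreviations."""
    sentences, current = [], []
    for tok in text.split():
        current.append(tok)
        if not tok.endswith(SENT_END):
            continue
        if tok.endswith(".") and tok.rstrip(".").lower() in abbrevs:
            continue  # the period closes an abbreviation, not a sentence
        sentences.append(" ".join(current))
        current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences


# Hypothetical toy training corpus. Sentence-final words such as "lab" also
# occur mid-sentence, which keeps their with-period ratio low.
TRAIN_TEXT = (
    "Dr. Rao met Dr. Iyer in the lab. The lab was busy. "
    "Dr. Rao left the lab early. Mr. Das stayed late. Mr. Das was late again."
)

abbrevs = train_abbreviations(TRAIN_TEXT)
sample = "Dr. Rao met Mr. Das. They discussed the lab results."
print(sorted(abbrevs))                   # ['dr', 'mr']
print(split_sentences(sample, set()))    # ['Dr.', 'Rao met Mr.', 'Das.', 'They discussed the lab results.']
print(split_sentences(sample, abbrevs))  # ['Dr. Rao met Mr. Das.', 'They discussed the lab results.']
print(split_sentences("राम घर गया। सीता घर आई।", abbrevs))  # ['राम घर गया।', 'सीता घर आई।']
```

Passing an empty abbreviation set reproduces a crude rule-based splitter, which breaks the sample after "Dr." and "Mr.". The learned set keeps those periods inside the sentence. For real use, NLTK ships a full Punkt implementation (`PunktTrainer`, `PunktSentenceTokenizer`). Its `PunktLanguageVars.sent_end_chars` can be overridden to add script-specific terminators such as the danda.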