nltk / nltk_data

NLTK Data

Add Malayalam language for PunktSentenceTokenizer() #144

Closed: sabiqueqb closed this 2 years ago

sabiqueqb commented 4 years ago

This commit adds Malayalam language support to PunktSentenceTokenizer(). The model was trained on Malayalam Wikipedia.
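Punkt learns its parameters unsupervised from raw text, so a model like this can in principle be built with NLTK's `PunktTrainer`. The sketch below is not the code from this PR: the helper name `train_malayalam_punkt` is hypothetical, and the two-sentence sample stands in for the real training corpus (plain text extracted from a Malayalam Wikipedia dump, which is not shown here).

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer, PunktTrainer


def train_malayalam_punkt(train_text: str) -> PunktSentenceTokenizer:
    """Hypothetical helper: learn Punkt parameters from raw Malayalam text."""
    trainer = PunktTrainer()
    # Unsupervised pass over the raw text: gathers abbreviation candidates,
    # orthographic context and sentence starters; finalize=True computes
    # the final parameters.
    trainer.train(train_text, finalize=True)
    # A PunktParameters object can be passed straight to the tokenizer.
    return PunktSentenceTokenizer(trainer.get_params())


# Placeholder corpus (assumption): a real model needs a large plain-text
# dump with wiki markup stripped, not two sentences.
sample = "ഇത് ഒരു വാക്യം ആണ്. ഇത് മറ്റൊരു വാക്യം ആണ്."
tokenizer = train_malayalam_punkt(sample)
for sentence in tokenizer.tokenize(sample):
    print(sentence)
```

One caveat, as far as we understand the Punkt implementation: its orthographic heuristics key on capitalization, which Malayalam script lacks, so for Malayalam the split between abbreviations and sentence ends depends mostly on the learned abbreviation list.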

sabiqueqb commented 4 years ago

@stevenbird Please review

jerinphilip commented 4 years ago

@sabiqueqb How well does this work? Can you post a few examples? What did you train this on?
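One way to answer "how well does this work" is to compare the tokenizer's output with a small hand-segmented Malayalam sample and score the sentence boundaries. The sketch below assumes such a gold sample exists (none is posted in this thread); the function names are hypothetical, and the Latin strings are placeholders. Offsets are counted with whitespace removed so that differences in whitespace handling do not count as errors, and the trivial end-of-text boundary is skipped.

```python
def boundary_offsets(sentences: list[str]) -> set[int]:
    """Hypothetical helper: whitespace-free offsets where sentences end."""
    offsets, pos = set(), 0
    for s in sentences[:-1]:  # the end of the text is not a real decision
        pos += len("".join(s.split()))
        offsets.add(pos)
    return offsets


def boundary_prf(predicted: list[str], gold: list[str]) -> tuple[float, float, float]:
    """Precision, recall and F1 of predicted sentence boundaries vs. gold."""
    pred, ref = boundary_offsets(predicted), boundary_offsets(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Placeholder data: the tokenizer missed one of the two gold boundaries.
gold = ["A b.", "C d.", "E f."]
predicted = ["A b. C d.", "E f."]
print(boundary_prf(predicted, gold))
```

Here the prediction is precise (every boundary it reports is correct) but recall is 0.5, the typical failure mode when a sentence-final word is wrongly learned as an abbreviation.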