nltk / nltk_contrib

NLTK Contrib
http://nltk.org/

A language ID module using TextCat algorithm #16

Open avitalp opened 9 years ago

avitalp commented 9 years ago

A language identification module that implements the TextCat algorithm, using the language n-gram profiles from the "An Crubadan" project. It is a response to https://github.com/nltk/nltk/issues/107 and builds on https://github.com/nltk/nltk/pull/845

The method "demo" refers to several sample files which I didn't include, as I was not sure where they should be placed.

@alexrudnick: would you be able to provide sample texts for some of the less well-represented languages?

stevenbird commented 9 years ago

Thanks @avitalp. I am considering putting this in nltk/classify.

avitalp commented 9 years ago

Thanks @stevenbird, that'd be great. Is there anything you'd like me to modify or add for that?