richtr / guessLanguage.js

A natural language detection library based on trigram statistical analysis for Node.js and the Web.
http://richtr.github.com/guessLanguage.js/

document trigram building process #12

Open danielnaber opened 9 years ago

danielnaber commented 9 years ago

It would be nice if the trigram building process were documented. For example, the German data contains strings like didiescheincheichdenin, which don't look like trigrams. Also, is it known what was originally used as input?
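
For context, a generic way to derive trigram statistics from a text corpus can be sketched as below. This is only an illustration of the general technique, not the process actually used to produce this library's data, which is exactly what this issue asks to have documented. The function names `buildTrigramCounts` and `topTrigrams`, the normalization rules (lowercasing, collapsing non-letters to single spaces, padding with a space at each end), and the tie-breaking order are all assumptions made for the example.

```javascript
// Hypothetical helpers, NOT part of guessLanguage.js.
// Assumption: text is lowercased, runs of non-letter characters become a
// single space, and the result is padded with one space on each side.
function buildTrigramCounts(text) {
  const normalized =
    ' ' + text.toLowerCase().replace(/[^\p{L}]+/gu, ' ').trim() + ' ';
  const counts = new Map();
  // Slide a 3-character window over the normalized text and count each trigram.
  for (let i = 0; i + 3 <= normalized.length; i++) {
    const trigram = normalized.slice(i, i + 3);
    counts.set(trigram, (counts.get(trigram) || 0) + 1);
  }
  return counts;
}

// Returns the `limit` most frequent trigrams, most frequent first.
// Assumption: ties are broken by plain string comparison.
function topTrigrams(text, limit) {
  return [...buildTrigramCounts(text).entries()]
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .slice(0, limit)
    .map(([trigram]) => trigram);
}
```

With a scheme like this, every entry is exactly three characters long once the padding spaces are counted, so a documented process would make it possible to check whether the shipped data follows the same convention.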