richtr / guessLanguage.js

A natural language detection library based on trigram statistical analysis for Node.js and the Web.
http://richtr.github.com/guessLanguage.js/

Norsk will never match #10

Closed wooorm closed 9 years ago

wooorm commented 9 years ago

Norsk (Norwegian) is listed in the trigram database as "nb" (Bokmål), but the guessLanguage.js file refers to it as "no" (both Bokmål and Nynorsk). Because the two codes differ, Norwegian can never be matched.

One fix would be to rename the property in the trigram database from "nb" to "no". That might give misleading results, because the data describes Bokmål only, although the trigram differences between Bokmål and Nynorsk are small. A better fix, which leaves the database unchanged, would probably be to classify as "nb" instead of "no".
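
To illustrate the mismatch, here is a minimal sketch. It does not reproduce the library's actual code: the `models` object, the candidate lists, the trigram ranks, and the `usableCandidates` helper are all hypothetical names and data made up for this example. It only shows why a candidate code with no matching key in the database can never win, and how switching the candidate to "nb" resolves it.

```javascript
// Hypothetical stand-in for the trigram database: models are keyed by language code.
// The trigram ranks are invented for illustration only.
const models = {
  nb: { " de": 0, "er ": 1, "en ": 2 },
  da: { "er ": 0, " de": 1, "en ": 2 },
};

// Hypothetical candidate lists: one with the mismatched code, one with the proposed fix.
const candidatesBroken = ["da", "no"];
const candidatesFixed = ["da", "nb"];

// Keep only the candidates that actually have a model to score against.
function usableCandidates(candidates, db) {
  return candidates.filter((code) =>
    Object.prototype.hasOwnProperty.call(db, code)
  );
}

const broken = usableCandidates(candidatesBroken, models); // "no" is dropped
const fixed = usableCandidates(candidatesFixed, models); // "nb" is kept

console.log(broken); // [ 'da' ]
console.log(fixed); // [ 'da', 'nb' ]
```

With the second approach the database stays untouched and the reported code says exactly what the model was built from.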