Hey sorry about that @zeke. I don't have much time now, so I'll try to respond more extensively later. Essentially: franc is good at many languages, which means it needs bigger input to get better results! 😞😐
No worries! Unfortunately these short strings are all I have.
I'm really just trying to answer the question, "Is this string in English?"
Do you know of any alternatives?
I guess I could look for each word in https://github.com/zeke/an-array-of-english-words, and if most of them are found, call it English. ¯\_(ツ)_/¯
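Something like this minimal sketch of the word-list idea, assuming `an-array-of-english-words` exports a plain array of words (the 0.8 cut-off and the naive tokenizer are arbitrary placeholders):

```js
const words = require('an-array-of-english-words')

// Build a Set once so lookups are O(1).
const english = new Set(words)

// Call a string "English" when most of its words appear in the list.
function looksEnglish (text, cutoff = 0.8) {
  const tokens = text.toLowerCase().match(/[a-z']+/g) || []
  if (tokens.length === 0) return false
  const hits = tokens.filter((word) => english.has(word)).length
  return hits / tokens.length >= cutoff
}

console.log(looksEnglish('the quick brown fox jumps over the lazy dog'))
// => true (assuming those words are in the list)
```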
Could you use franc.all and, when the English score is greater than 0.95 (for example), call it English? Maybe that'll work?
This is a problem inherent to the algorithm: more languages means you need bigger documents for better guessing. I’ve noted that in the readme.
You could use franc-min, which supports fewer languages and therefore guesses better if you're only dealing with top languages.
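For example (a sketch; franc-min should be a drop-in swap, since it exposes the same API as franc):

```js
const franc = require('franc-min')

// Same call as franc, just a smaller set of candidate languages.
console.log(franc('All human beings are born free and equal in dignity and rights'))
// => 'eng' (hopefully; very short strings can still come back as 'und' or a wrong guess)
```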
Finally, this problem sounds more like asserting that something is English. Franc solves a slightly different problem: out of all languages, which one is the most likely? To assert that something is probably English, I suggest using franc.all and checking whether eng has a certainty of 0.9 or higher.
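Roughly like this (a sketch, assuming franc.all returns [languageCode, score] pairs with the best match scored 1):

```js
const franc = require('franc')

// Treat input as English when the score for 'eng' passes a cut-off.
// 0.9 is the value suggested above; tune it for your own data.
function probablyEnglish (text, cutoff = 0.9) {
  const guesses = franc.all(text) // e.g. [['eng', 1], ['sco', 0.97], ...]
  const eng = guesses.find(([code]) => code === 'eng')
  return Boolean(eng) && eng[1] >= cutoff
}

console.log(probablyEnglish('All human beings are born free and equal in dignity and rights'))
// => true, but very short strings may still fall below the cut-off
```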
Thanks for following up. I ended up using https://github.com/dachev/node-cld which has a binary dependency but the results are very accurate.
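For reference, a minimal sketch of using it (cld exposes a detect function with a callback; the exact result shape may differ by version, so treat this as an assumption and check the node-cld readme):

```js
const cld = require('cld')

cld.detect('This is a short English sentence.', (err, result) => {
  if (err) throw err
  // result.reliable is a boolean; result.languages lists detected
  // languages, best match first.
  console.log(result.reliable, result.languages[0])
})
```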
Cool project! Yeah, there are definitely other algorithms that do better on smaller input!
Hey @wooorm, am I doing something wrong here?