albandaft closed this issue 7 years ago.
Yes, but this is not a solution. I am pretty sure the underlying algorithm can be improved so that the best-matching language ends up at the top. Otherwise, it is useless in practice.
For practical use, try this instead: https://www.npmjs.com/package/cld. This library even recognized English as 'sot' (Southern Sotho), and I had passed in a large document.
I ran into the same issues as @albandaft. Here are the results of a simple test program:
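```javascript
// Print franc's ranked guesses (CommonJS API of older franc-min releases).
const franc = require('franc-min')
if (process.argv.length === 2) {
  console.error(`usage: node ${process.argv[1]} <sentence, in double quotes>`)
  process.exit(1)
}
// Join all CLI arguments so unquoted sentences also work.
const sentence = process.argv.slice(2).join(' ')
// franc.all returns [languageCode, score] pairs, best match first.
console.log(franc.all(sentence))
```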
```text
$ node languageDetection "Questa è una mela."
[
  [ 'spa', 1 ],
  [ 'ita', 0.9804634257155839 ],
  [ 'swe', 0.7473875511131304 ],
  ...

$ node languageDetection "Questa è una mela marcia. Perchè dovrei mangiarla?"
[
  [ 'spa', 1 ],
  [ 'zlm', 0.9975283213182287 ],
  [ 'ita', 0.984552008238929 ],
  [ 'por', 0.9789907312049434 ],
  ...

$ node languageDetection "Questa è una mela marcia. Perchè dovrei mangiarla? A me le mele piacciono, ma non quelle marcie."
[
  [ 'ita', 1 ],
  [ 'spa', 0.9301719221054348 ],
  [ 'por', 0.9287867677014585 ],
  [ 'fra', 0.9051576631630408 ],
  ...
```
It seems that as the number of words increases, the function starts to recognize Italian correctly. So I reckon that franc() needs a long sentence to guess correctly (https://github.com/wooorm/franc#whats-not-so-cool-about-franc). My problem is that I would like to run franc() in a chatbot, to filter out non-Italian words and sentences.
My final question is: how many words does the function need to guess correctly?
Thanks, Giorgio
Hi Giorgio!
Franc guesses languages. It is never 100% certain. The longer the text you pass in, the more reliable the guess.
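Since reliability grows with length, a chatbot could simply skip detection for very short messages. A minimal sketch, assuming a word-count cutoff: `countWords`, `shouldDetectLanguage` and the `MIN_WORDS` value of 10 are hypothetical names and numbers, not part of franc (franc's own `minLength` option counts characters, not words).

```javascript
// Hypothetical cutoff: messages shorter than this are not sent to franc.
// 10 is an assumed starting point, not a value recommended by franc.
const MIN_WORDS = 10

// Count whitespace-separated words; empty or blank input counts as 0.
function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length
}

// Only run language detection once the message has enough words.
function shouldDetectLanguage(text, minWords = MIN_WORDS) {
  return countWords(text) >= minWords
}

console.log(shouldDetectLanguage('Questa è una mela.')) // false (4 words)
console.log(shouldDetectLanguage(
  'Questa è una mela marcia. Perchè dovrei mangiarla? A me le mele piacciono, ma non quelle marcie.'
)) // true (17 words)
```

How short messages are then handled (accepted, rejected, or asked to be rephrased) is a product decision the sketch leaves open.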
In all your cases, franc gives Italian a probability of at least 98%, so you could use that.
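One way to act on that in a chatbot is to check Italian's score instead of only the top-ranked language. A minimal sketch, assuming the `[languageCode, score]` arrays that `franc.all()` printed in the test runs; `italianScore`, `isLikelyItalian` and the 0.95 threshold are hypothetical, not part of franc.

```javascript
// Return the score assigned to Italian ('ita'), or 0 if it is not listed.
// `ranking` is assumed to be franc.all() output: [languageCode, score] pairs.
function italianScore(ranking) {
  const entry = ranking.find(([code]) => code === 'ita')
  return entry ? entry[1] : 0
}

// Hypothetical filter: accept the message if Italian scores at or above the
// threshold, whether or not it is the top-ranked language.
// 0.95 is an assumed threshold; tune it against real chatbot traffic.
function isLikelyItalian(ranking, threshold = 0.95) {
  return italianScore(ranking) >= threshold
}

// Rankings copied (truncated) from the test runs in this thread.
const shortRanking = [['spa', 1], ['ita', 0.9804634257155839], ['swe', 0.7473875511131304]]
const longRanking = [['ita', 1], ['spa', 0.9301719221054348], ['por', 0.9287867677014585]]

console.log(isLikelyItalian(shortRanking)) // true: 'ita' is about 0.98 even though 'spa' ranks first
console.log(isLikelyItalian(longRanking))  // true
console.log(isLikelyItalian([['eng', 1]])) // false: 'ita' is not in the list
```

The trade-off: a threshold this close to 1 also accepts text where Italian is only a close runner-up, so some non-Italian sentences will pass too. Requiring a minimum margin over the second-best language would be a stricter alternative.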
But even when I write in Italian, it still rates Italian as less probable.