wooorm / franc

Natural language detection
https://wooorm.com/franc/
MIT License

Failing in Italian #47

Closed albandaft closed 7 years ago

albandaft commented 7 years ago

When I write Italian text, franc ranks Italian as less probable than other languages. (Screenshot, 2017-03-10 at 11:49:10.)

wooorm commented 7 years ago

https://github.com/wooorm/franc#whats-not-so-cool-about-franc

albandaft commented 7 years ago

Yes, but this is not a solution. I am pretty sure the underlying algorithm can be improved so that the most probable match ends up on top. Otherwise the library becomes useless in practice.

r1cebank commented 7 years ago

Use this if you are dealing with practical usage: https://www.npmjs.com/package/cld This library even recognizes English as sot, even though I passed in a large document.

solyarisoftware commented 4 years ago

I ran into the same issue as @albandaft. See the results of this simple test program:

const franc = require('franc-min')

if (process.argv.length === 2) {
  console.error(`usage: node ${process.argv[1]} <sentence, in double quotes>`)
  process.exit(1)
}

const sentence = process.argv.slice(2).join(' ')
console.log(franc.all(sentence))

$ node languageDetection "Questa è una mela."
 [
  [ 'spa', 1 ],
  [ 'ita', 0.9804634257155839 ],
  [ 'swe', 0.7473875511131304 ],
  ...

$ node languageDetection "Questa è una mela marcia. Perchè dovrei mangiarla?"
 [
  [ 'spa', 1 ],
  [ 'zlm', 0.9975283213182287 ],
  [ 'ita', 0.984552008238929 ],
  [ 'por', 0.9789907312049434 ],
  ...

$ node languageDetection "Questa è una mela marcia. Perchè dovrei mangiarla? A me le mele piacciono, ma non quelle marcie."
 [
  [ 'ita', 1 ],
  [ 'spa', 0.9301719221054348 ],
  [ 'por', 0.9287867677014585 ],
  [ 'fra', 0.9051576631630408 ],
  ...

It seems that as the number of words increases, the function starts to recognize Italian correctly. So I reckon that franc() needs a long sentence to guess correctly ( https://github.com/wooorm/franc#whats-not-so-cool-about-franc ). My problem is that I would like to run franc() in a chatbot, to filter out non-Italian words/sentences.

My final question is:

How many words does the function need to guess correctly?

Thanks, Giorgio

wooorm commented 4 years ago

Hi Giorgio!

Franc guesses languages. It is never 100% certain. The longer the text you pass in, the better the probability.

In all your cases, franc gives Italian a probability of about 98% or more, even when it is not ranked first. You could use that.
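
One way to act on that suggestion is to check the score franc assigns to `ita` instead of looking only at the top-ranked language. The sketch below is an illustration, not part of franc: `isProbablyItalian` is a hypothetical helper, the `0.95` threshold is an assumed value that would need tuning, and the sample lists are copied (truncated) from the outputs earlier in this thread so the snippet runs without installing anything.

```javascript
// Hypothetical helper, not part of franc: takes a ranked list of
// [languageCode, score] pairs (the shape printed by franc.all above) and
// reports whether Italian scores at or above the given threshold.
// The default threshold of 0.95 is an assumption, not a recommended value.
function isProbablyItalian(ranked, threshold = 0.95) {
  const entry = ranked.find(([code]) => code === 'ita')
  return entry !== undefined && entry[1] >= threshold
}

// Scores copied from the outputs earlier in this thread (lists truncated).
const short = [
  ['spa', 1],
  ['ita', 0.9804634257155839],
  ['swe', 0.7473875511131304]
]
const long = [
  ['ita', 1],
  ['spa', 0.9301719221054348],
  ['por', 0.9287867677014585]
]

console.log(isProbablyItalian(short)) // true: 'ita' is second, but scores about 0.98
console.log(isProbablyItalian(long)) // true: 'ita' is ranked first
console.log(isProbablyItalian([['und', 1]])) // false: no 'ita' entry at all
```

With franc installed, the call would presumably look like `isProbablyItalian(franc.all(sentence))`. Note the trade-off: a threshold accepts short inputs where Spanish or another language outranks Italian, so it can also let through text that really is in one of those languages.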