xzycn / language-detection

Automatically exported from code.google.com/p/language-detection

Inconsistent results #59

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hello,

I'm currently discovering and testing your API and it's great!

However, I'm a bit surprised by the results I get. For a text in two 
languages, reordering the sentences changes the scores, even though every 
word and sentence is the same and only their order differs.

Here are the texts I tested and the results I got:

Wie geht es Ihnen? Es geht mir gut, Danke! Bonjour, le soleil brille et les 
oiseaux chantent.
fr:0.5714289303903483
de:0.42856987956922343

Wie geht es Ihnen? Bonjour, le soleil brille et les oiseaux chantent. Es geht 
mir gut, Danke!
fr:0.8571417085595199
de:0.14285781020198818

Bonjour, le soleil brille et les oiseaux chantent. Wie geht es Ihnen? Es geht 
mir gut, Danke!
fr:0.5714280769513275
de:0.4285703999762243

Could you explain what causes these inconsistencies? If the detection is 
based only on the frequency of 1-, 2- and 3-letter patterns, then I don't 
understand this behaviour.

My version is langdetect-09-13-2011.zip and my operating system is Windows 7 
64-bit.

Thanks a lot for your explanation.

Original issue reported on code.google.com by iferra...@gmail.com on 30 Oct 2013 at 7:40

GoogleCodeExporter commented 9 years ago
This could be because the algorithm is not deterministic. See 
https://code.google.com/p/language-detection/issues/detail?id=64.

Original comment by skr...@deezer.com on 22 May 2014 at 3:27
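
To illustrate how a non-deterministic detector can give different scores for the same words, here is a minimal toy sketch in Python. It is not the library's implementation: the word lists, the function name toy_detect, and the trials and samples parameters are all invented for illustration. The sketch assumes the detector runs a few randomized trials and reports the share of trials each language wins. The scores reported above are all close to multiples of 1/7, which would be consistent with that kind of averaging, but this thread does not confirm it.

```python
import random
import re

# Toy word lists, invented for illustration only.
# The real library works with n-gram profiles, not word lists.
TOY_WORDS = {
    "de": {"wie", "geht", "es", "ihnen", "mir", "gut", "danke"},
    "fr": {"bonjour", "le", "soleil", "brille", "et", "les", "oiseaux", "chantent"},
}


def toy_detect(text, seed=None, trials=7, samples=5):
    """Return {lang: share of trials won}.

    Each trial looks at a random sample of words, so the outcome
    depends on the random draws (hypothetical scheme, see lead-in).
    """
    rng = random.Random(seed)  # seed=None gives different draws on each call
    words = re.findall(r"\w+", text.lower())
    wins = {lang: 0 for lang in TOY_WORDS}
    for _ in range(trials):
        votes = {lang: 0 for lang in TOY_WORDS}
        for _ in range(samples):
            word = rng.choice(words)
            for lang, vocab in TOY_WORDS.items():
                if word in vocab:
                    votes[lang] += 1
        # Ties go to the first language in dict order, to keep the sketch simple.
        winner = max(votes, key=votes.get)
        wins[winner] += 1
    return {lang: wins[lang] / trials for lang in wins}


text = ("Wie geht es Ihnen? Es geht mir gut, Danke! "
        "Bonjour, le soleil brille et les oiseaux chantent.")

print(toy_detect(text, seed=0))
print(toy_detect(text, seed=0))  # same seed, same scores
print(toy_detect(text))          # unseeded: may differ between runs
```

In this toy, fixing the seed makes repeated calls on the same text reproducible. Reordering the sentences can still change the scores even with a fixed seed, because the same random draws then land on different words.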