shuyo / language-detection

This is a language detection library implemented in plain Java. (aliases: language identification, language guessing)
https://github.com/shuyo/language-detection/blob/wiki/ProjectHome.md

Deterministic? #64

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run the detect function for the following text "me ha gustado un vídeo de 
de  youtu be/9t?a - 2 hack | black ops 2"

What is the expected output? What do you see instead?
I expected that the same input would generate the same output.

What version of the product are you using? On what operating system?
The latest version, released on 03/03/2014.

Please provide any additional information below.

If you run detect 100 times on the same text, it gives different results, such as:

[es:0.9999965414321671]
[es:0.5714281577382894, en:0.428570266675543]
[es:0.7142834100667492, en:0.2857164026891309]
[en:0.5714263352517226, es:0.4285727706265049]

The last result would return English instead of Spanish.

Why is that?

Original issue reported on code.google.com by lfomen...@gmail.com on 31 Mar 2014 at 1:56

GoogleCodeExporter commented 9 years ago
I solved my problem. @shuyo answered me: 
"Sampling for denoising changes its result each time. If you want to make it 
deterministic, set the seed to 0. 
https://code.google.com/p/language-detection/wiki/FrequentlyAskedQuestion"
It worked =D

Original comment by lfomen...@gmail.com on 7 Apr 2014 at 2:14
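
Editor's note: the reply above can be illustrated with a small, self-contained sketch. It does not use the library itself; it only shows why seeding a random number generator makes sampling repeatable. The class name SeededSamplingDemo and the method sampleNoise are hypothetical, and Gaussian draws are an assumed stand-in for the "sampling for denoising" mentioned in the reply. In the library, the seed is presumably fixed with a call along the lines of DetectorFactory.setSeed(0) made before creating a detector; check the linked FAQ for the exact call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical demo class; not part of the language-detection library.
class SeededSamplingDemo {

    // Draws a few Gaussian perturbations, standing in (as an assumption)
    // for the random sampling that the maintainer's reply refers to.
    static List<Double> sampleNoise(Random rng, int draws) {
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < draws; i++) {
            out.add(rng.nextGaussian());
        }
        return out;
    }

    public static void main(String[] args) {
        // Fixed seed: two generators seeded with 0 produce the same sequence,
        // so anything computed from the draws is repeatable.
        List<Double> a = sampleNoise(new Random(0L), 5);
        List<Double> b = sampleNoise(new Random(0L), 5);
        System.out.println("same seed, equal draws: " + a.equals(b));

        // No explicit seed: the sequence generally differs from run to run,
        // which matches the varying probabilities reported in this issue.
        List<Double> c = sampleNoise(new Random(), 5);
        System.out.println("unseeded draws: " + c);
    }
}
```

With a fixed seed the first line prints "same seed, equal draws: true" on every run, while the unseeded draws change between runs.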

GoogleCodeExporter commented 9 years ago
Thank you for your comment here. I also got non-deterministic results.
Guess I missed this in the FAQ!
Thanks for pointing this out.

Stefan

Original comment by sfgo...@gmail.com on 27 Jun 2014 at 9:35