shuyo / language-detection

This is a language detection library implemented in plain Java. (aliases: language identification, language guessing)
https://github.com/shuyo/language-detection/blob/wiki/ProjectHome.md
732 stars 184 forks

Japanese language detection problem #36

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Download attached Japanese text
2. Execute in CMD.exe: "java -jar langdetect.jar --detectlang -d profiles lang_detect.txt"

What is the expected output? [ja:0.7142823122662098]
What do you see instead? lang_detect.txt:[en:0.7142823122662098, pl:0.14285727552109861, tl:0.14285682309334474]

What version of the product are you using? latest (langdetect-09-13-2011)
On what operating system? Windows 7

Original issue reported on code.google.com by vova.pri...@gmail.com on 20 Apr 2012 at 7:48

GoogleCodeExporter commented 9 years ago
Using the attached text lang_detect_1.txt:

What is the expected output? [ja:0.9999952022259697]
What do you see instead? [en:0.9999952022259697]

Original comment by vova.pri...@gmail.com on 20 Apr 2012 at 7:57

GoogleCodeExporter commented 9 years ago
Solution: the attached documents were saved in ANSI encoding; they need to be re-saved as UTF-8.

Original comment by vova.pri...@gmail.com on 21 Apr 2012 at 3:21
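
For anyone hitting the same problem, the re-encoding step described above can be sketched in plain Java. This is only a minimal illustration, not part of langdetect: the class name `ReencodeToUtf8` and its `reencode` method are hypothetical, and the source charset must be supplied by the caller, because the thread does not say which Windows code page "ANSI" meant on the reporter's machine.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of langdetect): re-encodes a text file as UTF-8.
public class ReencodeToUtf8 {

    // Decode the input bytes with the given source charset, then write the text back as UTF-8.
    public static void reencode(Path in, Charset source, Path out) throws IOException {
        String text = new String(Files.readAllBytes(in), source);
        Files.write(out, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java ReencodeToUtf8 <input> <source-charset> <output>");
            System.exit(1);
        }
        // The source charset name is an assumption the caller has to make,
        // e.g. the code page the file was originally saved in.
        reencode(Paths.get(args[0]), Charset.forName(args[1]), Paths.get(args[2]));
    }
}
```

Assuming, for example, that the file was saved as Shift_JIS, it could be converted with `java ReencodeToUtf8 lang_detect.txt Shift_JIS lang_detect_utf8.txt` and the converted file passed to the `--detectlang` command from the original report. If the wrong source charset is given, the output will still be valid UTF-8 but the text will be garbled, so it is worth opening the result in an editor before running detection.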