hohieukn / language-detection

Automatically exported from code.google.com/p/language-detection

Characters not in the language appear in profile #71

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Use the default profiles
2. Open a profile in Notepad
3. あ appears in the Chinese profile

What is the expected output? What do you see instead?

What version of the product are you using? On what operating system?
The 2014/3/3 version

Please provide any additional information below.
Could a mechanism be added that excludes characters which do not belong to the 
language when a profile is created? This might reduce noise and wrong detections.
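
To illustrate the request above, here is a minimal sketch of such a filter. It does not use the library's own classes: `ProfileScriptFilter`, `isAllowed`, and `filter` are hypothetical names, and the profile is assumed to be a plain map from n-gram to count. An n-gram is kept only if every character is in an allowed Unicode script; the COMMON and INHERITED scripts (spaces, digits, punctuation, combining marks) are always allowed.

```java
import java.lang.Character.UnicodeScript;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of language-detection.
final class ProfileScriptFilter {

    // True if every code point of the n-gram is in an allowed script.
    // COMMON and INHERITED are always accepted (assumption of this sketch).
    static boolean isAllowed(String ngram, Set<UnicodeScript> allowed) {
        return ngram.codePoints().allMatch(cp -> {
            UnicodeScript script = UnicodeScript.of(cp);
            return script == UnicodeScript.COMMON
                    || script == UnicodeScript.INHERITED
                    || allowed.contains(script);
        });
    }

    // Returns a copy of the n-gram counts without the rejected n-grams.
    static Map<String, Integer> filter(Map<String, Integer> freq,
                                       Set<UnicodeScript> allowed) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : freq.entrySet()) {
            if (isAllowed(entry.getKey(), allowed)) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

For a Chinese profile one might call `ProfileScriptFilter.filter(freq, EnumSet.of(UnicodeScript.HAN))`, which would drop あ (script HIRAGANA). Whether Latin n-grams should also be dropped is a separate decision, since real Chinese text does contain some Latin words.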

Please add a function to create a profile from multiple sources/files, since 
everyday phrases may not appear in Wikipedia.
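
As a rough sketch of what is meant, the counts from several sources could be accumulated into one map. `MultiSourceProfile` and `countNgrams` are hypothetical names; each source is assumed to have been read into a string already, and the library's own text normalization is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of language-detection.
final class MultiSourceProfile {

    // Counts all character n-grams of length 1..maxN across every text.
    // Works on code points so that supplementary characters stay intact.
    static Map<String, Integer> countNgrams(Iterable<String> texts, int maxN) {
        Map<String, Integer> freq = new HashMap<>();
        for (String text : texts) {
            int[] cps = text.codePoints().toArray();
            for (int i = 0; i < cps.length; i++) {
                for (int n = 1; n <= maxN && i + n <= cps.length; n++) {
                    freq.merge(new String(cps, i, n), 1, Integer::sum);
                }
            }
        }
        return freq;
    }
}
```

With this shape, a Wikipedia abstract and a chat log can contribute to the same profile simply by being passed in the same list.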

Could the short-message profiles and the normal profiles be combined?

Could dictionary-based or script-based detection be used as a fallback when no 
features are found in the text? I often get "no feature found in text" for short 
texts with the default profiles. 
Could the probable languages be output as tuples or a 2D array?
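
A script-based fallback could look roughly like the sketch below. It is independent of the library: `ScriptFallback`, `dominantScript`, and `guessLanguage` are hypothetical names, and the script-to-language table is an illustrative assumption that only covers scripts used mainly by one language. Scripts shared by many languages (Latin, Cyrillic, Arabic) give no answer.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of language-detection.
final class ScriptFallback {

    // Returns the script that most letters in the text belong to, if any.
    static Optional<UnicodeScript> dominantScript(String text) {
        Map<UnicodeScript, Integer> counts = new EnumMap<>(UnicodeScript.class);
        text.codePoints()
            .filter(Character::isLetter)
            .forEach(cp -> counts.merge(UnicodeScript.of(cp), 1, Integer::sum));
        return counts.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
    }

    // Guesses a language code from the script alone, or empty if ambiguous.
    static Optional<String> guessLanguage(String text) {
        // Any kana suggests Japanese, even when Han characters dominate.
        boolean hasKana = text.codePoints().anyMatch(cp -> {
            UnicodeScript script = UnicodeScript.of(cp);
            return script == UnicodeScript.HIRAGANA
                    || script == UnicodeScript.KATAKANA;
        });
        if (hasKana) {
            return Optional.of("ja");
        }
        Optional<UnicodeScript> dominant = dominantScript(text);
        if (!dominant.isPresent()) {
            return Optional.empty();
        }
        switch (dominant.get()) {
            case HAN:    return Optional.of("zh");
            case HANGUL: return Optional.of("ko");
            case THAI:   return Optional.of("th");
            case GREEK:  return Optional.of("el");
            default:     return Optional.empty(); // shared script: no guess
        }
    }
}
```

Such a guess would only be consulted after the normal detector reports "no feature found in text", so it would not change results for texts the profiles already handle.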

Original issue reported on code.google.com by dennis97...@gmail.com on 11 Aug 2014 at 5:01