adblancod closed this issue 2 years ago.
@adblancod, can you share the snippet of code you used to get these results? I tried the code from https://github.com/optimaize/language-detector/issues/86#issuecomment-638818158 and got:
detectedLanguages = {ArrayList@1320} size = 1
0 = {DetectedLanguage@1325} "DetectedLanguage[eu:0.7259662120805258]"
detectedLanguagesNormalised = {ArrayList@1321} size = 2
0 = {DetectedLanguage@1328} "DetectedLanguage[eu:0.8410933444100412]"
1 = {DetectedLanguage@1329} "DetectedLanguage[gl:0.13353105016279732]"
It looks like you're either normalising the input manually or using an Optimaize method that does the normalisation for you. Normalisation seems very important for CJK text, but it wasn't happening for me or for another user in #86.
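For illustration only, here is a minimal sketch of what manually normalising the input could look like. It assumes plain Unicode NFKC via the JDK's `java.text.Normalizer`, which is a swapped-in technique: it is not necessarily what the other user or Optimaize does, and `normalizeForDetection` is a hypothetical helper. The normalised string would then be handed to the detector in place of the raw input.

```java
import java.text.Normalizer;

// Hypothetical helper (not part of Optimaize): one possible form of
// "manual normalisation" before language detection. Using NFKC is an assumption.
public class NormalizeInput {

    public static String normalizeForDetection(String text) {
        // NFKC folds compatibility characters: full-width Latin letters
        // (e.g. U+FF22 'Ｂ') become plain ASCII, half-width katakana become full-width.
        String nfkc = Normalizer.normalize(text, Normalizer.Form.NFKC);
        // Collapse runs of whitespace and trim the ends.
        return nfkc.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String fullWidth = "ＢｉｚｔｅＸ cobitの使い方";
        System.out.println(normalizeForDetection(fullWidth));
    }
}
```

With full-width Latin letters in the input, this prints `BizteX cobitの使い方`. The kana and kanji are already in their standard forms and pass through unchanged, so this sketch alone would not alter the CJK part of the string the detector sees.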
I don't think I can help with your accuracy problem, though; perhaps the string is just too short for Optimaize. Google Translate detects Slovenian.
The following text:
BizteX cobitの使い方
is identified as Galician and Basque: