kwonoj / cld3-asm

WebAssembly-based JavaScript bindings for Google Compact Language Detector v3
MIT License

Inaccurate results #75

Closed BaderMudarra closed 6 years ago

BaderMudarra commented 6 years ago

Hi, I was testing some random text messages, trying to detect their language using your assembly, and got these results:

Like super duper sketchy
Language is: da, Probability: 0.9992335438728333

living in music loving art
Language is: no, Probability: 0.996842622756958

AMERICAN DIABETES ASSOCIATION ALERT DAY
Language is: hu, Probability: 0.26049503684043884

great late brunch in Lox five ways
Language is: fy, Probability: 0.878024160861969

All of these are actually detected by cld3 as English, and I don't know why it reported incorrect results here.

kwonoj commented 6 years ago

This package doesn't do anything particular around language detection; it just passes the input transparently to the cld3 binary. As for the detection results: while it is possible to lower minBytes below the identifier's default value, in most cases that won't work correctly and detection will return incorrect results. I do understand your concern is that the output differs from the cld3 binary itself, but for those 2 reasons I don't think there's an issue here. Check https://groups.google.com/forum/#!topic/cld3-users/ow8n_q7mn2o for the min bytes recommendation.
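
To make the min bytes point concrete, here is a minimal sketch of a caller-side guard that refuses to trust a result when the input is too short. It is not part of cld3-asm or cld3: `DetectionResult`, `utf8ByteLength`, and `isTrustworthy` are hypothetical names, and the thresholds (140 bytes, probability 0.7) are assumed values for illustration only, so substitute whatever the linked recommendation suggests.

```typescript
// Hypothetical result shape for illustration: only the two fields that
// appear in the report above (language code and probability).
interface DetectionResult {
  language: string;
  probability: number;
}

// Count the UTF-8 encoded size of a string without relying on any runtime API.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (
      code >= 0xd800 && code <= 0xdbff &&
      i + 1 < text.length &&
      text.charCodeAt(i + 1) >= 0xdc00 && text.charCodeAt(i + 1) <= 0xdfff
    ) {
      bytes += 4; // valid surrogate pair encodes one 4-byte code point
      i++;
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Assumed thresholds, not values taken from cld3-asm: tune them for your data.
function isTrustworthy(
  text: string,
  result: DetectionResult,
  minBytes: number = 140,
  minProbability: number = 0.7
): boolean {
  return utf8ByteLength(text) >= minBytes && result.probability >= minProbability;
}

// First sample from the report: high probability, but far too short to trust.
const sample = "Like super duper sketchy";
const reported: DetectionResult = { language: "da", probability: 0.9992335438728333 };

console.log(utf8ByteLength(sample));          // 24
console.log(isTrustworthy(sample, reported)); // false
```

With a guard like this, a short message is treated as "unknown" regardless of the reported probability, which matches the observation that confidence values on very short inputs are not meaningful.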