optimaize / language-detector

Language Detection Library for Java
Apache License 2.0
568 stars, 165 forks

Every time, it returns only absent() #98

Status: Open. hattewarsm opened this issue 5 years ago

stefan-reich commented 5 years ago

You might want to give some more info :smiley:

james-s-w-clark commented 4 years ago

@hattewarsm are you referring to something like:

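        // readAllBuiltIn() declares IOException, so handle or rethrow it
        List<LanguageProfile> languageProfiles = new LanguageProfileReader().readAllBuiltIn();
        LanguageDetector detector = LanguageDetectorBuilder.create(NgramExtractors.standard())
                .withProfiles(languageProfiles)
                .build();

        // Optional here is Guava's com.google.common.base.Optional, hence Optional.absent()
        Optional<LdLocale> detected = detector.detect("コンコルド001試作機は1969年3月2日にトゥールーズで初飛行した");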

and detected comes back as Optional.absent()?

I tested a few more examples, and here is what I found.

The detector only returns a language when the most confident candidate has a probability of at least 0.9999; if the best candidate falls below that, detect() returns Optional.absent(). That threshold does seem rather high.
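To make the threshold behaviour concrete, here is a minimal, self-contained Java sketch of the rule described above. It does not use the library and is not its source code: Candidate, detect and MINIMAL_CONFIDENCE are hypothetical names, the 0.9999 value is the one observed in this thread, the probabilities are made-up illustrative values, and java.util.Optional stands in for the Guava Optional the library returns.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of the detect() decision rule discussed in this issue.
// Not the library's code: all names here are illustrative.
final class ThresholdSketch {

    // Assumed default, taken from the 0.9999 figure observed in this thread.
    static final double MINIMAL_CONFIDENCE = 0.9999;

    // One candidate language and its probability, e.g. ("ja", 0.98).
    record Candidate(String language, double probability) {}

    // candidates must already be sorted by descending probability,
    // so index 0 is the most confident candidate.
    static Optional<String> detect(List<Candidate> candidates, double minimalConfidence) {
        if (candidates.isEmpty()) {
            return Optional.empty();                 // nothing detected at all
        }
        Candidate best = candidates.get(0);
        if (best.probability() >= minimalConfidence) {
            return Optional.of(best.language());     // confident enough
        }
        return Optional.empty();                     // best guess exists, but is below the cutoff
    }

    public static void main(String[] args) {
        List<Candidate> fairlySure = List.of(
                new Candidate("ja", 0.98),
                new Candidate("zh-CN", 0.02));

        System.out.println(detect(fairlySure, MINIMAL_CONFIDENCE)); // Optional.empty
        System.out.println(detect(fairlySure, 0.5));                // Optional[ja]
    }
}
```

With the 0.9999 cutoff, a candidate that is 98% likely still produces an empty result, which matches the symptom in the issue title. If your version of LanguageDetectorBuilder exposes a minimalConfidence(double) setting (an assumption worth checking against the library source), lowering it should behave like the second call above.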

You may be better off calling detector.getProbabilities(text) and taking the most confident language with .get(0), since the results are sorted with the most probable language first.
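The workaround of taking the first entry of the probability list can be sketched as follows. With the real library the list would come from detector.getProbabilities(text); to keep the sketch runnable without that dependency, Candidate is a hypothetical stand-in for the library's per-language result type, and the probabilities are made-up illustrative values. The empty check matters because .get(0) on an empty list throws.

```java
import java.util.List;
import java.util.Optional;

// Sketch of the "take the top probability" workaround, runnable without the library.
// Candidate is a hypothetical stand-in for the library's per-language result.
final class BestGuessSketch {

    // One language and its probability, e.g. ("ja", 0.98).
    record Candidate(String language, double probability) {}

    // Assumes the list is sorted by descending probability, so index 0 is the
    // most confident language. The isEmpty() check avoids an
    // IndexOutOfBoundsException when nothing was detected.
    static Optional<Candidate> bestGuess(List<Candidate> sortedCandidates) {
        if (sortedCandidates.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(sortedCandidates.get(0));
    }

    public static void main(String[] args) {
        List<Candidate> probabilities = List.of(
                new Candidate("ja", 0.98),
                new Candidate("zh-CN", 0.02));

        // No fixed 0.9999 cutoff: the caller sees the probability and decides.
        bestGuess(probabilities).ifPresentOrElse(
                best -> System.out.println(best.language() + " " + best.probability()), // ja 0.98
                () -> System.out.println("no language detected"));

        System.out.println(bestGuess(List.of()).isPresent()); // false
    }
}
```

Returning the whole candidate rather than just the language lets the caller apply its own, lower cutoff. If you cannot rely on the ordering, sortedCandidates.stream().max(Comparator.comparingDouble(Candidate::probability)) gives the same answer without assuming a sort.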

If this isn't what you're seeing, I think you'll need to provide more information to keep the ticket from being rejected.