pemistahl / lingua-py

The most accurate natural language detection library for Python, suitable for short text and mixed-language text
Apache License 2.0

Multiple Function result discrepancy #228

Open EvGe22 opened 1 month ago

EvGe22 commented 1 month ago

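For a Ukrainian text, compute_language_confidence_values and detect_multiple_languages_of return contradictory results: the first assigns Kazakh a confidence of 1, while the second labels the whole string as Ukrainian.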

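from lingua import LanguageDetectorBuilder

# Build a detector that considers every language lingua supports
detector = LanguageDetectorBuilder.from_all_languages().build()
string = "Що найбільше подобається читачам у жанрі \"Фентезі\"?"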

print(detector.compute_language_confidence_values(string))
# Output: [ConfidenceValue(language=Language.KAZAKH, value=1), ConfidenceValue(language=Language.AFRIKAANS, value=0), ConfidenceValue(language=Language.ALBANIAN, value=0), ...]

print(detector.detect_multiple_languages_of(string))
# Output: [DetectionResult(start_index=0, end_index=51, word_count=7, language=Language.UKRAINIAN)]
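
To make the mismatch easier to check on other inputs, here is a minimal sketch of a helper that calls both methods and reports whether they agree. The names compare_detection and Comparison are hypothetical and not part of lingua. It only assumes what the snippet above shows: that the detector has these two methods, and that the returned objects expose .language and .value as their printed reprs suggest.

```python
from typing import Any, List, NamedTuple, Optional


class Comparison(NamedTuple):
    """Hypothetical container for the outcome of both detection methods."""
    top_confidence_language: Optional[Any]
    segment_languages: List[Any]
    agree: bool


def compare_detection(detector: Any, text: str) -> Comparison:
    """Run both detection methods on `text` and compare what they report.

    Assumption: `detector` provides compute_language_confidence_values()
    and detect_multiple_languages_of(), as in the snippet from this issue.
    """
    confidences = detector.compute_language_confidence_values(text)
    # Pick the entry with the highest value explicitly instead of
    # relying on the order of the returned list.
    top = max(confidences, key=lambda c: c.value, default=None)
    top_language = top.language if top is not None else None

    segments = detector.detect_multiple_languages_of(text)
    segment_languages = [segment.language for segment in segments]

    # "Agree" here means the top-confidence language shows up in at
    # least one detected segment; this is a deliberately loose check.
    agree = top_language is not None and top_language in segment_languages
    return Comparison(top_language, segment_languages, agree)
```

Calling compare_detection(detector, string) with the detector and string from above should give agree=False if the behaviour described in this issue is reproduced.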