pemistahl / lingua-py

The most accurate natural language detection library for Python, suitable for short text and mixed-language text
Apache License 2.0
1.08k stars, 44 forks

Proposition: Add confidence value to the output of method Detector.detect_language_of #120

Closed: erangold closed this issue 1 year ago

erangold commented 1 year ago

Hi, is it possible to add the confidence value to the output of the method Detector.detect_language_of(text)? Currently I obtain the confidence (assuming the returned language is not None) by additionally calling Detector.compute_language_confidence(text, language), even though the confidence has already been computed by the previous method.

pemistahl commented 1 year ago

Simply use the method compute_language_confidence_values(), which returns the language and its confidence score for every language supported by your LanguageDetector instance, sorted by confidence in descending order. If you are only interested in the most likely language, just take the first value of the list.

from lingua import LanguageDetectorBuilder

detector = LanguageDetectorBuilder.from_all_languages().with_preloaded_language_models().build()
most_likely_language, confidence = detector.compute_language_confidence_values("some text")[0]

print(most_likely_language, confidence)
# Language.ENGLISH 0.15831789027429602
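
For anyone who wants the single-call behaviour asked for in this issue, the suggestion above could be wrapped in a small helper. This is only a sketch: detect_with_confidence is a hypothetical name, not part of lingua. It assumes the returned list is sorted by confidence in descending order, and that each entry is either a plain (language, confidence) pair or an object exposing language and value fields, depending on the installed release. Treating a top score of 0.0 as "nothing detected" is likewise an assumption.

```python
from typing import Any, Optional, Tuple


def detect_with_confidence(detector: Any, text: str) -> Tuple[Optional[Any], float]:
    """Return (most likely language, confidence) from a single detector call.

    Hypothetical helper, not part of lingua. It only assumes that
    detector.compute_language_confidence_values(text) returns a list
    sorted by confidence in descending order.
    """
    values = detector.compute_language_confidence_values(text)
    if not values:
        return None, 0.0

    top = values[0]
    if hasattr(top, "language") and hasattr(top, "value"):
        # Entry exposes named fields (assumed for some releases).
        language, confidence = top.language, top.value
    else:
        # Entry is a plain (language, confidence) pair.
        language, confidence = top

    # Assumption: a top score of 0.0 means no language could be identified.
    if confidence == 0.0:
        return None, 0.0
    return language, confidence
```

With a detector built as shown earlier, `language, confidence = detect_with_confidence(detector, "some text")` replaces the separate detect_language_of and compute_language_confidence calls. The helper does not reproduce any further checks that detect_language_of may apply, such as a configured minimum relative distance, so the two can disagree on ambiguous input.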