pemistahl / lingua-go

The most accurate natural language detection library for Go, suitable for short text and mixed-language text
Apache License 2.0

Support absolute language confidence metric #54

Closed warvyvr closed 1 month ago

warvyvr commented 10 months ago

Hi, in my scenario the goal is to detect whether the input text is in English or in some other language, and I'm not sure how to use the library for this. For instance, if the input text is in a particular language such as Vietnamese, I expect it to be detected as non-English.

    package main

    import (
        "fmt"

        "github.com/pemistahl/lingua-go"
    )

    func main() {
        // Candidate languages the detector may choose from.
        languages := []lingua.Language{
            lingua.English,
            lingua.Vietnamese,
            lingua.Unknown,
        }

        sentence := "Thông tin tài khoản của bạn"

        detector := lingua.NewLanguageDetectorBuilder().
            FromLanguages(languages...).
            WithMinimumRelativeDistance(0.9).
            Build()

        // Confidence values are relative to the candidate languages above.
        confidenceValues := detector.ComputeLanguageConfidenceValues(sentence)

        for _, elem := range confidenceValues {
            fmt.Printf("%s: %.2f\n", elem.Language(), elem.Value())
        }
    }

Output:

    Vietnamese: 1.00
    English: 0.00

When I remove lingua.Vietnamese from the language list, the program outputs English: 1.00. I would like the result to be some other language type rather than English. Please help me with how to do this. Thanks in advance.

pemistahl commented 10 months ago

Hi, what you want is not yet possible with my library. As of now, it only provides a relative confidence metric that tells you how likely a language is in comparison to another language. What you want is an absolute confidence metric that works independently from any other language. I plan to implement something like that but it's not easy. I can't tell you when this will be done.

warvyvr commented 10 months ago

Thanks, that is good news. Looking forward to it.

therealaditigupta commented 6 months ago

Looking forward to this feature! We are looking for something similar. Any update on when this may be available?

pemistahl commented 1 month ago

Closed in favor of #68. The Rust implementation will contain this feature in version 1.7.0, which is to be released before the end of this year.