pemistahl / lingua-go

The most accurate natural language detection library for Go, suitable for short text and mixed-language text
Apache License 2.0
1.16k stars 65 forks

Support absolute language confidence metric #54

Open warvyvr opened 8 months ago

warvyvr commented 8 months ago

Hi, in my scenario the goal is to detect whether the input text is in English or in another language. I'm not sure how to use the library to accomplish this. For instance, if the input text is in another language, such as Vietnamese, I expect it to be detected as non-English:

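    package main

    import (
        "fmt"

        "github.com/pemistahl/lingua-go"
    )

    func main() {
        // Candidate languages; the detector only ranks these against each other.
        languages := []lingua.Language{
            lingua.English,
            lingua.Vietnamese,
            lingua.Unknown,
        }

        sentence := "Thông tin tài khoản của bạn"

        detector := lingua.NewLanguageDetectorBuilder().
            FromLanguages(languages...).
            WithMinimumRelativeDistance(0.9).
            Build()

        // One relative confidence value per candidate language.
        confidenceValues := detector.ComputeLanguageConfidenceValues(sentence)

        for _, elem := range confidenceValues {
            fmt.Printf("%s: %.2f\n", elem.Language(), elem.Value())
        }
    }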

Output:

    Vietnamese: 1.00
    English: 0.00

When I remove lingua.Vietnamese from the list of languages, the program outputs English: 1.00. I would like the result to be some other language rather than English. Please help me with how to do this. Thanks in advance.
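One possible workaround, not suggested in this thread and still built on the relative metric: give the detector many candidate languages (for example via lingua-go's `FromAllLanguages()`), then treat the text as English only if English ranks first and its relative confidence clears a threshold. The sketch below assumes the (language, value) pairs would come from `ComputeLanguageConfidenceValues`; the `confidence` type, the `isEnglish` helper, the 0.5 threshold and all scores are hypothetical stand-ins so it runs without the library.

```go
package main

import "fmt"

// confidence is a hypothetical stand-in for a (language, value) pair such as
// those returned by lingua-go's ComputeLanguageConfidenceValues.
type confidence struct {
	language string
	value    float64
}

// isEnglish reports whether English is the highest-ranked candidate and its
// relative confidence is at least minConfidence. An empty slice yields false.
func isEnglish(values []confidence, minConfidence float64) bool {
	best := confidence{}
	for _, v := range values {
		if v.value > best.value {
			best = v
		}
	}
	return best.language == "English" && best.value >= minConfidence
}

func main() {
	// Made-up values for a detector configured with several candidate languages.
	vietnamese := []confidence{{"Vietnamese", 0.97}, {"French", 0.02}, {"English", 0.01}}
	english := []confidence{{"English", 0.91}, {"German", 0.06}, {"French", 0.03}}

	fmt.Println(isEnglish(vietnamese, 0.5)) // false
	fmt.Println(isEnglish(english, 0.5))    // true
}
```

This only helps if the real language of the text is among the candidates; a language the detector doesn't know can still end up ranked as English.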

pemistahl commented 8 months ago

Hi, what you want is not yet possible with my library. As of now, it only provides a relative confidence metric that tells you how likely a language is in comparison to another language. What you want is an absolute confidence metric that works independently from any other language. I plan to implement something like that but it's not easy. I can't tell you when this will be done.

warvyvr commented 8 months ago

Thanks, that's good news. I look forward to it.

therealaditigupta commented 4 months ago

Looking forward to this feature! We are looking for something similar. Any update on when this may be available?