pemistahl / lingua

The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
Apache License 2.0

Option: Other #158

Closed thsm-kb closed 1 month ago

thsm-kb commented 1 year ago

Great tool - thank you! Suggestion: the possibility to add OTHER as a language. Let's say I want to find English and French in a multi-language set. I want to add English and French to LanguageDetectorBuilder.from_languages, but if the probability is low, I don't want everything to be marked as English or French, but as something else -> Other.
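To make the request concrete, here is a minimal sketch of the desired fallback, written as a post-processing step outside the library. It assumes the caller already has a map of per-language confidence values from some detector. The class `OtherFallback`, the method `classify`, the `OTHER` label, and the threshold of 0.6 are all hypothetical names and values chosen for illustration; none of them are part of Lingua.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lingua: maps low-confidence or
// unwanted detection results to a catch-all OTHER label.
class OtherFallback {
    static final String OTHER = "OTHER";

    // confidences:   language name -> confidence value (assumed input)
    // wanted:        the languages the caller actually cares about
    // minConfidence: results scoring below this value become OTHER
    static String classify(Map<String, Double> confidences,
                           Set<String> wanted,
                           double minConfidence) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;

        // Find the language with the highest confidence value.
        for (Map.Entry<String, Double> entry : confidences.entrySet()) {
            if (entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }

        // Fall back to OTHER when there is no candidate, the best score is
        // too low, or the best language is not one of the wanted ones.
        if (best == null || bestScore < minConfidence || !wanted.contains(best)) {
            return OTHER;
        }
        return best;
    }
}
```

With `wanted` set to English and French, a text whose best candidate is German, or whose best score falls below the threshold, would come back as `OTHER` instead of being forced into one of the two languages.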

pemistahl commented 1 year ago

Thanks for the suggestion @thsm-kb. In fact, I was already thinking about changing the detector's behavior in this way. I will most probably implement something like this. So please stay tuned.

thsm-kb commented 1 year ago

Any updates on this, @pemistahl? I'm currently searching for something to check whether a website is written in Danish or not. I would love to use Lingua, since it was a great experience the last time I used it.

pemistahl commented 1 year ago

No, there is no progress yet. I maintain four implementations of Lingua, and I'm currently writing Python bindings for the Rust implementation. If there is progress, you will see it in the release notes.

pemistahl commented 1 month ago

Closed in favor of #214.