pemistahl / lingua-py

The most accurate natural language detection library for Python, suitable for short text and mixed-language text
Apache License 2.0
1k stars, 43 forks

Luxembourgish #155

Open astuanax opened 11 months ago

astuanax commented 11 months ago

Would it be possible to include Luxembourgish?

I believe two EU languages are missing from the list: Maltese and Luxembourgish.

It seems Thierry Goeckel has already built luxdetect, so maybe we can integrate it? https://github.com/rotzbouw/luxdetect

Would be happy to discuss how to go forward and help out.

pemistahl commented 11 months ago

Hi @astuanax, thanks for your request.

I'm planning to add 25 more languages to Lingua so that it will support a total of 100 languages. I'm pretty sure that Maltese and Luxembourgish will be among those new languages. It may take a while, however.

Before starting that, I will first evaluate whether it's possible to use the Rust port of Lingua within Python because the pure Python port is actually very slow. The Rust port is significantly faster.

astuanax commented 11 months ago

Sure, I understand, let me know if I can help with testing.

TomLucidor commented 8 months ago

@pemistahl can ML libraries accelerate Python's performance?

pemistahl commented 8 months ago

@TomLucidor I'm currently writing Python bindings for the Rust implementation which will eventually replace the pure Python implementation. This will solve most performance issues.

Mejans commented 8 months ago

Hello @pemistahl, will Occitan and Kabyle be among your 100 new supported languages? Best regards

pemistahl commented 8 months ago

Hi @Mejans, I won't add a set of 100 new languages. I was talking about 25 new languages. That's more than enough work for now.

I haven't decided yet which languages to include but I'm in favor of including some minority languages as well. So thank you for proposing Occitan and Kabyle. I will keep them in mind.