pemistahl / lingua-py

The most accurate natural language detection library for Python, suitable for short text and mixed-language text
Apache License 2.0

Multiple Languages #154

Closed: kazuser closed this issue 11 months ago

kazuser commented 1 year ago

Hi! Thanks a lot for your "lingua"!

Could you please test these two inputs:

English language Английский язык

and

English language - Английский язык

?

[Screenshot: lingua]

My code is:

from lingua import Language, LanguageDetectorBuilder
languages = [Language.ENGLISH, Language.RUSSIAN]
detector = LanguageDetectorBuilder.from_languages(*languages).build()
sentence = '%text_from_memo%'
for result in detector.detect_multiple_languages_of(sentence):
  print(f"{result.language.name} {sentence[result.start_index:result.end_index]}")

But I'm calling this from Delphi 11 (+ Python 3.10.9), so I'm not sure which of them is the source of the problem :)
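To compare the two inputs side by side outside of Delphi, a small helper along these lines might help. This is only a sketch: `segments` and `compare` are hypothetical names, and the detector is assumed to expose nothing more than `detect_multiple_languages_of` as used in the code above.

```python
from typing import Any, List, Tuple


def segments(detector: Any, text: str) -> List[Tuple[str, str]]:
    """Return (language name, substring) pairs for each detected section.

    `detector` is assumed to behave like the lingua detector built above.
    """
    return [
        (result.language.name, text[result.start_index:result.end_index])
        for result in detector.detect_multiple_languages_of(text)
    ]


def compare(detector: Any, texts: List[str]) -> None:
    """Print every input followed by its detected sections.

    repr() is used so that leading and trailing whitespace stays visible.
    """
    for text in texts:
        print(repr(text))
        for name, part in segments(detector, text):
            print(f"  {name}: {part!r}")
```

With the detector from the snippet above, calling `compare(detector, ["English language Английский язык", "English language - Английский язык"])` in a plain Python session would show whether the two inputs are split differently without Delphi in the loop.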

kazuser commented 1 year ago

And another one, which returns an empty result:

from lingua import Language, LanguageDetectorBuilder
languages = [Language.ENGLISH, Language.KAZAKH, Language.RUSSIAN]
detector = LanguageDetectorBuilder.from_languages(*languages).build()
sentence = 'V төзімділік спорт'
for result in detector.detect_multiple_languages_of(sentence):
  print(f"{result.language.name} {sentence[result.start_index:result.end_index]}")

[Screenshot: empty]

kazuser commented 1 year ago

Maybe something is wrong with the order? 😕

[Screenshot: order]

pemistahl commented 11 months ago

@kazuser I think I've fixed the underlying problem now. The fix will be part of the next release.

pchr8 commented 11 months ago

For completeness: I can reproduce this with the following input:

"Das ist mein Text, mit lange deutsche Wörter. Here is my text, it's clearly english text. Ось це мій текст.\n\n-\n\n\tSome more text.\n\nStop processing here - \n\n\nEND OF TEXT."

Detection stops at the "-" in 1.3.2, but it works nicely in 1.3.3, which was released two days ago!
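The truncation reported in this thread (results that end before the text does) can be checked mechanically. The following is a minimal sketch, not part of lingua: `uncovered_spans` is a hypothetical helper, and it assumes only that each result has `start_index` and `end_index` attributes, as in the snippets above.

```python
from typing import Any, Iterable, List, Tuple


def uncovered_spans(text: str, results: Iterable[Any]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of text that no result covers.

    Gaps consisting only of whitespace are ignored.
    """
    gaps: List[Tuple[int, int]] = []
    position = 0
    # Walk the results in text order and record any non-blank gap before each one.
    for result in sorted(results, key=lambda r: r.start_index):
        if result.start_index > position and text[position:result.start_index].strip():
            gaps.append((position, result.start_index))
        position = max(position, result.end_index)
    # Anything non-blank left after the last result is also a gap.
    if text[position:].strip():
        gaps.append((position, len(text)))
    return gaps
```

Passing the list returned by `detector.detect_multiple_languages_of(text)` together with the same `text` should yield an empty list when the whole input has been assigned to some language, and a non-empty list when detection stops early.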