pemistahl / lingua-py

The most accurate natural language detection library for Python, suitable for short text and mixed-language text
Apache License 2.0

Caught an IndexError while using detect_multiple_languages_of #98

Closed: Saninsusanin closed this issue 1 year ago

Saninsusanin commented 1 year ago

With this test input:

, Ресторан «ТИНАТИН»

The code failed with this error:

Traceback (most recent call last):
  File "/home/essential/PycharmProjects/pythonProject/test_unnest.py", line 363, in <module>
    for lang, sentence in detector.detect_multiple_languages_of(text)
  File "/home/essential/PycharmProjects/pythonProject/venv/lib/python3.10/site-packages/lingua/detector.py", line 389, in detect_multiple_languages_of
    _merge_adjacent_results(results, mergeable_result_indices)
  File "/home/essential/PycharmProjects/pythonProject/venv/lib/python3.10/site-packages/lingua/detector.py", line 114, in _merge_adjacent_results
    end_index=results[i + 1].end_index,
IndexError: list index out of range
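
The last frame reads `results[i + 1]` inside `_merge_adjacent_results`, which can only fail this way when `i` is already the final index of `results`. The sketch below illustrates that class of error with a simplified merge routine. The `Span` type, the `merge_into_neighbor` function, and the fallback to the preceding neighbor are all hypothetical names and choices made for illustration; this is not Lingua's actual code, nor necessarily how the maintainer fixed it.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a detection result covering text[start:end]."""
    start_index: int
    end_index: int
    language: str


def merge_into_neighbor(results: List[Span], indices: List[int]) -> List[Span]:
    """Fold each span listed in `indices` into an adjacent span.

    The following neighbor is preferred. When the span is the last one,
    the preceding neighbor is used instead, so `merged[i + 1]` is never
    read out of range.
    """
    merged = list(results)
    # Walk from the back so that smaller indices stay valid after deletions.
    for i in sorted(indices, reverse=True):
        if len(merged) == 1:
            break  # nothing left to merge with
        if i + 1 < len(merged):
            # Extend the next span backwards to cover span i.
            merged[i + 1] = replace(merged[i + 1], start_index=merged[i].start_index)
        else:
            # Span i is last: extend the previous span forwards instead.
            merged[i - 1] = replace(merged[i - 1], end_index=merged[i].end_index)
        del merged[i]
    return merged


spans = [Span(0, 2, "ENGLISH"), Span(2, 10, "RUSSIAN"), Span(10, 20, "RUSSIAN")]
print(merge_into_neighbor(spans, [2]))  # the last span is folded into its predecessor
```

Without the `i + 1 < len(merged)` check, asking to merge the last span would raise the same `IndexError: list index out of range` seen in the traceback.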

Code example:

from lingua import Language, LanguageDetectorBuilder

languages = [Language.ENGLISH, Language.RUSSIAN, Language.UKRAINIAN]
detector = LanguageDetectorBuilder.from_languages(*languages).build()
text = ', Ресторан «ТИНАТИН»'
# This call raises the IndexError shown above.
sentences = [(lang, sentence) for lang, sentence in detector.detect_multiple_languages_of(text)]

pemistahl commented 1 year ago

Thank you @Saninsusanin for reporting this bug. I've just fixed it. Please update Lingua to version 1.2.1.
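
To confirm that an environment already has the fixed release, a small check along these lines may help. It is a minimal sketch: the distribution name `lingua-language-detector` is assumed to be the name the package is installed under, and `parse_release` and `has_fix` are hypothetical helpers that only understand plain numeric versions such as `1.2.1`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# The maintainer's reply names 1.2.1 as the release containing the fix.
FIXED_IN = (1, 2, 1)


def parse_release(version_string: str) -> tuple:
    """Turn '1.2.1' into (1, 2, 1), keeping only the leading digits of each part."""
    numbers = []
    for part in version_string.split(".")[:3]:
        match = re.match(r"\d+", part)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def has_fix(dist_name: str = "lingua-language-detector") -> bool:
    """Return True if the installed distribution is at least FIXED_IN."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        return False  # not installed at all
    return parse_release(installed) >= FIXED_IN


print(has_fix())
```

If this prints `False`, upgrading with `pip install --upgrade lingua-language-detector` should pull in a release that includes the fix.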