pemistahl / lingua-py

The most accurate natural language detection library for Python, suitable for short text and mixed-language text
Apache License 2.0
1.02k stars, 43 forks

detect_multiple_languages_of crashes on Arabic #205

Closed: tbarkai closed this issue 7 months ago

tbarkai commented 7 months ago

The following call raises a PanicException:

>>> detector.detect_multiple_languages_of('صباغ الكتريك')
...
pyo3_runtime.PanicException: byte index 12 is not a char boundary; it is inside 'ل' (bytes 11..13) of `صباغ الكتريك`
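The panic message counts UTF-8 bytes, not characters. Every Arabic letter in this string encodes to two bytes, so byte 12 lands in the middle of 'ل' (bytes 11..13, end exclusive). The minimal Python sketch below reproduces that arithmetic. The helper names `is_char_boundary` and `char_containing_byte` are hypothetical (not part of lingua-py). The first mirrors the check behind Rust's `str::is_char_boundary`, which is what the panic is reporting.

```python
def is_char_boundary(text: str, byte_index: int) -> bool:
    """Return True if byte_index starts a character in text's UTF-8 encoding."""
    data = text.encode("utf-8")
    # Like Rust, treat the start and the end of the buffer as boundaries.
    if byte_index == 0 or byte_index == len(data):
        return True
    if byte_index < 0 or byte_index > len(data):
        return False
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx; anything else starts a char.
    return (data[byte_index] & 0xC0) != 0x80


def char_containing_byte(text: str, byte_index: int):
    """Return (char, start, end) for the character whose UTF-8 bytes cover byte_index."""
    offset = 0
    for ch in text:
        width = len(ch.encode("utf-8"))
        if offset <= byte_index < offset + width:
            return ch, offset, offset + width
        offset += width
    raise IndexError(f"byte index {byte_index} is out of range")


text = "صباغ الكتريك"
print(len(text), len(text.encode("utf-8")))  # 12 characters, 23 bytes
print(is_char_boundary(text, 12))            # False
print(char_containing_byte(text, 12))        # ('ل', 11, 13)
```

Python's `str` indexes by character, so slicing `text[12]` on the Python side would never raise this error. The failure can only come from byte-offset slicing inside the Rust core.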

Detecting a single language on the same string works:

>>> detector.detect_language_of('صباغ الكتريك')
Language.ARABIC

pemistahl commented 7 months ago

Hi @tbarkai, this bug was already reported in #203. I've fixed it and will release the fix once the other issues in the 2.0.2 milestone have been resolved.
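Until a release with the fix is available, a caller could guard against the panic. The following is a minimal sketch, not an official workaround. The wrapper name `detect_multiple_or_none` is hypothetical. It assumes the detector exposes `detect_multiple_languages_of` as in the report above. PyO3 derives `PanicException` from `BaseException`, so a plain `except Exception` will not catch it. The sketch matches on the class name rather than importing `pyo3_runtime`, and that choice is an assumption.

```python
from typing import Any, Optional


def detect_multiple_or_none(detector: Any, text: str) -> Optional[list]:
    """Call detector.detect_multiple_languages_of(text); return None if the Rust core panics."""
    try:
        return detector.detect_multiple_languages_of(text)
    except BaseException as exc:
        # Swallow only Rust panics; let KeyboardInterrupt, SystemExit, etc. propagate.
        if type(exc).__name__ != "PanicException":
            raise
        return None
```

A caller can then fall back to single-language detection, which the report shows working on the same input:

```python
results = detect_multiple_or_none(detector, text)
if results is None:
    language = detector.detect_language_of(text)
```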