pemistahl / lingua-py

The most accurate natural language detection library for Python, suitable for short text and mixed-language text
Apache License 2.0

Error: ZeroDivisionError: float division by zero #102

Closed by jordimas 1 year ago

jordimas commented 1 year ago

Hello.

When I run this code with lingua_language_detector version 1.3.0:

from lingua import LanguageDetectorBuilder

# Read the text that triggers the error.
with open('text.txt') as fh:
    text = fh.read()

detector = LanguageDetectorBuilder.from_all_languages().build()
print(text)
result = detector.detect_language_of(text)
print(result)

I get this error:

Traceback (most recent call last):
  File "/home/jordi/sc/crux-top-lists-catalan/bug.py", line 9, in <module>
    result = detector.detect_language_of(text)
  File "/home/jordi/.local/lib/python3.10/site-packages/lingua/detector.py", line 272, in detect_language_of
    confidence_values = self.compute_language_confidence_values(text)
  File "/home/jordi/.local/lib/python3.10/site-packages/lingua/detector.py", line 499, in compute_language_confidence_values
    normalized_probability = probability / denominator
ZeroDivisionError: float division by zero

I attached the text file that triggers the problem. It works fine with other texts. This happens often in a crawling application that I'm testing.

jordimas commented 1 year ago

text.txt

pemistahl commented 1 year ago

Thank you @jordimas for using my library and for reporting this bug. It is fixed now in Lingua 1.3.1. The cause of the ZeroDivisionError was an internal numerical underflow of probabilities for long texts. Switching from floats to Decimals in the right spot fixed it.
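
For readers who want to see the failure mode in isolation, here is a minimal sketch of how a float underflow can end in a ZeroDivisionError and how Decimal avoids it. It is not Lingua's actual code: the `normalize` helper, the language labels, and the log-probability values are all hypothetical, and it assumes that per-language scores are summed log-probabilities that get exponentiated before normalization.

```python
import math
from decimal import Decimal


def normalize(probabilities):
    """Divide each probability by the total so the results sum to 1."""
    denominator = sum(probabilities)
    return [p / denominator for p in probabilities]


# Hypothetical summed log-probabilities for a long text (illustrative values,
# not taken from Lingua). Long texts push these sums far below zero.
log_probs = {"CATALAN": -800.0, "SPANISH": -802.0}

# With floats, exp(-800) is smaller than the smallest representable double,
# so every probability underflows to 0.0 and the denominator becomes 0.0.
float_probs = [math.exp(v) for v in log_probs.values()]
try:
    normalize(float_probs)
    float_error = None
except ZeroDivisionError as exc:
    float_error = exc

# Decimal has a much wider exponent range, so the tiny values survive
# and the normalization works.
decimal_probs = [Decimal(v).exp() for v in log_probs.values()]
confidences = [float(c) for c in normalize(decimal_probs)]

print(float_probs)   # both values are 0.0
print(float_error)   # the ZeroDivisionError caught above
print(confidences)   # two values that sum to approximately 1
```

An alternative that stays in floats would be to subtract the largest log-probability from every score before exponentiating. The maintainer's comment only mentions the switch to Decimals, so that variant is just a common technique and not what the fix did.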

jordimas commented 1 year ago

Thanks so much for the quick fix!