umarbutler / semchunk

A fast and lightweight pure Python library for splitting text into semantically meaningful chunks.
MIT License

Divide by zero error #4

Closed jcobol closed 4 months ago

jcobol commented 4 months ago

It is possible for a token counter to return zero tokens, leading to a division by zero error in the merge_splits function:

  File "C:\Users\user\miniconda3\envs\project\Lib\site-packages\semchunk\semchunk.py", line 78, in merge_splits
    average = cumulative_lengths[midpoint] / tokens if cumulative_lengths[midpoint] else average
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~
ZeroDivisionError: division by zero

umarbutler commented 4 months ago

Thanks for this! This bug is now fixed in v0.3.2 :)
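
For anyone still on an older version, here is a minimal sketch of how a zero token count can trigger this error, along with one possible guard. It is not the fix shipped in v0.3.2, which this thread does not show. The helper names (`update_average`, `update_average_guarded`, `whitespace_token_counter`) are hypothetical; only the conditional expression is taken from the traceback above.

```python
def whitespace_token_counter(text: str) -> int:
    """Hypothetical token counter: counts whitespace-separated words.

    Returns 0 for empty or whitespace-only text.
    """
    return len(text.split())


def update_average(cumulative_length: int, tokens: int, average: float) -> float:
    """Mirrors the expression from the traceback (hypothetical wrapper).

    Raises ZeroDivisionError when cumulative_length is nonzero but tokens is 0.
    """
    return cumulative_length / tokens if cumulative_length else average


def update_average_guarded(cumulative_length: int, tokens: int, average: float) -> float:
    """One possible guard (an assumption, not the library's actual fix).

    Keeps the previous average whenever either operand is zero.
    """
    return cumulative_length / tokens if cumulative_length and tokens else average


# Whitespace-only text has a nonzero character length but zero tokens.
text = "   "
tokens = whitespace_token_counter(text)

try:
    update_average(len(text), tokens, 0.2)
except ZeroDivisionError as error:
    print(f"Unguarded: {error}")

# The guarded version falls back to the previous average.
print(f"Guarded: {update_average_guarded(len(text), tokens, 0.2)}")
```

If upgrading is not an option, another workaround along the same lines would be to wrap your own token counter so that it never reports zero for non-empty text. Whether that is appropriate depends on how you use the counts.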