tylergneill / skrutable

Toolkit for manipulating Sanskrit text with Python

fix: punctuation interleaving bug #39

Closed tylergneill closed 2 weeks ago

tylergneill commented 2 weeks ago

Noticed that the recent upgrade to splitting, which ambitiously extended the definition of punctuation to include things that can precede content, such as "[3.142]", introduced a bug. It caused 500 errors whenever the number of "punctuation" units (including newlines) was smaller than the number of content units, which of course happens whenever someone inputs a newline-separated verse in two lines without any further punctuation (e.g., at the end).

Generalize the capture and in-order restoration of content and punctuation using the special behavior of capturing groups ("(...)") with re.split, which keeps the units split on instead of discarding them.
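For illustration, here is a minimal sketch of the capturing-group behavior described above. It is not the actual skrutable code: the pattern `PUNCT` and the helpers `split_keep_order` and `transform` are hypothetical names, and the pattern only assumes that newlines, `|` marks, and bracketed numbers like "[3.142]" count as punctuation.

```python
import re

# Hypothetical pattern (an assumption, not skrutable's real definition):
# bracketed numbers such as "[3.142]", or runs of "|" and newlines.
# The outer parentheses form a capturing group, so re.split keeps the matches.
PUNCT = re.compile(r"(\[[0-9.]+\]|[|\n]+)")

def split_keep_order(text):
    """Return (kind, unit) pairs in their original order."""
    parts = PUNCT.split(text)
    # With a single capturing group, even indices hold content and odd
    # indices hold the captured separators. Empty strings (e.g. when the
    # text starts with punctuation) are dropped after the index is read.
    return [
        ("punct" if i % 2 else "content", part)
        for i, part in enumerate(parts)
        if part
    ]

def transform(text, fn):
    """Apply fn to content units only, leaving punctuation in place."""
    return "".join(
        fn(unit) if kind == "content" else unit
        for kind, unit in split_keep_order(text)
    )

# A two-line verse with no final punctuation: 2 content units, 1 punctuation unit.
print(repr(transform("abc def\nghi jkl", str.upper)))
```

Because content and punctuation travel through one ordered list, the two counts never need to match, which is the situation that previously triggered the error.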

Write the first splitter tests for this to ensure proper behavior going forward.