tylergneill / skrutable

Toolkit for manipulating Sanskrit text with Python

refactor: clean up and generalize splitter length and punctuation preservation #34

Closed tylergneill closed 1 month ago

tylergneill commented 1 month ago

In anticipation of adding a second option that does not treat special characters with the same respect, the pre-processing for length and punctuation preservation can no longer rely on those characters. This refactor therefore takes a more elegant approach, starting with parallel extraction of content and punctuation.
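To illustrate what "parallel extraction of content and punctuation" could look like, here is a minimal sketch. It is not the code in this PR: the names `PUNCT_RE`, `extract_parallel`, `restore`, and `process_preserving`, as well as the punctuation set itself, are assumptions made for the example. The idea is to pull content and punctuation into two aligned lists, run the splitter on the content only, and then interleave the punctuation back in, so that no marker characters need to survive the splitter.

```python
import re

# Hypothetical punctuation set (with surrounding whitespace); the actual
# splitter may recognize a different set of characters.
PUNCT_RE = re.compile(r"(\s*[|/,;.?!]+\s*)")


def extract_parallel(text):
    """Separate text into two aligned lists: content chunks and punctuation.

    Because the pattern has one capturing group, re.split returns
    alternating content and punctuation, so len(content) == len(punct) + 1.
    """
    parts = PUNCT_RE.split(text)
    return parts[0::2], parts[1::2]


def restore(content, punct):
    """Interleave content chunks with the punctuation that followed them."""
    out = []
    for i, chunk in enumerate(content):
        out.append(chunk)
        if i < len(punct):
            out.append(punct[i])
    return "".join(out)


def process_preserving(text, process_fn):
    """Apply process_fn to each non-empty content chunk, leaving punctuation untouched."""
    content, punct = extract_parallel(text)
    processed = [process_fn(chunk) if chunk else chunk for chunk in content]
    return restore(processed, punct)
```

For example, `process_preserving("a b | c d ||", str.upper)` passes only `"a b"` and `"c d"` to the stand-in function, then reattaches `" | "` and `" ||"` in their original positions.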

Other small improvements:

This also finally cleans up the vestiges of the splitter server from the skrutable wrapper.