tylergneill / skrutable

Toolkit for manipulating Sanskrit text with Python

feat: preserve compound hyphens #37

Closed tylergneill closed 3 weeks ago

tylergneill commented 3 weeks ago

Context: Skrutable had been using only the unsandhied mode of the Dharmamitra tagger. The more complex morphosyntax mode, however, distinguishes compounds, and it even includes a handy hyphen between compound members in its unsandhied output.

Take advantage of this second mode to add an option in Skrutable that preserves compound hyphens, and make that option the new default.
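As a rough illustration of what such an option could look like, here is a minimal sketch. The function name `format_split_output` and the parameter `preserve_compound_hyphens` are hypothetical, not Skrutable's actual API, and the sketch assumes that the non-preserving behavior simply separates compound members with spaces.

```python
def format_split_output(split_text: str, preserve_compound_hyphens: bool = True) -> str:
    """Post-process unsandhied output that marks compound boundaries with hyphens.

    Hypothetical helper for illustration only; not part of Skrutable.
    """
    if preserve_compound_hyphens:
        # New default: keep the hyphens exactly as the tagger returned them.
        return split_text
    # Assumed old behavior: treat compound members as separate words,
    # then collapse any repeated whitespace.
    return " ".join(split_text.replace("-", " ").split())
```

For example, `format_split_output("dharma-kṣetre kuru-kṣetre", preserve_compound_hyphens=False)` would return the same words separated only by spaces.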

Downside: max_char_limit is much smaller with this more complex mode, empirically something like 150 rather than the 350 possible with unsandhied. Enforcing this limit might occasionally introduce additional errors. In fact, unsandhied-morphosyntax would be a more appropriate mode to use, because then the output needn't include the unneeded lemma info that takes up space in the output window. However, tests showed that the unsandhied-morphosyntax mode is not currently working, for whatever reason. If it starts working again, consider switching over and raising max_char_limit accordingly.
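To make the trade-off concrete, here is a minimal sketch of how a character limit might be enforced by chunking input before sending it to the tagger. The function `chunk_text` is hypothetical and is not Skrutable's actual implementation; only the name max_char_limit and the approximate value 150 come from the description above. The forced mid-word split is one plausible place where a smaller limit could introduce extra errors.

```python
def chunk_text(text: str, max_char_limit: int = 150) -> list[str]:
    """Split text into whitespace-delimited chunks of at most max_char_limit characters.

    Hypothetical helper for illustration only; not part of Skrutable.
    """
    chunks = []
    current = ""
    for word in text.split():
        # A single unbroken string longer than the limit has to be cut
        # mid-word, which is where splitting errors could creep in.
        while len(word) > max_char_limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_char_limit])
            word = word[max_char_limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_char_limit:
            current = candidate
        else:
            # Adding this word would exceed the limit, so start a new chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

With a limit of 150 instead of 350, the same input produces more chunks and is more likely to hit the forced mid-word split, since sandhied Sanskrit often contains long stretches without whitespace.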