Open thiswillbeyourgithub opened 2 months ago
Good problem to know about, thanks. I'll consider this when updating to better support markdown generation.
Re: #56
Maybe a simple fix would be to first pass the text through pysbd instead of split_sentence, and only pass sentences that are longer than some limit to split_sentence.
I discovered pysbd through another of your repos, so I'm also curious about why you used it in some places but not this time.
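For illustration, the two-pass idea above (a sentence segmenter first, split_sentence only for overly long sentences) could be sketched roughly like this. Everything here is an assumption rather than the repo's actual code: `two_pass_split`, `first_pass`, `fallback`, and the `max_len` default are hypothetical names and values, and `fallback` is assumed to take one string and return a list of strings.

```python
from typing import Callable, List


def two_pass_split(
    text: str,
    first_pass: Callable[[str], List[str]],
    fallback: Callable[[str], List[str]],
    max_len: int = 200,  # hypothetical limit, not a value taken from the repo
) -> List[str]:
    """Segment with first_pass, then re-split only the pieces longer than max_len."""
    pieces: List[str] = []
    for sentence in first_pass(text):
        sentence = sentence.strip()
        if not sentence:
            continue  # skip empty segments
        if len(sentence) <= max_len:
            pieces.append(sentence)
        else:
            # Only overly long sentences reach the second splitter.
            pieces.extend(fallback(sentence))
    return pieces
```

With pysbd installed, `pysbd.Segmenter(language="en", clean=False).segment` could be passed as `first_pass`, and the existing split_sentence as `fallback`, assuming it accepts a single string and returns a list.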
I did have a version with pysbd instead, but found no major difference, except that split_sentence was perhaps better for some languages. So why include the extra dependency? Anyway, I'm probably going to restore it after I look more deeply into this problem.
Hi,
I was just playing around with split_sentence and noticed that:
Given that I use markdown bullet points a lot, I often have lines that end with no punctuation.
What do you think about automatically replacing a newline with a period if it doesn't already follow a punctuation mark?
Also, there's no env variable to set the text length for the splitter, right? I think lowering it would also reduce my VRAM needs. Any opinion on this?
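As a rough sketch of the newline suggestion above, a small preprocessing step could append a period to any non-empty line that does not already end in punctuation, before the text reaches the splitter. The function name `terminate_lines` and the `TERMINAL` character set are assumptions made for this example, not part of the project.

```python
# Characters treated as "already punctuated"; this set is an assumption.
TERMINAL = ".!?:;,"


def terminate_lines(text: str, mark: str = ".") -> str:
    """Append `mark` to each non-empty line that lacks closing punctuation."""
    out = []
    for line in text.splitlines():
        stripped = line.rstrip()
        if stripped and stripped[-1] not in TERMINAL:
            stripped += mark
        out.append(stripped)  # blank lines are kept as-is
    return "\n".join(out)
```

The newlines themselves are kept here, so the splitter still sees the line structure; only the missing end-of-sentence mark is added.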