m1guelpf / yt-whisper

Using OpenAI's Whisper to automatically generate YouTube subtitles
MIT License
1.36k stars 138 forks

feat: add option to break long lines into two #14

Closed. nogarcia closed this 2 years ago

nogarcia commented 2 years ago

Closes #1

Sorry for taking so long! The problem turned out to be a little more (or less) complicated than I thought.

The approach I take here is to start at the middle index of the string (so that the two lines are roughly even) and step that index backward until it lands on a space, then break the line there. This works well but makes some choices I'll list in case you want to change them:
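For readers following along, the midpoint-and-walk-back idea described above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the name `break_line` is hypothetical, and deciding which lines count as "long" is left out.

```python
def break_line(text: str) -> str:
    """Split text into two lines at the nearest space at or before the midpoint.

    Hypothetical helper for illustration; not taken from the PR itself.
    """
    split_at = len(text) // 2
    # Walk left from the midpoint until the index lands on a space.
    while split_at > 0 and text[split_at] != " ":
        split_at -= 1
    if split_at == 0:
        # No usable space before the midpoint: leave the text unbroken.
        return text
    # Drop the space itself and put a newline in its place.
    return text[:split_at] + "\n" + text[split_at + 1:]


print(break_line("The quick brown fox jumps over the lazy dog"))
# The quick brown fox
# jumps over the lazy dog
```

Because the index only ever moves left, the first line is never longer than half the string. A text with no space in its first half comes back unchanged.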

nogarcia commented 2 years ago

Done! Leading and trailing spaces are now stripped from the segments, and line breaking is applied to both VTT and SRT outputs. To avoid duplicating code, I moved the space stripping and the line breaking into a method called process_segment, which is called in both write_vtt and write_srt.
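A rough sketch of how a shared process_segment helper might sit between the two writers is shown below. The signatures, the `break_lines` flag, and the `format_timestamp` helper are assumptions for illustration and may not match the actual code in the PR. The segment keys `start`, `end`, and `text` follow the shape of Whisper's transcription segments.

```python
import io
from typing import Iterable, TextIO


def format_timestamp(seconds: float, decimal_marker: str = ".") -> str:
    # Hypothetical helper: render seconds as HH:MM:SS<marker>mmm.
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def process_segment(segment: dict, break_lines: bool = False) -> str:
    # Assumed behaviour: strip surrounding whitespace, then optionally
    # break the text at the nearest space at or before the midpoint.
    text = segment["text"].strip()
    if break_lines:
        split_at = len(text) // 2
        while split_at > 0 and text[split_at] != " ":
            split_at -= 1
        if split_at > 0:
            text = text[:split_at] + "\n" + text[split_at + 1:]
    return text


def write_vtt(segments: Iterable[dict], file: TextIO, break_lines: bool = False) -> None:
    file.write("WEBVTT\n\n")
    for segment in segments:
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        file.write(f"{start} --> {end}\n{process_segment(segment, break_lines)}\n\n")


def write_srt(segments: Iterable[dict], file: TextIO, break_lines: bool = False) -> None:
    # SRT numbers its cues from 1 and uses a comma before the milliseconds.
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"], decimal_marker=",")
        end = format_timestamp(segment["end"], decimal_marker=",")
        file.write(f"{index}\n{start} --> {end}\n{process_segment(segment, break_lines)}\n\n")


# Usage: write the same segment in both formats to in-memory buffers.
example = [{"start": 0.0, "end": 1.5, "text": " hello there world "}]
vtt_out, srt_out = io.StringIO(), io.StringIO()
write_vtt(example, vtt_out, break_lines=True)
write_srt(example, srt_out, break_lines=True)
```

Keeping the text cleanup in one place means the two writers differ only in their cue header format.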