machinewrapped / gpt-subtrans

Open Source project using LLMs to translate SRT subtitles

Pre-process subtitles to split long lines #120

Closed machinewrapped closed 5 months ago

machinewrapped commented 7 months ago

Subtitles with long lines tend to cause GPT desyncs and to produce translations that are too long. To avoid these problems, it would be helpful to pre-process the source file, looking for long lines (by duration?) and splitting them at newlines, periods or commas.
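A rough sketch of the splitting step, to make the idea concrete. Everything here is hypothetical: `split_long_text`, `BREAK_PATTERNS` and `max_chars` are illustrative names, not existing gpt-subtrans code, and it detects long lines by character count as an assumption (the duration-based check is still an open question). It tries newlines first, then periods, then commas, and picks the break closest to the middle of the text.

```python
import re

# Break candidates in priority order: newline, period, comma (assumed order).
BREAK_PATTERNS = [r"\n", r"\.\s+", r",\s+"]


def split_long_text(text: str, max_chars: int = 80) -> list[str]:
    """Split text once at the best available break if it exceeds max_chars.

    Hypothetical helper: returns [text] unchanged when the text is short
    enough or when no suitable break point exists.
    """
    text = text.strip()
    if len(text) <= max_chars:
        return [text]

    middle = len(text) / 2
    for pattern in BREAK_PATTERNS:
        # A candidate cut is the position just after each matched separator.
        cuts = [m.end() for m in re.finditer(pattern, text) if 0 < m.end() < len(text)]
        if not cuts:
            continue
        # Prefer the cut nearest the middle so the two halves are balanced.
        cut = min(cuts, key=lambda pos: abs(pos - middle))
        left, right = text[:cut].strip(), text[cut:].strip()
        if left and right:
            return [left, right]

    return [text]


parts = split_long_text("Hello there.\nHow are you, my friend?", max_chars=10)
print(parts)
```

This only splits once; a very long line could be handled by calling it again on each half.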

Splitting the duration in proportion to the length of each part tends to produce pretty good results in Subtitle Edit, subject to a minimum duration. This functionality could be added to NewProjectSettings before the batch preview (maybe even incorporated into SubtitleBatcher), so that the resulting batches would still respect the min/max line counts.
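A minimal sketch of the proportional timing, assuming a line has already been split into two pieces of text. `split_duration` and the one-second `min_duration` default are hypothetical, chosen for illustration only; this is not how Subtitle Edit or gpt-subtrans implements it. Times are `timedelta` offsets from the start of the file.

```python
from datetime import timedelta


def split_duration(start, end, parts, min_duration=timedelta(seconds=1)):
    """Divide the span [start, end] between two text parts by their length.

    Hypothetical helper: returns two (start, end) tuples, or None when the
    original span is too short to give each part at least min_duration.
    """
    total = end - start
    if total < 2 * min_duration:
        return None  # Not enough time to split; keep the line as it is.

    first_len, second_len = len(parts[0]), len(parts[1])
    share = total * (first_len / (first_len + second_len))

    # Clamp so that neither half ends up shorter than min_duration.
    share = max(min_duration, min(share, total - min_duration))

    middle = start + share
    return [(start, middle), (middle, end)]


timings = split_duration(timedelta(seconds=0), timedelta(seconds=4), ["aaa", "a"])
print(timings)
```

Returning `None` for short spans leaves the caller free to keep the original line rather than create subtitles that flash by too quickly.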

machinewrapped commented 5 months ago

Done