machinewrapped / gpt-subtrans

Open Source project using LLMs to translate SRT subtitles

Auto-split batches for retranslation #127

Open machinewrapped opened 4 months ago

I've found that GPT is much more likely to desync with larger batches, and that splitting the batch in two and retranslating almost always fixes the problem. For both usability and cost, it would make sense to do this automatically when batch validation fails.

Splitting the batch in two at the longest gap between lines should be a reasonable approach, perhaps weighted by proximity to the midpoint, since splitting off only one or two lines probably won't help.
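
A rough sketch of what that split heuristic could look like, to make the idea concrete. It assumes "gap" means the time between one line's end and the next line's start. The `Line` class, `find_split_index`, `split_batch` and the `min_side` parameter are all hypothetical names for illustration, not the project's existing code, and the linear weighting is just one possible choice.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Line:
    """Hypothetical stand-in for a subtitle line (not the project's own class)."""
    start: timedelta
    end: timedelta
    text: str


def find_split_index(lines: list[Line], min_side: int = 2) -> int:
    """Return i so that lines[:i] and lines[i:] are the two new batches.

    Each candidate split is scored as (gap before line i) * (weight), where
    the weight is 1.0 at the midpoint and falls linearly towards the edges.
    min_side is an assumed lower bound on the size of either half.
    """
    if len(lines) < 2 * min_side:
        raise ValueError("batch is too small to split")

    midpoint = len(lines) / 2

    def score(i: int) -> tuple[float, float]:
        gap = max((lines[i].start - lines[i - 1].end).total_seconds(), 0.0)
        weight = 1.0 - abs(i - midpoint) / midpoint
        # The weight doubles as a tie-breaker, so equal gaps split near the middle.
        return (gap * weight, weight)

    return max(range(min_side, len(lines) - min_side + 1), key=score)


def split_batch(lines: list[Line], min_side: int = 2) -> tuple[list[Line], list[Line]]:
    """Split a batch in two at the index chosen by find_split_index."""
    i = find_split_index(lines, min_side)
    return lines[:i], lines[i:]
```

With this scoring, a long pause right after the first line loses to a shorter pause near the middle, which matches the intent of not splitting off just one or two lines. The two halves could then be retranslated separately when validation of the original batch fails.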