machinewrapped / gpt-subtrans

Open Source project using LLMs to translate SRT subtitles

Hit API token limit, retrying batch without context... #84

Closed. FivespeedDoc closed this issue 7 months ago.

FivespeedDoc commented 7 months ago

Error translating batch: `<PySubtitleGPT.ChatGPTTranslation.ChatGPTTranslation object at 0x000001B9DE6FD390>`

Two scenes, totalling around 1,800 lines. Batch size 100 lines/159 lines, and none of them work. Is it because this is too long?

machinewrapped commented 7 months ago

Hi, most likely yes. It depends which model you're using and how long the lines are. As a rule of thumb, gpt-3.5-turbo can handle batches of around 40 lines and gpt-3.5-turbo-16k can handle batches of 100-120 lines.

gpt-3.5-turbo-16k with a max batch size of 100 lines is my default setting these days, and I usually adjust the scene threshold to get around 10 scenes.
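To put rough numbers behind that rule of thumb, here is a back-of-the-envelope sketch (not part of gpt-subtrans; it uses the third-party `srt` and `tiktoken` packages, and the 1000-token instruction overhead and the 3x input/output multiplier are assumptions):

```python
import srt       # pip install srt
import tiktoken  # pip install tiktoken

def avg_tokens_per_line(path: str, model: str = "gpt-3.5-turbo") -> float:
    """Average token count of the subtitle lines in an SRT file."""
    enc = tiktoken.encoding_for_model(model)
    with open(path, encoding="utf-8") as f:
        subs = list(srt.parse(f.read()))
    return sum(len(enc.encode(s.content)) for s in subs) / len(subs)

# A batch must fit the instructions plus the original lines (input) and the
# originals echoed back alongside their translations (output), so roughly:
#   max_lines ~ (context_window - instruction_overhead) / (3 * avg_tokens)
avg = avg_tokens_per_line("movie.srt")  # "movie.srt" is a placeholder path
print(f"~{(4096 - 1000) / (3 * avg):.0f} lines per batch for gpt-3.5-turbo")
print(f"~{(16384 - 1000) / (3 * avg):.0f} lines per batch for the 16k model")
```

In practice the workable limit is lower than estimates like these suggest, since longer responses are more likely to drift out of sync and need a retry.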

FivespeedDoc commented 7 months ago

> Hi, most likely yes. It depends which model you're using and how long the lines are. As a rule of thumb, gpt-3.5-turbo can handle batches of around 40 lines and gpt-3.5-turbo-16k can handle batches of 100-120 lines.
>
> gpt-3.5-turbo-16k with a max batch size of 100 lines is my default setting these days, and I usually adjust the scene threshold to get around 10 scenes.

I'm using GPT-4 Turbo; I presume it has a 128k context window. Any recommended settings? My video is pretty darn long, nearly an hour, so I want efficiency.

machinewrapped commented 7 months ago

I haven't experimented with GPT-4 Turbo much yet. It has a 128k context window but a limit of 4096 output tokens, so the larger context window isn't particularly useful for translation: the output is generally larger than the input, because it includes the original lines (which helps GPT stay in sync). The 16k 3.5 model can handle larger batches (I believe the 16k is shared between input and output).
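As a sanity check on why the output cap is the binding constraint, a bit of illustrative arithmetic (the per-line token figure is an assumption, not a measurement from the project):

```python
# The binding constraint for gpt-4-turbo is the 4096-token output cap,
# because the response repeats each original line next to its translation.
OUTPUT_CAP = 4096              # gpt-4-turbo maximum output tokens
LINE_TOKENS = 25               # assumed average tokens per subtitle line
PAIR_TOKENS = 2 * LINE_TOKENS  # original + translation in the response

print(OUTPUT_CAP // PAIR_TOKENS)  # ~81 lines, close to the 70-80 limit below
```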

I think I concluded about 70-80 lines was the limit for GPT-4, but the difference between 4 and 3.5 for translation is usually small (and not always positive), whereas the cost is much higher, so I usually use 3.5T. I'd recommend starting with gpt-3.5-turbo-16k and then maybe retranslating batches with GPT-4 if it got confused, such as mixing up who is speaking - GPT-4 is better at understanding context.

There's a trade-off between batch size and number of batches when it comes to efficiency... larger batches are exponentially slower to translate, but since there's the overhead of the instructions sent with each batch, there is a balancing point where fewer, larger batches are more efficient. Around 50-100 lines per batch gives a pretty good balance. Above 100 lines there is more chance of errors in the response that require a retry, which is self-defeating.
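That balance point can be sketched numerically (the overhead, per-line, and retry figures below are illustrative assumptions, not measured values):

```python
import math

INSTRUCTION_TOKENS = 1000  # assumed prompt overhead sent with every batch
TOKENS_PER_LINE = 60       # assumed input+output tokens per subtitle line

def expected_tokens(total_lines: int, batch_size: int, retry_rate: float) -> int:
    """Total tokens for one pass, inflated by the chance of batch retries."""
    batches = math.ceil(total_lines / batch_size)
    per_pass = batches * INSTRUCTION_TOKENS + total_lines * TOKENS_PER_LINE
    return round(per_pass * (1 + retry_rate))

# Larger batches amortise the instruction overhead but retry more often:
for size, retry in [(25, 0.02), (50, 0.05), (100, 0.10), (200, 0.40)]:
    print(f"{size:>3} lines/batch -> ~{expected_tokens(1800, size, retry):,} tokens")
```

With these assumed figures the total cost bottoms out around 100 lines per batch and rises again at 200, which matches the 50-100 line recommendation above.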

For the other settings, leave temperature at 0, and adjust `max_characters` if you find you're regularly getting "line too long" warnings that trigger retries (depending on the source, this can be a problem).
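One way to anticipate those warnings is to pre-scan the source file for unusually long lines before translating. A minimal sketch, assuming the third-party `srt` package and a hypothetical threshold (translated lines may of course come out longer or shorter than the source):

```python
import srt  # pip install srt

MAX_CHARACTERS = 99  # hypothetical threshold; raise it if many lines exceed it

with open("movie.srt", encoding="utf-8") as f:  # placeholder path
    too_long = [s.index for s in srt.parse(f.read())
                if len(s.content) > MAX_CHARACTERS]

print(f"{len(too_long)} source lines exceed {MAX_CHARACTERS} characters")
print("e.g. line numbers:", too_long[:10])
```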

If you add a description for the project, keep it short (one or two sentences), otherwise GPT-3.5 especially tends to start making up its own plot. A very short description does encourage it to produce batch and scene summaries, though, which are useful for giving each batch context about what came before.

machinewrapped commented 7 months ago

I've updated the project wiki with more details on what the different settings do, and appropriate limits for each model: https://github.com/machinewrapped/gpt-subtrans/wiki