jackschedel / KoalaClient

The best LLM API Playground Interface (for me)
https://client.koaladev.io/

Token length bugs? #89

Closed: cheeseonamonkey closed this issue 6 months ago

cheeseonamonkey commented 8 months ago

Is anybody having bugs like:

This is annoying and I will take a look at it soon :yum:

ghost commented 8 months ago

That's because the max context setting can't be higher than the model's max tokens, and generally you want to keep it set at the maximum the model allows, unless you're concerned about API costs.

So, for example, with an 8192-token context (the max for GPT-4) and 1000 max tokens, you should send no more than 8192 - 1000 - (some formatting token count) tokens through the API, because the model can only generate a response as long as it still fits within its context window. Koala handles those calculations really well and stays true to normal API usage.
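To make the arithmetic concrete, here is a minimal TypeScript sketch of the budget described above. The names, the formatting-overhead value, and the token-counting callback are illustrative assumptions, not KoalaClient's actual code:

```ts
// Illustrative constants (assumptions, not KoalaClient's real settings)
const MODEL_MAX_TOKENS = 8192;       // e.g. GPT-4's context window
const MAX_GENERATION_TOKENS = 1000;  // the "max tokens" setting
const FORMATTING_OVERHEAD = 11;      // rough allowance for chat formatting tokens

// Tokens left over for the prompt (system message + chat history)
const promptBudget = MODEL_MAX_TOKENS - MAX_GENERATION_TOKENS - FORMATTING_OVERHEAD;

type Message = { role: string; content: string };

// Drop the oldest messages until the prompt fits within the budget.
// countTokens is assumed to be supplied by the caller (e.g. a tiktoken wrapper).
function trimToBudget(
  messages: Message[],
  countTokens: (ms: Message[]) => number
): Message[] {
  const trimmed = [...messages];
  while (trimmed.length > 1 && countTokens(trimmed) > promptBudget) {
    trimmed.shift();
  }
  return trimmed;
}
```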

jackschedel commented 8 months ago

@Yardanico's explanation is pretty good here. It might be worth reworking how the max tokens and max context settings interact so that (max_tokens + max_context) <= (model_max_tokens). The current implementation isn't necessarily incorrect, and if anything it gives the user more freedom, but the use case for allowing (max_tokens + max_context) > (model_max_tokens) is incredibly niche, so I think it's worth changing.
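A hypothetical sketch of that proposed limit, with illustrative names rather than KoalaClient's actual settings code, could clamp the two sliders so their sum never exceeds the model's context window:

```ts
// Assumed settings shape for illustration only
interface ChatSettings {
  maxTokens: number;   // max generation length
  maxContext: number;  // max prompt/context length
}

// Enforce (max_tokens + max_context) <= model_max_tokens
function clampSettings(settings: ChatSettings, modelMaxTokens: number): ChatSettings {
  const maxTokens = Math.min(settings.maxTokens, modelMaxTokens);
  const maxContext = Math.min(settings.maxContext, modelMaxTokens - maxTokens);
  return { maxTokens, maxContext };
}

// Example: with GPT-4's 8192-token window, maxContext would be capped at 7192
// when maxTokens is set to 1000.
const clamped = clampSettings({ maxTokens: 1000, maxContext: 8192 }, 8192);
```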