mckaywrigley / chatbot-ui


Context length maxing out at 4096 with Llama 3 models #1696

Open nickdavis opened 4 months ago

nickdavis commented 4 months ago

Hey, thanks for sharing such a great tool!

I might be missing something, but when I'm chatting with a Llama 3 model (either the original or a variant like Dolphin 2.9), the context length seems maxed out at 4096, and there doesn't seem to be an option to increase it to 8k or higher (some Dolphin variants can go up to 256k, for example).

Is this something that would need fixing in the UI in a future update, or should I be handling this myself in Ollama?

To be clear, the context length slider in the UI maxes out at 4096 when working with one of these models.

Thanks!
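On the "should I be handling this myself in Ollama?" part: the slider is only half of it. Ollama itself runs with a small default context window unless the request raises `num_ctx`, so a larger value also has to reach the server. A minimal sketch, assuming a local Ollama instance on its default port with the `llama3` tag pulled:

```ts
// Sketch: asking Ollama for a larger context window per request.
// Even if the UI slider allowed 8k+, the server must also be told via
// options.num_ctx, otherwise it falls back to its own small default.
async function chatWithLargeContext(): Promise<void> {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3",
      messages: [{ role: "user", content: "Hello" }],
      stream: false,
      options: { num_ctx: 8192 } // request an 8k context window
    })
  });
  const data = await res.json();
  console.log(data.message.content);
}
```

The same override can be baked into a custom Modelfile with `PARAMETER num_ctx 8192`, which makes every request against that model tag use the larger window.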

faraday commented 3 months ago

There is a slight problem with how the context length value is initialized in the Chat Settings UI for models that offer such long context support.

When you adjust by hand through this UI, it works fine:

[Screenshot: Chat Settings UI]
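The initialization bug is consistent with a per-model limits table that falls back to 4096 for models it doesn't recognize, such as locally served Ollama models. A hypothetical sketch of that shape (the names mirror chatbot-ui's `CHAT_SETTING_LIMITS` table, but the fields and values here are illustrative, not the repo's actual entries):

```ts
// Hypothetical reconstruction of a per-model limits table.
// If a local/Ollama model is missing from the table, a 4096 fallback
// would explain the slider capping out; registering the model's real
// limit would fix the initial value.
interface ChatSettingLimits {
  MIN_TEMPERATURE: number;
  MAX_TEMPERATURE: number;
  MAX_CONTEXT_LENGTH: number;
}

const CHAT_SETTING_LIMITS: Record<string, ChatSettingLimits> = {
  "llama3:8b": { MIN_TEMPERATURE: 0, MAX_TEMPERATURE: 2, MAX_CONTEXT_LENGTH: 8192 },
  "dolphin-llama3": { MIN_TEMPERATURE: 0, MAX_TEMPERATURE: 2, MAX_CONTEXT_LENGTH: 256000 }
};

const DEFAULT_LIMITS: ChatSettingLimits = {
  MIN_TEMPERATURE: 0,
  MAX_TEMPERATURE: 2,
  MAX_CONTEXT_LENGTH: 4096 // fallback that caps unknown models at 4k
};

function limitsFor(modelId: string): ChatSettingLimits {
  return CHAT_SETTING_LIMITS[modelId] ?? DEFAULT_LIMITS;
}
```

If `limitsFor` never sees an entry for a local Llama 3 tag, the slider's maximum stays at the 4096 fallback no matter what the model actually supports.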

There's a question that might arise about the UX of long-context models, @mckaywrigley. Most users generally want to use the model's maximum capability, including long context (even 1M tokens). Yet for many queries, setting a very large context length might not be good for cost.

I think the best way to handle this would be to react to the prompt size, automatically increasing the context length for the session at hand (see the sketch below).
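A minimal sketch of that idea, assuming a rough 4-characters-per-token estimate (a real tokenizer would be more accurate) and a hypothetical `autoContextLength` helper:

```ts
// Sketch: grow the session's context window to fit the prompt instead of
// always requesting the model's maximum.
function autoContextLength(promptChars: number, modelMax: number): number {
  // Rough estimate: ~4 characters per token.
  const estimatedTokens = Math.ceil(promptChars / 4);
  const steps = [4096, 8192, 16384, 32768, 65536, 131072];
  // Double the estimate to leave headroom for the reply, then pick the
  // smallest step that fits, never exceeding what the model supports.
  const needed = estimatedTokens * 2;
  const fit = steps.find(s => s >= needed) ?? modelMax;
  return Math.min(fit, modelMax);
}
```

The step ladder keeps costs predictable rather than jumping straight to the model's maximum for every session.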

Or it might be simpler to just alert the user upon selecting a long-context model like "Gemini Flash".
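A sketch of that alert alternative, with a hypothetical `onModelSelected` hook and an arbitrary 100k-token threshold:

```ts
// Sketch: warn the user when a long-context model is picked.
const LONG_CONTEXT_THRESHOLD = 100_000;

function onModelSelected(modelId: string, maxContext: number): void {
  if (maxContext >= LONG_CONTEXT_THRESHOLD) {
    window.alert(
      `${modelId} supports up to ${maxContext.toLocaleString()} tokens of context. ` +
      `Very large contexts can significantly increase per-request cost.`
    );
  }
}
```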

faraday commented 3 months ago

I think @mckaywrigley didn't want to blindly max out the context length setting, to avoid incurring unpredictable costs on the user's end.