gerazov closed this 7 months ago
Interesting approach - I'm not sure I like sending the entire conversation text with each prompt though. I fear you could hit the token limits before long.
Ollama's HTTP API does send a set of `context` parameters along with the final token of a response, which can be forwarded with the next request to keep a 'conversation history'. It would need a bit of work, and is likely out of scope for the current prompt->action framework, so my idea was an additional chat module for the plugin: maybe refactoring the API calls in a way that can be used by both prompts and chat, and using something like nui to help facilitate a more robust UI for chat.
I ramble, and I have more ideas, I just need to find the time. In any case, I don't think I want to merge this implementation of a chat interface. I do like and appreciate the work you've committed thus far - keeping a persistent history of conversations is a great idea!
Yeah, I guess there's a better way than sending the whole chat every time. I'm not sure how it's done, though - I don't think they keep the sessions up indefinitely to allow you to get back to old chats. So they must send the saved chat at least initially to prime the LLM, and then keep the instance alive while the user is active. I'll look into the context thing to do this with Ollama.
From what I can see, context is currently only kept for a very short time (I managed to get "another" by asking again quickly), and then the model is reinitialized if you delay asking another question (5 s?).
I don't think the number of tokens is a limitation - you can split the chat into chunks.
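The chunking idea above could be sketched like this - a minimal Python example, assuming a crude ~4-characters-per-token estimate rather than the model's real tokenizer, and a helper name (`trim_history`) that is purely illustrative:

```python
# Hypothetical sketch: keep only the most recent messages that fit
# under a token budget before sending the chat to the model.
# The 4-chars-per-token estimate is a rough heuristic, not Ollama's tokenizer.

def trim_history(messages, max_tokens=2048):
    """Return the newest messages whose estimated token count fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):      # walk newest-first
        est = max(1, len(msg) // 4)     # crude ~4 chars per token estimate
        if used + est > max_tokens:
            break
        kept.append(msg)
        used += est
    return list(reversed(kept))         # restore chronological order
```

With a small budget this drops the oldest turns first, which is the simplest possible chunking strategy; a smarter version would summarize the dropped turns instead of discarding them.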
Looking forward to a more decent implementation :rocket: Till then, I have something I can work with :wink:
The `context` params are the memory itself - as you said, the LLM itself doesn't save state. If you `curl` a request in the terminal yourself, you'll see it spit back a rather large array of numbers in the `context` field of the JSON response. Theoretically we could store that alongside the tempfiles (think something like `chat01_ctx.txt`), then send that same number array in the `context` field of the JSON body of our next POST request to the API.
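The flow described here could be sketched roughly like this in Python - the helper names, file name, and file layout are my assumptions, not part of ollama.nvim; only the `context` field of the `/api/generate` request/response comes from the Ollama API:

```python
# Hypothetical sketch: persist the `context` array that Ollama's
# /api/generate endpoint returns, and include it in the next request body.
import json

def save_context(context, path="chat01_ctx.txt"):
    """Store the context array (a list of ints) next to the chat tempfile."""
    with open(path, "w") as f:
        json.dump(context, f)

def load_context(path="chat01_ctx.txt"):
    """Load a previously saved context array, or None on a fresh chat."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None

def build_request(model, prompt, context=None):
    """Build the JSON body for the next POST to /api/generate."""
    body = {"model": model, "prompt": prompt, "stream": False}
    if context is not None:
        body["context"] = context   # carries the conversation state forward
    return json.dumps(body)
```

The resulting body would be POSTed to the running Ollama server (by default `http://localhost:11434/api/generate`), and the `context` array read back from the response JSON and saved again for the following turn.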
Using `context` prevents the LLM from getting confused by being sent the entire chat history as the prompt, which can lead it to talk to itself or cause other side effects, and it provides an experience similar to `ollama run <model>` in the terminal or web UIs like ChatGPT.
At least, that is my understanding of it.
I've moved this to a temporary repo ollama-chat.nvim so I can play with it further :wink:
Again, looking forward to the implementation of chat in ollama.nvim :rocket:
I've implemented some basic chat here (#7):
The whole chat gets passed to Ollama each time, so the buffer can be modified freely. Sometimes Ollama does start talking to itself :thinking: I've tried to prevent this with the initial setup. It also has some issues with newlines, but all in all it works okay :sweat_smile:
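The whole-buffer approach could be sketched like this - a minimal Python example; the system preamble and role-marker strings are my assumptions, not the ones the plugin actually uses:

```python
# Hypothetical sketch: join the chat buffer into a single prompt, with a
# system preamble and role markers to discourage the model from continuing
# the conversation on its own (i.e. "talking to itself").

SYSTEM = "You are a helpful assistant. Answer only as the assistant."

def chat_to_prompt(turns):
    """turns: list of (role, text) tuples in chronological order."""
    lines = [SYSTEM, ""]
    for role, text in turns:
        lines.append(f"{role}: {text.strip()}")
    lines.append("assistant:")  # cue the model to answer, not role-play the user
    return "\n".join(lines)
```

Ending the prompt on the `assistant:` marker is one common way to nudge the model into producing exactly one reply instead of inventing further `user:` turns.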
The user can start a chat session and chat using the `create_chat()` and `chat()` functions. I've mapped them like so: