nomnivore / ollama.nvim

A plugin for managing and integrating your ollama workflows in neovim.
MIT License

Chat is working, some printout issues #11

Closed: gerazov closed this issue 7 months ago

gerazov commented 7 months ago

I've implemented some basic chat here (#7):

[screenshot: ollama chat]

The whole chat gets passed to Ollama each time, so the buffer can be modified freely. Sometimes Ollama does start talking to itself :thinking: and I've tried to prevent this with the initial setup. It also has some issues with newlines, but all in all it works ok :sweat_smile:
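
Roughly, the idea is that the whole buffer becomes the next prompt, something like this (a sketch only, not the actual code; the function name is hypothetical):

```lua
-- Sketch only: re-send the entire chat buffer as the next prompt.
-- The function name is hypothetical, not part of ollama.nvim.
local function buffer_as_prompt(bufnr)
  -- every line of the chat buffer, user and model turns alike,
  -- is concatenated and sent as the prompt of the next request
  local lines = vim.api.nvim_buf_get_lines(bufnr, 0, -1, false)
  return table.concat(lines, "\n")
end
```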

The user can start a Chat session and chat using the create_chat() and chat() functions. I've mapped them like so:

```lua
  keys = {
    {
      "<leader>oc",
      ":<c-u>lua require('ollama').create_chat()<cr>",
      -- ":<c-u>Ollama<cr>",
      desc = "Create Ollama Chat",
      mode = { "n" },
      silent = true,
    },
    {
      "<leader>opc",
      ":<c-u>lua require('ollama').prompt('Chat')<cr>",
      -- ":<c-u>Ollama<cr>",
      desc = "Ollama Chat",
      mode = { "n" },
      silent = true,
    },
  },
```
nomnivore commented 7 months ago

Interesting approach - I'm not sure I like sending the entire conversation text with each prompt though. I fear you could hit the token limits before long.

Ollama's HTTP API does send a set of context parameters along with the final token of a response, which can be forwarded with the next request to keep a 'conversation history'. That would take a bit of work and is likely out of scope for the current prompt->action framework, so my idea is an additional chat module for the plugin: perhaps refactor the API calls so they can be used by both prompts and chat, and use something like nui to help build a more robust chat UI.
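
Something along these lines, as a sketch only (the `context` field is part of the Ollama /api/generate API; everything else here is hypothetical and assumes Neovim 0.10's vim.system plus a local server on the default port):

```lua
-- Sketch of the context round trip against Ollama's /api/generate.
-- Only the request/response field names come from the API; the rest
-- is hypothetical illustration, not plugin code.
local saved_context = nil

local function ask(model, prompt, on_reply)
  local body = {
    model = model,
    prompt = prompt,
    stream = false,
    -- forward the previous context instead of re-sending the whole chat
    context = saved_context,
  }
  vim.system(
    { "curl", "-s", "http://localhost:11434/api/generate", "-d", vim.json.encode(body) },
    { text = true },
    function(out)
      if out.code ~= 0 then
        return
      end
      local res = vim.json.decode(out.stdout)
      saved_context = res.context -- remember it for the next request
      vim.schedule(function()
        on_reply(res.response)
      end)
    end
  )
end
```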

I ramble, and I have more ideas; I just need to find the time. In any case, I don't think I want to merge this implementation of a chat interface. I do like and appreciate the work you've committed thus far, and keeping a persistent history of conversations is a great idea!

gerazov commented 7 months ago

Yeah, I guess there's a better way than sending the whole chat every time. I'm not sure how it's done though; I don't think they keep sessions up indefinitely to let you come back to old chats. So they must send the saved chat at least initially to prime the LLM, and then keep the instance alive while the user is active. I'll look into the context thing to do this with Ollama.

From what I can see, context is currently only kept for a very short time (I managed to get "another" to work by asking again quickly), and the model is reinitialized if you delay asking another question (5s?).

I don't think the number of tokens is a limitation; you could split the chat into chunks.

Looking forward to a more polished implementation :rocket: Till then, I have something I can work with :wink:

nomnivore commented 7 months ago

The context params are the memory itself; as you said, the LLM itself doesn't save state. If you curl a request in the terminal yourself, you'll see it spit back a rather large array of numbers in the context field of the JSON response. Theoretically we could store that alongside the tempfiles (something like chat01_ctx.txt), then send that same number array in the context field of the JSON body of our next POST request to the API. Using context keeps the LLM from getting confused by being sent the entire chat history as a prompt, which can lead it to talk to itself or have other side effects, and it provides an experience similar to ollama run <model> in the terminal or web UIs like ChatGPT. At least, that is my understanding of it.
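
A rough sketch of the storage side (file naming and function names are hypothetical, not the plugin's actual behavior):

```lua
-- Sketch: persist the context array next to a chat tempfile.
-- The _ctx.txt naming and these helpers are hypothetical.
local function ctx_path(chat_path)
  -- e.g. /tmp/chat01.txt -> /tmp/chat01_ctx.txt
  return chat_path:gsub("%.txt$", "_ctx.txt")
end

local function save_context(chat_path, context)
  -- store the number array returned with the final response
  vim.fn.writefile({ vim.json.encode(context) }, ctx_path(chat_path))
end

local function load_context(chat_path)
  local path = ctx_path(chat_path)
  if vim.fn.filereadable(path) == 0 then
    return nil -- first message of the conversation: no context yet
  end
  local lines = vim.fn.readfile(path)
  return vim.json.decode(table.concat(lines, ""))
end

-- when building the next request body:
-- body.context = load_context(chat_path)
```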

gerazov commented 6 months ago

I've moved this to a temporary repo ollama-chat.nvim so I can play with it further :wink:

Again, looking forward to the implementation of chat in ollama.nvim :rocket: