helmling closed this issue 9 months ago
Huh - I didn't think anyone was using that. Sorry about that! I don't want to have multiple ways of running the same provider. The server handles GPU offloading better, while the binary incurs a startup cost on every request, so in my view the server is a better UX.
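For context, the two modes being compared look roughly like this (model path, flags, and port are illustrative; binary names are from llama.cpp of around that era and have since been renamed):

```shell
# One-shot binary: loads the model on every invocation, so each request
# pays the full model-load startup cost
./main -m ./models/model.gguf -ngl 35 -n 128 -p "Write a haiku"

# Server: loads the model once, then serves completions over HTTP
./server -m ./models/model.gguf -ngl 35 --port 8080 &

# Each subsequent request reuses the already-loaded model
curl http://localhost:8080/completion \
  -d '{"prompt": "Write a haiku", "n_predict": 128}'
```

The tradeoff is per-request latency (binary) versus a one-time manual setup step (server), which is exactly what the rest of this thread is about.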
The binary is better for experimenting with different models / startup options - I'm open to adding this back if some more folks give this issue a thumbs-up. If there aren't enough people using it, I'd rather not take on the maintenance overhead of two standard llamacpp providers.
You're right though - starting the server is a bit annoying. I think we can add a helper that starts the server on first use and then shuts it down when you exit nvim.
If anyone wants to vote, a thumbs-up on this issue counts.
Ok, don't worry, I can create my own provider using the old code. I found the change just a bit harsh :). In general I agree with what you're saying, though I did not see any startup or GPU offloading overhead (I was running on an M2). In any case, thanks a lot for this nifty plugin, and feel free to close this issue.
Sorry again about that, I generally try to deprecate first rather than change outright. It seemed like most folks were just interested in the server example rather than running through main. I'm curious about how you're using the llama.cpp binary - are you using different models or startup options across prompts, or do you just want to skip the startup step?
Closing in favor of #22 - if anyone wants the ./main provider back, feel free to open a new issue.
The llamacpp provider was changed to use a local llama.cpp server instead of the llama.cpp binary (main). This breaks all prompts using the llamacpp provider. Is there any special reason for replacing the current llamacpp provider with the new one? From my point of view this change results in a much less smooth user experience, as I now always have to start the server manually. Ideally we would have two providers: