helmling closed this issue 9 months ago
Huh - I didn't think anyone was using that. Sorry about that! I don't want to have multiple ways of running the same provider. The server handles GPU offloading better, while the binary incurs a startup cost on every request, so in my view the server is a better UX.
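For context, the two modes being compared look roughly like this (model path, flags, and port are illustrative; binary names are from llama.cpp of around that era and have since been renamed):

```shell
# One-shot binary: loads the model on every invocation, so each request
# pays the full model-load startup cost
./main -m ./models/model.gguf -ngl 35 -n 128 -p "Write a haiku"

# Server: loads the model once, then serves completions over HTTP
./server -m ./models/model.gguf -ngl 35 --port 8080 &

# Each subsequent request reuses the already-loaded model
curl http://localhost:8080/completion \
  -d '{"prompt": "Write a haiku", "n_predict": 128}'
```

The tradeoff is per-request latency (binary) versus a one-time manual setup step (server), which is exactly what the rest of this thread is about.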
The binary is better for experimenting with different models / startup options - I'm open to adding this back if some more folks give this issue a thumbs-up. If there aren't enough people using it, I'd rather not take on the maintenance overhead of two standard llamacpp providers.
You're right though - starting the server is a bit annoying. I think we can add a helper that starts the server on first use and then shuts it down when you exit nvim.
If anyone wants to vote, a thumbs-up on this issue counts.
Ok, don't worry, I can create my own provider using the old code. I found the change just a bit harsh :). In general I agree with what you're saying, though I did not see any startup or GPU offloading overhead (I was running on an M2). In any case, thanks a lot for this nifty plugin, and feel free to close this issue.
Sorry again about that, I generally try to deprecate first rather than change outright. It seemed like most folks were just interested in the server example rather than running through main. I'm curious about how you're using the llama.cpp binary - are you using different models or startup options across prompts, or do you just want to skip the startup step?
Closing in favor of #22 - if anyone wants the ./main provider back, feel free to open a new issue.
The llamacpp provider was changed to use a local llama.cpp server instead of the llama.cpp binary (main). This breaks all prompts using the llamacpp provider. Is there any special reason for replacing the current llamacpp provider with the new one? From my point of view this change results in a much less smooth user experience, as I now always have to start the server manually. Ideally we would have two providers: