Closed: a-rbts closed this issue 2 months ago
Hello, thanks for the interest, please allow me to answer your questions from a personal perspective.
I usually run two different models, a code (or base) model and an instruct model, because they give better results when each model is used for the task it was trained on. You might be able to run only codellama:7b, but depending on the API it may perform badly on one task or the other (chat, FIM) for a number of reasons, sometimes because the prompt templates differ or are automatically formatted by some providers.
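To illustrate why one model rarely serves both tasks well, here is a minimal sketch of the two prompt shapes. The templates follow the common CodeLlama conventions; exact token spellings vary by tokenizer and provider, so treat them as illustrative, not authoritative.

```python
# Chat and FIM use structurally different prompt templates, so a model
# (or provider) tuned for one often mis-handles the other.

def chat_prompt(user_message: str) -> str:
    # CodeLlama-instruct style chat wrapping
    return f"[INST] {user_message} [/INST]"

def fim_prompt(prefix: str, suffix: str) -> str:
    # CodeLlama infill-style fill-in-the-middle wrapping
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

print(chat_prompt("Explain this function"))
print(fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))"))
```

A provider that silently applies the chat template to a FIM request (or vice versa) would produce exactly the kind of malformed queries described later in this thread.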
The Ollama settings in the settings menu only point towards Ollama so that the extension can fetch the list of models from its API; that is really their only purpose. Providers should be added under the menu here.
This UI allows you to set up different providers, and you can switch between them in the model selection of the chat interface.
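As a rough sketch of what such a provider registry boils down to, the snippet below maps provider names to endpoints. The Ollama path/port and the llama.cpp default port reflect those projects' documented defaults, but the overall schema and the oobabooga port are assumptions for illustration, not twinny's actual configuration.

```python
# Hypothetical provider registry: each entry resolves to a base URL the
# extension would send chat requests to. Values are illustrative only.

PROVIDERS = {
    "ollama":    {"host": "localhost", "port": 11434, "chat_path": "/api/chat"},
    "llamacpp":  {"host": "localhost", "port": 8080,  "chat_path": "/v1/chat/completions"},
    "oobabooga": {"host": "localhost", "port": 5000,  "chat_path": "/v1/chat/completions"},
}

def chat_url(provider: str) -> str:
    # Build the full chat endpoint URL for the selected provider
    p = PROVIDERS[provider]
    return f"http://{p['host']}:{p['port']}{p['chat_path']}"

print(chat_url("llamacpp"))
```

Switching providers in the UI then amounts to picking a different key in this table.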
Hope that helps!
Great, thanks for the explanation! I totally missed the providers menu, but this is what I was looking for.

However, it doesn't seem to work well with llama.cpp. Chat doesn't work when the provider field is set to "llamacpp", but it works perfectly when selecting "oobabooga" (while still using the llama.cpp server as the API backend). It seems to be related to the message format; I'm not sure why anything should differ between the two providers in the configuration, so this looks incorrect.

I also could not get FIM to work with either llamacpp or oobabooga. llama.cpp receives the requests but returns nothing, as the queries seem to be malformed (using deepseek-coder base). With oobabooga, the server doesn't seem to receive requests at all, even with the right provider and port selected. I will be able to investigate this now that I can get something working, so I'm closing the issue.
Adding more information here:
FIM is broken for some requests with deepseek-coder, but this seems to be due to this bug in llama.cpp rather than a problem with twinny. I suspect most of the supported backends, such as ollama, would fail too, since they rely on llama.cpp. A mitigation is to re-quantize the model with a recent version of llama.cpp.
Chat with the llama.cpp backend does not work when selecting llama.cpp as the provider, but it works when selecting (for example) oobabooga. Since both rely on the OpenAI-compatible API, I am not sure why the providers should behave differently (apart from maybe setting up default API paths), but they do, and the behavior seems incorrect for the llama.cpp backend.
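The "default API paths" hypothesis can be sketched concretely. The llama.cpp server really does expose both a native `/completion` endpoint and an OpenAI-compatible `/v1/chat/completions` endpoint; which default path each provider entry uses is my assumption here, not verified twinny behavior.

```python
# If the "llamacpp" provider defaults to the native endpoint while
# "oobabooga" defaults to the OpenAI-compatible one, chat would break
# on the former even though both talk to the same llama.cpp server.
# The provider defaults below are assumptions for illustration.

DEFAULT_CHAT_PATH = {
    "llamacpp": "/completion",            # native llama.cpp endpoint (assumed default)
    "oobabooga": "/v1/chat/completions",  # OpenAI-compatible endpoint
}

def uses_openai_chat_format(provider: str) -> bool:
    # OpenAI-style chat requests only make sense against /v1/ endpoints
    return DEFAULT_CHAT_PATH[provider].startswith("/v1/")

print(uses_openai_chat_format("llamacpp"))   # False: would explain broken chat
print(uses_openai_chat_format("oobabooga"))  # True
```

If this is what is happening, pointing the llamacpp provider at `/v1/chat/completions` should make both providers behave identically.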
Hope it helps.
Greetings, and thanks for your hard work! I am trying to set up the extension as instructed in the README.md, but the UI does not seem to match what's described there. I am running the llama.cpp server, which offers an OpenAI-compliant API.
When I open the side panel and choose the configuration, there is no Api Provider field; instead, there are just fields for the Ollama Hostname and Ollama API Port, but I am not using ollama (screenshot below). How/where can we select llama.cpp as per the instructions?