dillfrescott opened 7 months ago
There are proxy servers which will turn any API into an OpenAI-compatible one, and I get an error that nomic-embed-text is not found (https://github.com/nilsherzig/LLocalSearch/issues/64) when subbing in such an OpenAI-compatible API.
That was exactly my question: what is the suggested way of using a custom model / agent?
The current WebUI gets its model list (the model switcher in the top left) from the /models endpoint of the API. If you want to run a different model than the default one, you just have to load the model onto ollama @d0rc.
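For illustration, a minimal sketch of querying that endpoint from Go; the /models path is taken from the comment above, but the port and the response shape are assumptions on my part:

```go
// Hypothetical sketch: fetch the model list the WebUI uses.
// The backend address and the JSON shape are assumptions, not
// the project's confirmed API.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

type modelList struct {
	Models []string `json:"models"`
}

func main() {
	resp, err := http.Get("http://localhost:8080/models") // assumed backend address
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var list modelList
	if err := json.NewDecoder(resp.Body).Decode(&list); err != nil {
		log.Fatal(err)
	}
	fmt.Println(list.Models)
}
```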
@BarfingLemurs the embeddings model (used to create embeddings from the website texts) is currently hard-coded (I'm open to making this a configuration option). Could you try loading it onto one of the tools you've mentioned and report back whether that just works?
You can now configure the embeddings model name using env vars.
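As a rough sketch of how that kind of override usually works (the variable name EMBEDDINGS_MODEL_NAME here is an assumption; check the repo's README for the real one):

```go
// Sketch of env-var based model selection. EMBEDDINGS_MODEL_NAME is a
// hypothetical name for illustration, not the project's confirmed var.
package main

import (
	"fmt"
	"os"
)

func embeddingsModel() string {
	if name := os.Getenv("EMBEDDINGS_MODEL_NAME"); name != "" {
		return name
	}
	return "nomic-embed-text" // the previously hard-coded default
}

func main() {
	fmt.Println("using embeddings model:", embeddingsModel())
}
```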
I don't know if other backends come with any such additions, honestly. Does ollama come with the vector database engine this project is looking for? Could I install it separately? I have been using the exllama backend, which has fast prompt processing speeds.
Log: https://pastebin.com/uJgJRXz7 (TabbyAPI https://github.com/theroyallab/tabbyAPI)
(Exllamav2 https://github.com/turboderp/exllamav2)
127.0.0.1, from the view of the backend, is the backend's loopback address, not your host's localhost. Your logs don't indicate that exllama wouldn't work, but that the backend has no connection to the API.
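For anyone debugging this, a small sketch you could run inside the backend container to see which address actually reaches your API (port 5000 is an assumption; host.docker.internal works on Docker Desktop, on Linux you may need your host's LAN IP instead):

```go
// Connectivity check: 127.0.0.1 inside a container is the container
// itself, not the Docker host. Addresses and port are assumptions.
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	for _, addr := range []string{
		"127.0.0.1:5000",            // the container itself, NOT your host
		"host.docker.internal:5000", // the Docker host (Docker Desktop)
	} {
		conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
		if err != nil {
			fmt.Println(addr, "unreachable:", err)
			continue
		}
		conn.Close()
		fmt.Println(addr, "reachable")
	}
}
```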
Ollama does not provide the vector DB. It's a wrapper around llama.cpp with something like a package manager for preconfigured LLMs.
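In other words, ollama only returns raw embeddings; storing and searching them is the app's job. A rough sketch of that split, assuming ollama's /api/embeddings endpoint on its default port (verify the endpoint and payload against the ollama docs for your version):

```go
// Sketch: ollama hands back an embedding vector; the "vector DB" part
// (storage and similarity search) has to live in the application.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"math"
	"net/http"
)

func embed(text string) []float64 {
	body, _ := json.Marshal(map[string]string{
		"model":  "nomic-embed-text",
		"prompt": text,
	})
	resp, err := http.Post("http://localhost:11434/api/embeddings",
		"application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var out struct {
		Embedding []float64 `json:"embedding"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatal(err)
	}
	return out.Embedding
}

// cosine similarity: the search step the app must implement itself
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	q := embed("search query")
	doc := embed("some website text")
	fmt.Printf("similarity: %.3f\n", cosine(q, doc))
}
```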
Looks like exllamav2 is Nvidia only? I only have an AMD card :/.
It seems ollama supports nomic-embed-text in some way, so I should deploy that, but I can use the main model from text-gen-webui with exllama.
text-gen-webui has many backends, including llama.cpp, so I think your card and CPU will be fine for testing. I made a mistake earlier using TabbyAPI, which requires you to enter API keys; text-gen-webui does not. However, I still haven't figured out how to run LLocalSearch yet.
This would make it possible to use apps other than ollama, as there are tons of backend apps and servers that are OpenAI-API compatible.