weaviate / Verba

Retrieval Augmented Generation (RAG) chatbot powered by Weaviate

Connecting to vLLM OpenAI API Compatible Server #123

Closed: JoshuaFurman closed this issue 2 months ago

JoshuaFurman commented 8 months ago

In my lab environment I am serving Mixtral with vLLM using their OpenAI-compatible API server, and I'm hosting a Weaviate instance as well.

I just spun up Verba, pointing it at both my Weaviate instance and my vLLM instance via the .env file. The connection to Weaviate seems fine: I can see my schema and object count in the status tab, but any queries I make seem to break. I'm unsure if this is a limitation of Verba only supporting GPT-3.5 or GPT-4 served from OpenAI.
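For anyone debugging the same setup: a quick way to isolate the problem is to hit the vLLM endpoint directly with the OpenAI client, bypassing Verba entirely. If this works but Verba's queries still break, the issue is on the Verba side. A minimal sketch, assuming a default vLLM setup; the base URL, port, and model name are assumptions, not values from this thread:

```python
# Sanity-check a vLLM OpenAI-compatible server, independent of Verba.
# Assumes the server was started with something like:
#   python -m vllm.entrypoints.openai.api_server --model mistralai/Mixtral-8x7B-Instruct-v0.1
# and is reachable at http://localhost:8000/v1. vLLM ignores the API key,
# but the client library requires one to be set.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # must match the model the server loaded
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```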

Has anyone been able to configure a setup like this?

Thanks!

JoshuaFurman commented 8 months ago

It also looks as though you're boxed into using either the ADA embeddings from OpenAI, MiniLM, or Cohere...
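For reference, a server that exposes an OpenAI-compatible embeddings endpoint (like the Sentence Transformers setup described later in this thread) can be exercised the same way as the chat endpoint above. A minimal sketch; the URL and model name are placeholders, not values from this thread:

```python
# Sketch of calling an OpenAI-compatible /v1/embeddings endpoint.
# Assumes an embedding server is listening at http://localhost:8080/v1;
# the model name is a placeholder for whatever model that server loaded.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="EMPTY")

result = client.embeddings.create(
    model="all-MiniLM-L6-v2",  # placeholder model name
    input=["Verba is a RAG chatbot powered by Weaviate."],
)
print(len(result.data[0].embedding))  # dimensionality of the returned vector
```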

thomashacker commented 7 months ago

Yes, right now there is no support for Mixtral models! But great point, we'll look into that for the next update.

samos123 commented 6 months ago

I got this working end to end, but I had to make some changes to be able to use my custom embedding model server. I submitted a PR with the changes needed to use an OpenAI-compatible API server for both embeddings and the LLM: https://github.com/weaviate/Verba/pull/148
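The gist of the change is that the LLM and the embedder can each point at their own OpenAI-compatible base URL. A rough sketch of the idea; the environment variable names below are hypothetical illustrations, not the ones the PR actually uses, so check the PR and the Verba README for the exact names:

```python
# Sketch: wire separate OpenAI-compatible clients for the LLM and the
# embedding server from environment variables. Variable names and default
# URLs here are hypothetical.
import os

from openai import OpenAI

llm_client = OpenAI(
    base_url=os.environ.get("LLM_BASE_URL", "http://localhost:8000/v1"),
    api_key=os.environ.get("OPENAI_API_KEY", "EMPTY"),
)

embedding_client = OpenAI(
    base_url=os.environ.get("EMBEDDING_BASE_URL", "http://localhost:8080/v1"),
    api_key=os.environ.get("OPENAI_API_KEY", "EMPTY"),
)
```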

I plan to publish an end-to-end tutorial that runs on K8s and installs Verba, Weaviate, an LLM, and an embedding model server all within the same K8s cluster. Stay tuned!

samos123 commented 6 months ago

I finished writing my guide for an end-to-end private Verba RAG setup using Weaviate, Lingo, vLLM + Mistral 7B v2, and Sentence Transformers: https://www.substratus.ai/blog/lingo-weaviate-private-rag

Looking forward to hearing feedback. The guide should also help you figure out how to use vanilla vLLM with Verba.