weaviate / Verba

Retrieval Augmented Generation (RAG) chatbot powered by Weaviate
BSD 3-Clause "New" or "Revised" License

Add support for LiteLLM #56

Closed priamai closed 9 months ago

priamai commented 9 months ago

Hi there, I am getting familiar with the source code, and I would like the ability to point the embedding and generation settings at an OpenAI proxy server: https://docs.litellm.ai/docs/simple_proxy

We would just need two environment settings: the first is the API key, which you already have, and the second is the URL that the OpenAI class should point to.

openai.api_key = "anything"             # this can be anything, we set the key on the proxy
openai.api_base = "http://0.0.0.0:8000" # set api base to the proxy from step 1

These are exactly the same settings you would use with the library directly; normally, of course, the api_base points to Azure:

import openai

# optional; defaults to `os.environ['OPENAI_API_KEY']`
openai.api_key = '...'

# all client options can be configured just like the `OpenAI` instantiation counterpart
openai.base_url = "https://..."
openai.default_headers = {"x-foo": "true"}

Let me know. Cheers!

thomashacker commented 9 months ago

Great idea! We'll add this

ishaan-jaff commented 9 months ago

I'm the maintainer of LiteLLM; let me know if you run into any issues.

thomashacker commented 9 months ago

@ishaan-jaff Thanks a lot! Just to clarify, we simply need two new environment variables to be able to use LiteLLM, correct? 😄

priamai commented 9 months ago

We would also need another UI dropdown, because with LiteLLM you can dynamically choose which backend to call: the interface is OpenAI-compatible, but you can load, for example, a Llama 2 model. I think having that in the UI would be better than an environment variable, which would be annoying.

priamai commented 9 months ago

Example below:

import openai # openai v1.0.0+
client = openai.OpenAI(api_key="anything",base_url="http://0.0.0.0:8000") # set proxy to base_url
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)

But the model can be anything defined by the user in their model list:

import os

model_list = [{ # list of model deployments 
    "model_name": "gpt-3.5-turbo", # model alias 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "azure/chatgpt-v-2", # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo", 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "azure/chatgpt-functioncalling", 
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo", 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "gpt-3.5-turbo", 
        "api_key": os.getenv("OPENAI_API_KEY"),
    }
}]

thomashacker commented 9 months ago

Great, thanks for the info! We'll look into it

priamai commented 9 months ago

> Great, thanks for the info! We'll look into it

I suggest the following approach:
a) the user enables the LiteLLM proxy via an environment variable
b) your backend calls the LiteLLM API (GET /models) to list the available models
c) the front-end UI has a selection box to choose from the available models

API:

Server Endpoints:
POST /chat/completions - chat completions endpoint to call 100+ LLMs
POST /completions - completions endpoint
POST /embeddings - embedding endpoint for Azure, OpenAI, Hugging Face endpoints
GET /models - available models on server
POST /key/generate - generate a key to access the proxy
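
For illustration, here is a minimal sketch of how a backend could query the proxy's GET /models endpoint to populate such a dropdown; the LITELLM_BASE_URL variable name and the list_proxy_models helper are hypothetical, not part of Verba, and the response is assumed to follow the OpenAI-style /models schema:

import os
import requests

# hypothetical: base URL of a running LiteLLM proxy (assumed variable name)
LITELLM_BASE_URL = os.getenv("LITELLM_BASE_URL", "http://0.0.0.0:8000")

def list_proxy_models():
    # GET /models is assumed to return {"data": [{"id": "..."}, ...]}
    response = requests.get(f"{LITELLM_BASE_URL}/models", timeout=10)
    response.raise_for_status()
    return [entry["id"] for entry in response.json().get("data", [])]

print(list_proxy_models())  # e.g. ["gpt-3.5-turbo"]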

ishaan-jaff commented 9 months ago

> @ishaan-jaff Thanks a lot! Just to clarify, we simply need two new environment variables to be able to use LiteLLM, correct?

One variable*: you just need to set the api_base to the LiteLLM proxy. Doc: https://docs.litellm.ai/docs/proxy/quick_start#using-with-openai-compatible-projects

import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:8000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)

thomashacker commented 9 months ago

Thanks everyone! We added an environment variable for the proxy BASE_URL; if it is set, the OpenAI Generator will use the proxy. We want to make changes to the UI and improve how users interact with environment variables in the future, but unfortunately not right now. 🚀
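
For readers following along, a rough sketch of what such a base-URL override could look like on the client side; the OPENAI_BASE_URL variable name and the surrounding code are assumptions for illustration, not Verba's actual implementation:

import os
import openai

# hypothetical: route OpenAI calls through a proxy when a base URL is set
base_url = os.getenv("OPENAI_BASE_URL")  # assumed variable name
client = openai.OpenAI(
    api_key=os.getenv("OPENAI_API_KEY", "anything"),
    base_url=base_url,  # None falls back to the default OpenAI endpoint
)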

tan-yong-sheng commented 1 month ago

> Thanks everyone! We added an environment variable for the proxy BASE_URL; if it is set, the OpenAI Generator will use the proxy. We want to make changes to the UI and improve how users interact with environment variables in the future, but unfortunately not right now. 🚀

Hi @thomashacker, thanks for this wonderful feature. I wanted to check with you whether embedding models served through LiteLLM are also supported. Thanks a lot!
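
For context, calling an embedding model through a LiteLLM proxy with the OpenAI client would look roughly like the sketch below; the model alias is whatever is configured on the proxy, and this is only an illustration, not a statement about what Verba currently supports:

import openai

# hypothetical: embeddings routed through a LiteLLM proxy
client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:8000")

response = client.embeddings.create(
    model="text-embedding-ada-002",  # alias configured on the proxy
    input=["Verba is a RAG chatbot powered by Weaviate"],
)
print(len(response.data[0].embedding))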