lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

OpenAI Proxy #2693

Open PSU3D0 opened 9 months ago

PSU3D0 commented 9 months ago

Hello all! Was curious if anyone has used FastChat as a proxy around OpenAI/Azure endpoints.

Use case:

We have multiple nodes all making calls to either Azure or OpenAI chat endpoints. We have two main problems.

If anyone has done this before with FastChat, or used something similar, I'd love to hear about it. Cheers!

chymian commented 9 months ago

I think LiteLLM, which is an API middleware/proxy, has rate limiting implemented.
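For reference, a minimal sketch of rate-limit-aware routing with LiteLLM's Router, assuming the per-deployment "tpm"/"rpm" fields and the "usage-based-routing" strategy described in LiteLLM's docs (exact keys and strategy names may differ across versions):

import os

from litellm import Router

# Two deployments share one alias; with usage-based routing the
# router tracks tokens and requests per minute and skips a
# deployment that has hit its tpm/rpm budget.
model_list = [{
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "azure/chatgpt-v-2",
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    },
    "tpm": 240000,  # token budget per minute for this deployment
    "rpm": 1800     # request budget per minute for this deployment
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "gpt-3.5-turbo",
        "api_key": os.getenv("OPENAI_API_KEY")
    },
    "tpm": 1000000,
    "rpm": 10000
}]

router = Router(model_list=model_list, routing_strategy="usage-based-routing")

response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "ping"}])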

krrishdholakia commented 9 months ago

Hey @PSU3D0, I'm the maintainer of LiteLLM. We let you create a Router to maximize throughput via load balancing + queuing (beta).

I'd love to get your feedback on whether this solves your issue.

Here's the quick start:

import asyncio
import os

from litellm import Router

# Three deployments share the alias "gpt-3.5-turbo"; the router
# load-balances calls across them.
model_list = [{  # list of model deployments
    "model_name": "gpt-3.5-turbo",  # model alias
    "litellm_params": {  # params for litellm completion/embedding call
        "model": "azure/chatgpt-v-2",  # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "azure/chatgpt-functioncalling",
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "gpt-3.5-turbo",  # plain OpenAI deployment
        "api_key": os.getenv("OPENAI_API_KEY")
    }
}]

router = Router(model_list=model_list)

async def main():
    # openai.ChatCompletion.create replacement
    response = await router.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hey, how's it going?"}])
    print(response)

asyncio.run(main())
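Since all three deployments register under the same "gpt-3.5-turbo" alias, the Router spreads calls across the two Azure deployments and the OpenAI one, which covers the multi-node throughput side of the original question.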