stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).
https://crfm.stanford.edu/helm
Apache License 2.0

Support request batching #1341

Open yifanmai opened 1 year ago

yifanmai commented 1 year ago

OpenAI's API supports request batching. N users (as of 2023-03-07, N = 3) have expressed interest in batching on the CRFM proxy API to OpenAI.

Edit: users also want to batch requests to other models such as Megatron (as of 2023-05-03, N = 2)
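For context, a minimal sketch of what batching looks like against the OpenAI API. This uses the legacy Completions endpoint, which accepts a list of prompts in a single request; the model name and prompts are illustrative:

import openai  # pre-1.0 SDK, current when this issue was opened; reads OPENAI_API_KEY from the env

prompts = [
    "Translate 'hello' to French.",
    "Translate 'goodbye' to French.",
]

# One HTTP request carries the whole batch and returns one choice per prompt.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompts,
    max_tokens=32,
)

# Choices may come back out of order; match them to prompts by index.
completions = [None] * len(prompts)
for choice in response.choices:
    completions[choice.index] = choice.text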

Feedback:

That would be super useful. I've been using the OpenAI API previously, and its batching allows for much higher throughput.

RylanSchaeffer commented 1 year ago

N +=1 please!

yifanmai commented 1 year ago

This could help with Hugging Face local model inference performance by batching requests to the node.
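For example, a rough sketch of what batched local inference could look like with Hugging Face transformers (the model name and generation settings are illustrative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# gpt2 has no pad token; reuse EOS and left-pad so generation starts
# from the real end of each prompt.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

prompts = ["The capital of France is", "The capital of Japan is"]

# Tokenize the whole batch at once instead of one request at a time.
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=16,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))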

ishaan-jaff commented 10 months ago

@yifanmai @RylanSchaeffer

I'm the maintainer of LiteLLM. We provide an open-source proxy for load balancing Azure + OpenAI + any LiteLLM-supported LLM, and it can process 500+ requests/second.

From this thread it looks like you're trying to load balance between OpenAI instances - I hope our solution makes it easier for you. (I'd love feedback if you're trying to do this.)

Here's the quick start:

Doc: https://docs.litellm.ai/docs/simple_proxy#load-balancing---multiple-instances-of-1-model

Step 1: Create a config.yaml

model_list:
  # All three deployments share the model_name "gpt-4";
  # the proxy load-balances requests for "gpt-4" across them.
  - model_name: gpt-4
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
      api_version: "2023-05-15"
      api_key:  # your Azure API key for this deployment
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key:  # your Azure API key for this deployment
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key:  # your Azure API key for this deployment
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
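The key design point is that every entry shares the same model_name: that is what tells the proxy to treat the underlying deployments as interchangeable instances of one model and spread "gpt-4" traffic across them.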

Step 2: Start the litellm proxy:

litellm --config /path/to/config.yaml

Step 3: Make a request to the LiteLLM proxy:

curl --location 'http://0.0.0.0:8000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
      "model": "gpt-4",
      "messages": [
        {
          "role": "user",
          "content": "what llm are you"
        }
      ]
    }'
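Since the proxy exposes an OpenAI-compatible endpoint, the same request can also be made from Python by pointing the OpenAI SDK at it. A sketch using the pre-1.0 SDK; the API key value is a placeholder, since the proxy holds the real credentials:

import openai

openai.api_base = "http://0.0.0.0:8000"
openai.api_key = "anything"  # placeholder; the proxy injects the real keys

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)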
yifanmai commented 10 months ago

Thanks @ishaan-jaff for the tips and doc references - I think we do want to integrate with LiteLLM at some point, so this is very useful.

ishaan-jaff commented 10 months ago

What's missing in LiteLLM to integrate it today @yifanmai?

yifanmai commented 10 months ago

@ishaan-jaff If you have spare cycles, it would be good to get HELM working end-to-end with LiteLLM, and document (or open a pull request for) the changes that need to be made. Also, there needs to be user documentation for this - your earlier comment is a good starting point.
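To make the scope concrete, a hypothetical sketch of what a HELM-side adapter might look like using LiteLLM's Python API. litellm.completion is LiteLLM's real entry point; the LiteLLMClient class and make_request signature below are illustrative placeholders, not HELM's actual client interface:

import litellm

class LiteLLMClient:
    """Hypothetical adapter that routes model requests through LiteLLM."""

    def make_request(self, model: str, prompt: str, max_tokens: int = 100) -> str:
        # litellm.completion speaks the OpenAI chat format for any supported backend.
        response = litellm.completion(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        return response.choices[0].message.content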

I expect that the following may be issues: