stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).
https://crfm.stanford.edu/helm
Apache License 2.0

Support request batching #1341

Open yifanmai opened 1 year ago

yifanmai commented 1 year ago

OpenAI's API supports request batching. N users (as of 2023-03-07, N = 3) have expressed interest in batching on the CRFM proxy API to OpenAI.

Edit: users also want to batch requests to other models such as Megatron (as of 2023-05-03, N = 2)
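For context, a minimal sketch of what batching looks like against the OpenAI API. This uses the legacy Completions endpoint, which accepts a list of prompts in a single request; the model name and prompts are illustrative:

import openai  # pre-1.0 SDK, current when this issue was opened; reads OPENAI_API_KEY from the env

prompts = [
    "Translate 'hello' to French.",
    "Translate 'goodbye' to French.",
]

# One HTTP request carries the whole batch and returns one choice per prompt.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompts,
    max_tokens=32,
)

# Choices may come back out of order; match them to prompts by index.
completions = [None] * len(prompts)
for choice in response.choices:
    completions[choice.index] = choice.text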

Feedback:

That would be super useful. I've been using the OpenAI API previously, and its batching allows for much higher throughput.

RylanSchaeffer commented 1 year ago

N +=1 please!

yifanmai commented 1 year ago

This could help with Hugging Face local model inference performance by batching requests to the node.
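For example, a rough sketch of what batched local inference could look like with Hugging Face transformers (the model name and generation settings are illustrative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# gpt2 has no pad token; reuse EOS and left-pad so generation starts
# from the real end of each prompt.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

prompts = ["The capital of France is", "The capital of Japan is"]

# Tokenize the whole batch at once instead of one request at a time.
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=16,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))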

ishaan-jaff commented 10 months ago

@yifanmai @RylanSchaeffer

I'm the maintainer of LiteLLM. We provide an open-source proxy for load balancing Azure + OpenAI + any LiteLLM-supported LLM, and it can process 500+ requests/second.

From this thread it looks like you're trying to load balance between OpenAI instances - I hope our solution makes it easier for you. (I'd love feedback if you're trying to do this.)

Here's the quick start:

Doc: https://docs.litellm.ai/docs/simple_proxy#load-balancing---multiple-instances-of-1-model

Step 1: Create a config.yaml

model_list:
  # All three deployments share the model_name "gpt-4";
  # the proxy load-balances requests for "gpt-4" across them.
  - model_name: gpt-4
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
      api_version: "2023-05-15"
      api_key:  # your Azure API key for this deployment
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key:  # your Azure API key for this deployment
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key:  # your Azure API key for this deployment
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
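The key design point is that every entry shares the same model_name: that is what tells the proxy to treat the underlying deployments as interchangeable instances of one model and spread "gpt-4" traffic across them.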

Step 2: Start the litellm proxy:

litellm --config /path/to/config.yaml

Step 3: Make a request to the LiteLLM proxy:

curl --location 'http://0.0.0.0:8000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
      "model": "gpt-4",
      "messages": [
        {
          "role": "user",
          "content": "what llm are you"
        }
      ]
    }'
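Since the proxy exposes an OpenAI-compatible endpoint, the same request can also be made from Python by pointing the OpenAI SDK at it. A sketch using the pre-1.0 SDK; the API key value is a placeholder, since the proxy holds the real credentials:

import openai

openai.api_base = "http://0.0.0.0:8000"
openai.api_key = "anything"  # placeholder; the proxy injects the real keys

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)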
yifanmai commented 10 months ago

Thanks @ishaan-jaff for the tips and doc references - I think we do want to integrate with LiteLLM at some point, so this is very useful.

ishaan-jaff commented 10 months ago

What's missing in LiteLLM to integrate it today @yifanmai?

yifanmai commented 10 months ago

@ishaan-jaff If you have spare cycles, it would be good to get HELM working end-to-end with LiteLLM, and document (or open a pull request for) the changes that need to be made. Also, there needs to be user documentation for this - your earlier comment is a good starting point.
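To make the scope concrete, a hypothetical sketch of what a HELM-side adapter might look like using LiteLLM's Python API. litellm.completion is LiteLLM's real entry point; the LiteLLMClient class and make_request signature below are illustrative placeholders, not HELM's actual client interface:

import litellm

class LiteLLMClient:
    """Hypothetical adapter that routes model requests through LiteLLM."""

    def make_request(self, model: str, prompt: str, max_tokens: int = 100) -> str:
        # litellm.completion speaks the OpenAI chat format for any supported backend.
        response = litellm.completion(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        return response.choices[0].message.content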

I expect that the following may be issues: