Open npalaska opened 1 month ago
@njhill would be good to have your thoughts on this
I think it's a good option to have in the toolbox for throughput-maximization experimentation. A wrapper client could be used which just wraps two different clients configured with different endpoints.
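To make the wrapper-client idea concrete, here is a rough sketch of what such a wrapper might look like. It is only an illustration, not how SDG is implemented: `RoundRobinClient` is a hypothetical name, and the sketch assumes that every wrapped client exposes the same interface and that plain round-robin is an acceptable balancing policy.

```python
import itertools
import threading


class RoundRobinClient:
    """Hypothetical wrapper that spreads calls across several client objects."""

    def __init__(self, clients):
        if not clients:
            raise ValueError("at least one client is required")
        self._clients = list(clients)
        self._cycle = itertools.cycle(self._clients)
        self._lock = threading.Lock()

    def next_client(self):
        # The lock keeps the rotation consistent when called from several threads.
        with self._lock:
            return next(self._cycle)

    def __getattr__(self, name):
        # Only reached for attributes the wrapper itself does not define.
        # Private names are not forwarded, which avoids surprises (and
        # recursion) during copying or pickling.
        if name.startswith("_"):
            raise AttributeError(name)
        # Forward the lookup to whichever client is next in the rotation.
        return getattr(self.next_client(), name)


# Possible usage with the openai package (v1+); the URLs are placeholders:
#
#   from openai import OpenAI
#   client = RoundRobinClient([
#       OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY"),
#       OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY"),
#   ])
#   client.chat.completions.create(model="...", messages=[...])
```

Each top-level attribute access (for example `client.chat`) picks the next endpoint, so consecutive requests alternate between replicas. Round-robin ignores how busy each replica is, so a least-outstanding-requests policy or an external load balancer may behave better under uneven request sizes.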
This seems like a pretty normal load balancer use case?
Currently, SDG supports only a single OpenAI endpoint. Adding support for multiple OpenAI endpoints could significantly improve overall SDG performance. We have observed a nearly 50% improvement in total SDG run time by running two replicas of the vLLM server instead of one and load balancing across them internally.
Consider the following scenarios:
Scenario 1
Scenario 2
Running SDG with Scenario 1 showed a nearly 50% improvement over Scenario 2. If SDG can work with multiple vLLM replicas, we can adopt Scenario 1 for better performance.