dacorvo closed this pull request 2 months ago.
What does this PR do?
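This PR adds scripts to benchmark TGI (Text Generation Inference) deployments that achieve data parallelism by running several TGI servers on the same host behind a load balancer.
The benchmark client is llmperf.
It also includes results for Llama 7B and Mistral v2 deployed on an inf2.48xlarge instance in a DP3/TP8 configuration: three TGI servers (data parallelism of 3), each sharded across 8 NeuronCores (tensor parallelism of 8), which together use all 24 NeuronCores of the instance.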
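The DP3/TP8 layout can be illustrated with a minimal sketch. It assumes that each TGI server is pinned to a disjoint block of NeuronCores through the Neuron runtime's `NEURON_RT_VISIBLE_CORES` environment variable, and that the load balancer dispatches requests round-robin. The helper names (`core_ranges`, `round_robin`) and the ports 8081 to 8083 are hypothetical and chosen for illustration only; the scripts in this PR may partition cores and balance load differently.

```python
from itertools import cycle
from typing import Iterator

# Assumption: inf2.48xlarge exposes 24 NeuronCores (12 Inferentia2 chips x 2 cores each).
TOTAL_CORES = 24
DP = 3  # number of TGI server replicas (data parallelism)
TP = 8  # NeuronCores per replica (tensor parallelism)


def core_ranges(total_cores: int, dp: int, tp: int) -> list[str]:
    """Return one contiguous core range string (e.g. "0-7") per replica.

    Hypothetical helper: the strings use the "first-last" range syntax
    accepted by NEURON_RT_VISIBLE_CORES.
    """
    if dp * tp > total_cores:
        raise ValueError(
            f"DP{dp} x TP{tp} needs {dp * tp} cores, only {total_cores} available"
        )
    return [f"{i * tp}-{(i + 1) * tp - 1}" for i in range(dp)]


def round_robin(endpoints: list[str]) -> Iterator[str]:
    """Yield backend endpoints in turn, as a simple round-robin load balancer would."""
    return cycle(endpoints)


if __name__ == "__main__":
    ranges = core_ranges(TOTAL_CORES, DP, TP)
    # Hypothetical ports: one TGI server per replica.
    endpoints = [f"http://localhost:{8081 + i}" for i in range(DP)]
    for cores, url in zip(ranges, endpoints):
        print(f"{url} -> NEURON_RT_VISIBLE_CORES={cores}")
    picker = round_robin(endpoints)
    # The first five requests are spread across the three replicas in order.
    print([next(picker) for _ in range(5)])
```

With these assumptions, the three replicas receive cores `0-7`, `8-15` and `16-23`, so no NeuronCore is shared between servers, and the load balancer spreads concurrent llmperf requests evenly across them.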