Closed zhentaoyu closed 5 months ago
We could add some recommended benchmark configurations for reaching maximum throughput, such as the number of instances and the batch size.
It depends. We can maintain a table once we have run more experiments across different setups (SPR, client machines, generation methods, first-token length, etc.).
Type of Change
feature or bug fix or documentation or others
API changed or not
Description
detailed description
Issues: xxx
ret when ignore_prompt
Expected Behavior & Potential Risk
the expected behavior triggered by this PR
How has this PR been tested?
how to reproduce the test (including hardware information)
Dependency Change?
any library dependency introduced or removed