Open LinGeLin opened 4 months ago
CC @matthewkotila @nv-hwoo if you have any thoughts on the variance or improvements to the provided PA arguments
I don't have any concrete ideas on why this would be happening.
@LinGeLin have you tried re-running the entire experiment multiple times to confirm that it consistently shows degraded performance for concurrencies 3 and 7? Perhaps you'll want to decrease the stability percentage (-s)? And/or increase the measurement window (--measurement-interval)?
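To make the suggestion above concrete, here is a minimal sketch of how one might repeat the sweep and check whether the dips at concurrencies 3 and 7 are consistent. It assumes the standard perf_analyzer flags (-m, --concurrency-range, -s, --measurement-interval, -f). The helper names, the model name, the default values, and the 5% threshold are all hypothetical choices for illustration, not anything taken from this issue.

```python
import statistics


def build_pa_command(model, concurrency_range="1:8:1",
                     stability_pct=5, interval_ms=10000, csv_path=None):
    """Build a perf_analyzer argument list (hypothetical helper).

    stability_pct and interval_ms are illustrative values, not
    recommendations; tune them for your own setup.
    """
    cmd = [
        "perf_analyzer",
        "-m", model,
        "--concurrency-range", concurrency_range,
        "-s", str(stability_pct),                    # stricter stability threshold
        "--measurement-interval", str(interval_ms),  # longer window, in ms
    ]
    if csv_path:
        cmd += ["-f", csv_path]  # write per-concurrency results to a CSV file
    return cmd


def unstable_concurrencies(runs, max_cv=0.05):
    """Flag concurrency levels whose throughput varies a lot across runs.

    runs: list of {concurrency: throughput} dicts, one per full sweep.
    Returns {concurrency: coefficient_of_variation} for levels above max_cv.
    """
    if not runs:
        return {}
    flagged = {}
    # Only compare concurrency levels that appear in every run.
    common = set.intersection(*(set(r) for r in runs))
    for c in sorted(common):
        values = [r[c] for r in runs]
        mean = statistics.mean(values)
        cv = statistics.pstdev(values) / mean if mean else float("inf")
        if cv > max_cv:
            flagged[c] = cv
    return flagged
```

The command list can be passed to subprocess.run once per repetition, and the throughput column of each CSV collected into one dict per run. If 3 and 7 are flagged on every repetition, the drop is likely systematic rather than measurement noise.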
Description I used the latest image, version 24.06, because its corresponding TensorRT version supports BF16. However, when I deployed the model with the TensorRT backend and load-tested the model service with perf_analyzer, the results fluctuated.
Triton Information 2.47.0
Are you using the Triton container or did you build it yourself?
image version 24.06
To Reproduce perf_analyze
My pressure test results:
You can see that the throughput drops significantly when the concurrency is 3 or 7. This seems very strange. Does anyone know a possible cause?
Some settings in config.pbtxt:
Expected behavior Is there a statistical problem with how the time taken is measured, or is there a configuration problem? I hope to see more stable results.