triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

Triton GPU deployment suddenly became very slow, from 0.03s to 12s; how to solve it? #7638

Open: yiluzhuimeng opened this issue 2 months ago

yiluzhuimeng commented 2 months ago

Description
A clear and concise description of what the bug is.

Triton Information
What version of Triton are you using?

Are you using the Triton container or did you build it yourself?

To Reproduce
Steps to reproduce the behavior.

Describe the models (framework, inputs, outputs), and ideally include the model configuration file (if using an ensemble, include the model configuration file for that as well); a sketch of such a configuration follows this template.

Expected behavior
A clear and concise description of what you expected to happen.
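
For reference, here is a minimal sketch of the kind of model configuration file (config.pbtxt) the template asks for. The model name, platform, tensor names, and shapes are hypothetical placeholders, not taken from the reporter's setup:

```
# config.pbtxt -- hypothetical example; name, platform, and shapes are illustrative
name: "my_model"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]
```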

oandreeva-nv commented 2 months ago

Hi @yiluzhuimeng, could you please fill out the question template? This will help us tremendously in assisting you with the issue.
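
In the meantime, one way to quantify the slowdown for the report is to time individual requests from a client. Below is a minimal sketch using the Python tritonclient HTTP API; the model name, input name, and shape are the same hypothetical placeholders as in the config sketch above:

```python
# Minimal latency probe -- model name, input name, and shape are hypothetical placeholders.
import time

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a dummy request matching the (assumed) model input signature.
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

# Time several consecutive requests to separate a one-off warm-up cost
# from a persistent per-request slowdown.
for i in range(5):
    start = time.perf_counter()
    client.infer(model_name="my_model", inputs=[inp])
    print(f"request {i}: {time.perf_counter() - start:.3f}s")
```

Timing several consecutive requests like this helps distinguish a one-time warm-up cost (for example, lazy model loading or CUDA initialization on the first request) from a slowdown that affects every request.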