triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

Triton GPU deployment suddenly became very slow, from 0.03s to 12s; how to solve it? #7638

Open: yiluzhuimeng opened this issue 2 months ago

yiluzhuimeng commented 2 months ago

Description
A clear and concise description of what the bug is.

Triton Information
What version of Triton are you using?

Are you using the Triton container or did you build it yourself?

To Reproduce
Steps to reproduce the behavior.

Describe the models (framework, inputs, outputs), and ideally include the model configuration file (if using an ensemble, include the model configuration file for that as well); a sketch of such a configuration follows this template.

Expected behavior
A clear and concise description of what you expected to happen.
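
For reference, here is a minimal sketch of the kind of model configuration file (config.pbtxt) the template asks for. The model name, platform, tensor names, and shapes are hypothetical placeholders, not taken from the reporter's setup:

```
# config.pbtxt -- hypothetical example; name, platform, and shapes are illustrative
name: "my_model"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]
```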

oandreeva-nv commented 2 months ago

Hi @yiluzhuimeng, could you please fill out the question template? This will help us tremendously in assisting you with the issue.
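
In the meantime, one way to quantify the slowdown for the report is to time individual requests from a client. Below is a minimal sketch using the Python tritonclient HTTP API; the model name, input name, and shape are the same hypothetical placeholders as in the config sketch above:

```python
# Minimal latency probe -- model name, input name, and shape are hypothetical placeholders.
import time

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a dummy request matching the (assumed) model input signature.
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

# Time several consecutive requests to separate a one-off warm-up cost
# from a persistent per-request slowdown.
for i in range(5):
    start = time.perf_counter()
    client.infer(model_name="my_model", inputs=[inp])
    print(f"request {i}: {time.perf_counter() - start:.3f}s")
```

Timing several consecutive requests like this helps distinguish a one-time warm-up cost (for example, lazy model loading or CUDA initialization on the first request) from a slowdown that affects every request.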