triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

Improve Error Reporting for load_model in Triton's Explicit Mode #6711

Open teith opened 11 months ago

teith commented 11 months ago

Is your feature request related to a problem? Please describe.
Currently, when Triton Inference Server is running with --model-control-mode=explicit and a client sends a load_model request for a model that fails to load, the request ends in a TimeoutError: timed out. This response carries no information about the actual error, which makes it much harder to debug and fix the model.
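
For illustration, a minimal client-side sketch of the behavior described above; the model name, server URL, and failure cause are assumptions, not details from this issue:

```python
import tritonclient.http as httpclient
from tritonclient.utils import InferenceServerException

# Server assumed to be running with --model-control-mode=explicit on localhost.
client = httpclient.InferenceServerClient(url="localhost:8000")

try:
    # "broken_model" stands in for any model whose load fails on the server
    # (e.g., a malformed config.pbtxt). Today the client typically sees only
    # a timeout rather than the underlying load error.
    client.load_model("broken_model")
except InferenceServerException as err:
    print("Server reported an error:", err)
except TimeoutError as err:
    # This is the unhelpful outcome the issue describes: "timed out".
    print("Load request timed out with no error detail:", err)
```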

Describe the solution you'd like
I propose improving the Triton server's response in explicit mode: when an error occurs while processing load_model, the server should return a detailed response containing the relevant error from Triton's log instead of timing out. This would make it much faster to identify and resolve problems with a model.
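
From the client's perspective, the requested behavior might look roughly like the following; this is a sketch of the desired outcome, not how the server behaves today, and the error text is invented for illustration:

```python
# Desired behavior (hypothetical): load_model raises promptly with the
# server-side load failure instead of leaving the client to time out.
try:
    client.load_model("broken_model")
except InferenceServerException as err:
    # Hypothetical message, e.g. "failed to load 'broken_model': <reason>"
    print("Model load failed:", err)
```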

Describe alternatives you've considered
The only alternative today is to check Triton's server logs manually, which is inconvenient and time-consuming.
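
One partial workaround, sketched below, is to query the model repository index after a failed load; with the HTTP client this returns per-model state and reason fields that often contain the load error (the field names and model name here are assumptions based on the client's JSON response format):

```python
# Workaround sketch: after a failed or timed-out load_model call, inspect
# the model repository index for the model's state and failure reason.
index = client.get_model_repository_index()
for entry in index:
    if entry.get("name") == "broken_model":
        print("state :", entry.get("state"))   # e.g. "UNAVAILABLE"
        print("reason:", entry.get("reason"))  # server's load error text, if populated
```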

Additional context
This feature would make it easier and quicker to find and fix issues with models in the Triton Inference Server, leading to a smoother and more user-friendly model management experience.

D1-3105 commented 11 months ago

+1

denti commented 11 months ago

+1

oandreeva-nv commented 11 months ago

Thanks for reporting this issue! I've created a ticket for the team [Bug: 5945]