triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

ModelStreamInferResponse Proto should use serialized grpc error with status code in ErrorMessage field #6110

Open dhaval24 opened 1 year ago

dhaval24 commented 1 year ago

Description: Currently, Triton server doesn't capture the full serialized gRPC error, including its status code, in the error message field of ModelStreamInferResponse. Proto: https://github.com/triton-inference-server/common/blob/1df32b982a6ed11ead3271a55b04bf6e7abc1cf9/protobuf/grpc_service.proto#L831

The error message should include the gRPC status code so that the error can be reconstructed in upstream applications for correct error handling.

Errors such as the one below need to have the error code correctly available to upstream consumers. https://github.com/triton-inference-server/server/blob/b0fb26a0f480950f214ffa1ae1847ca8c5930235/src/core/scheduler_utils.cc#L105-L107

Additionally, the status code of the above error should be RESOURCE_EXHAUSTED rather than UNAVAILABLE.

Triton Information 23.04

Are you using the Triton container or did you build it yourself? Triton container

To Reproduce

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

Expected behavior: The response should carry a fully serialized gRPC error, including the error code, that can be reconstructed into a gRPC error correctly.
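To illustrate the request, here is a minimal Python sketch of what "reconstructable" could look like on the client side. It assumes a purely hypothetical convention in which the server writes a small JSON object (`code`, `message`) into the error message string. Neither that encoding nor the names `encode_error`, `decode_error`, and `StreamError` exist in Triton today, and the sample message text is illustrative only.

```python
import json
from dataclasses import dataclass

# Subset of the canonical gRPC status codes and their numeric values.
GRPC_CODES = {
    "OK": 0,
    "UNKNOWN": 2,
    "RESOURCE_EXHAUSTED": 8,
    "INTERNAL": 13,
    "UNAVAILABLE": 14,
}


@dataclass
class StreamError:
    """Client-side view of an error carried in a stream response (hypothetical)."""
    code: int
    message: str


def encode_error(code_name: str, message: str) -> str:
    """Hypothetical server side: pack the status code and text into one string."""
    return json.dumps({"code": GRPC_CODES[code_name], "message": message})


def decode_error(error_message: str) -> StreamError:
    """Hypothetical client side: recover the status code from the string.

    Falls back to UNKNOWN when the string is plain text with no embedded code,
    which is the situation this issue describes.
    """
    try:
        payload = json.loads(error_message)
        return StreamError(int(payload["code"]), str(payload["message"]))
    except (ValueError, KeyError, TypeError):
        return StreamError(GRPC_CODES["UNKNOWN"], error_message)


# Round trip with an embedded code, then a plain-text message without one.
with_code = decode_error(encode_error("RESOURCE_EXHAUSTED", "Exceeds maximum queue size"))
without_code = decode_error("Exceeds maximum queue size")
```

With the code embedded, the client can branch on `with_code.code` (for example, back off and retry on RESOURCE_EXHAUSTED); with plain text, it can only see UNKNOWN.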

jbkyang-nvi commented 1 year ago

Thanks for your feedback. I've created an enhancement request ticket.

jbkyang-nvi commented 1 year ago

Also, was the intent that the server return a gRPC-specific message? Can you show an example? This might be difficult because Triton supports both HTTP and gRPC.

dhaval24 commented 1 year ago

Apologies for the delayed response. The expectation was that the response code here for gRPC should be RESOURCE_EXHAUSTED (see https://grpc.github.io/grpc/core/md_doc_statuscodes.html), and that should translate to 429 as the HTTP response code.
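As a rough sketch of that translation, the Python snippet below maps a few gRPC status code names to HTTP status codes, following the HTTP mapping documented for `google.rpc.Code`. The table is a subset, the helper name `http_status_for` is made up for illustration, and none of this reflects how Triton maps errors internally.

```python
# Subset of the gRPC-to-HTTP status mapping documented for google.rpc.Code.
GRPC_TO_HTTP = {
    "OK": 200,
    "INVALID_ARGUMENT": 400,
    "NOT_FOUND": 404,
    "RESOURCE_EXHAUSTED": 429,  # Too Many Requests
    "INTERNAL": 500,
    "UNIMPLEMENTED": 501,
    "UNAVAILABLE": 503,         # Service Unavailable
    "DEADLINE_EXCEEDED": 504,
}


def http_status_for(grpc_code_name: str) -> int:
    """Return the HTTP status for a gRPC code name, defaulting to 500."""
    return GRPC_TO_HTTP.get(grpc_code_name, 500)


# A full request queue would surface as 429 rather than 503 under this mapping.
queue_full_status = http_status_for("RESOURCE_EXHAUSTED")
endpoint_down_status = http_status_for("UNAVAILABLE")
```

A single shared code like this would also address the HTTP/gRPC concern: both frontends could derive their protocol-specific status from the same underlying error code.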

dhaval24 commented 1 year ago

The correct error code + message should be sufficient. UNAVAILABLE maps to 503, which is not accurate here because the server endpoint is available.