triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

How to extract model states stored in Triton (Implicit State Management) #7119

Closed: chuikova-e closed this issue 1 month ago

chuikova-e commented 5 months ago

Description
I am using the Triton Inference Server with a TensorRT backend, sequence batching with the Oldest strategy, and implicit state management. I would like to find the most efficient way to update my inference model without causing downtime.
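For concreteness, a model configuration for a setup like the one described above might look roughly like the sketch below. This is a hypothetical config.pbtxt; the platform, tensor names, data types, and shapes are illustrative assumptions, not taken from the actual model.

```
# Hypothetical config.pbtxt sketch: a TensorRT model using the sequence
# batcher's Oldest strategy together with implicit state management.
platform: "tensorrt_plan"
max_batch_size: 8

sequence_batching {
  oldest {
    max_candidate_sequences: 4
  }
  state [
    {
      input_name: "INPUT_STATE"    # state fed back into the model each step
      output_name: "OUTPUT_STATE"  # state produced by the model each step
      data_type: TYPE_FP32
      dims: [ -1 ]
    }
  ]
}
```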

One approach involves requesting the model states from the old, terminating Triton instance (which is serving the old model) at the moment the model is replaced. These states would then be sent to other still-operational Triton instances, which handle client requests until the transition completes. Implementing this approach requires access to the current states stored in Triton.

Question
Is it possible to extract the model states stored in Triton via its API?

Tabrizian commented 5 months ago

If you want to return the state to the client, you can add another output with the same name as the output state. See https://github.com/triton-inference-server/server/blob/main/docs/user_guide/architecture.md#implicit-state-management:

> For debugging purposes, the client can request the output state. In order to allow the client to request the output state, the output section of the model configuration must list the output state as one of the model outputs. Note that requesting the output state from the client can increase the request latency because of the additional tensors that have to be transferred.
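Concretely, and continuing the hypothetical sketch above, exposing the state to clients means listing the state's output_name in the config's output section alongside the model's regular outputs:

```
output [
  {
    name: "OUTPUT"        # the model's regular output (illustrative)
    data_type: TYPE_FP32
    dims: [ -1 ]
  },
  {
    name: "OUTPUT_STATE"  # same name as the state's output_name above
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
```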

chuikova-e commented 5 months ago

Is that the only way to extract the states?

Tabrizian commented 5 months ago

Currently, yes, that's the only way. Let us know if you have ideas for other ways to extract states.
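For illustration, here is a rough sketch of how a client could fetch the state this way, using the Python tritonclient package over gRPC. The model name and tensor names continue the hypothetical configuration sketched earlier.

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Regular input for one step of the (hypothetical) stateful model.
inputs = [grpcclient.InferInput("INPUT", [1, 1], "FP32")]
inputs[0].set_data_from_numpy(np.array([[1.0]], dtype=np.float32))

# Request the state tensor alongside the regular output; this only works
# if OUTPUT_STATE is listed in the config's output section as quoted above.
outputs = [
    grpcclient.InferRequestedOutput("OUTPUT"),
    grpcclient.InferRequestedOutput("OUTPUT_STATE"),
]

result = client.infer(
    model_name="stateful_model",  # hypothetical model name
    inputs=inputs,
    outputs=outputs,
    sequence_id=42,        # the sequence whose state we want
    sequence_start=False,  # mid-sequence request
    sequence_end=False,
)

state = result.as_numpy("OUTPUT_STATE")
print(state.shape)
```

Note that each such request returns the state only for the given sequence at that step, so harvesting the states of every active sequence, as the migration approach above would require, means issuing one request per sequence.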

krishung5 commented 1 month ago

Closing due to lack of activity. Please re-open the issue if you would like to follow up on it.