Closed — chuikova-e closed this issue 1 month ago.
You can add another output with the same name as the output state if you want to return it to the client. https://github.com/triton-inference-server/server/blob/main/docs/user_guide/architecture.md#implicit-state-management
For debugging purposes, the client can request the output state. In order to allow the client to request the output state, the output section of the model configuration must list the output state as one of the model outputs. Note that requesting the output state from the client can increase the request latency because of the additional tensors that have to be transferred.
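As a concrete illustration, here is a minimal `config.pbtxt` sketch of this setup. The tensor names (`INPUT_STATE`, `OUTPUT_STATE`, `OUTPUT`), data types, and dims are assumptions for the example, not from the original thread; the key point is that the state's `output_name` is repeated in the `output` section so the client may request it:

```protobuf
sequence_batching {
  oldest {
    max_candidate_sequences: 4
  }
  state [
    {
      input_name: "INPUT_STATE"
      output_name: "OUTPUT_STATE"
      data_type: TYPE_FP32
      dims: [ 128 ]
    }
  ]
}
output [
  {
    name: "OUTPUT"
    data_type: TYPE_FP32
    dims: [ 10 ]
  },
  {
    # Same name as the output state above; listing it here
    # lets the client request the state tensor for debugging.
    name: "OUTPUT_STATE"
    data_type: TYPE_FP32
    dims: [ 128 ]
  }
]
```

With this configuration, a client can add `OUTPUT_STATE` to its requested outputs like any other output tensor, at the cost of transferring the extra state data on each response.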
Is this the only way to extract states?
Currently, yes, that's the only way. Let us know if you have ideas for other ways to extract states.
Closing due to lack of activity. Please re-open if you would like to follow up on this issue.
Description
I am using the Triton Inference Server with the TensorRT backend, sequence batching (Oldest scheduling strategy), and implicit state management. I would like to find the most efficient way to update my inference model without causing downtime.
One approach is to request the model states from the terminating Triton instance (the one serving the old model) at the moment of model replacement. These states would then be sent to the other, still-operational Triton instances, which handle client requests until the transition is complete. Implementing this approach requires access to the current states stored in Triton.
Question
Is it possible to extract the model states stored in Triton via its API?