triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

How does the stateful model maintain state among multiple pods? #7360

Open whzghb opened 2 months ago

whzghb commented 2 months ago

When tritonserver is deployed as multiple pods, for stateful models, does all traffic for a sequence have to be routed to the same pod that handled the previous inference requests? If so, how is that done?

statiraju commented 1 month ago

@Tabrizian can you help answer here.

Tabrizian commented 1 month ago

@whzghb Yes, when deploying stateful models you need to make sure to send all the requests with the same correlation ID to the same Triton instance.
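
For example, here is a minimal sketch using the `tritonclient` Python gRPC API; the model name, input name/shape, and server URL are placeholders for your deployment:

```python
# Minimal sketch: all requests in one sequence share the same sequence_id
# (Triton's correlation ID), so a client or router can use that ID to pin
# the whole sequence to one pod. "stateful_model", "INPUT0", the shape,
# and the URL are placeholders.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

inputs = [grpcclient.InferInput("INPUT0", [1, 1], "FP32")]
inputs[0].set_data_from_numpy(np.array([[1.0]], dtype=np.float32))

# First request of the sequence: mark it with sequence_start=True.
client.infer("stateful_model", inputs, sequence_id=42, sequence_start=True)
# Intermediate requests: same sequence_id, no start/end flags.
client.infer("stateful_model", inputs, sequence_id=42)
# Final request: sequence_end=True lets the server release the sequence slot.
client.infer("stateful_model", inputs, sequence_id=42, sequence_end=True)
```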

whzghb commented 1 month ago

> @whzghb Yes, when deploying stateful models you need to make sure to send all the requests with the same correlation ID to the same Triton instance.

@Tabrizian That means it depends on the client, not the server, right?
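
For illustration, one way a client (or a proxy in front of the pods) could implement that pinning is to deterministically hash the correlation ID to a pod, so every request in a sequence reaches the same instance. This is a sketch, not Triton functionality, and the pod addresses below are hypothetical:

```python
# Sketch of client/proxy-side sticky routing: hash the correlation ID to
# deterministically pick one Triton pod for the whole sequence.
# The pod addresses are hypothetical.
import zlib

TRITON_PODS = [
    "triton-0.triton.svc:8001",
    "triton-1.triton.svc:8001",
    "triton-2.triton.svc:8001",
]

def pod_for_sequence(correlation_id: int) -> str:
    # zlib.crc32 gives a hash that is stable across processes,
    # unlike Python's built-in hash().
    idx = zlib.crc32(str(correlation_id).encode()) % len(TRITON_PODS)
    return TRITON_PODS[idx]

# Every request carrying sequence_id 42 resolves to the same pod.
assert pod_for_sequence(42) == pod_for_sequence(42)
```

Note that a plain modulo hash like this breaks sequence affinity whenever the pod count changes; a production setup would more likely rely on consistent hashing or load-balancer session affinity keyed on the same correlation ID.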