triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Dynamic State Initialization in the Runtime #6579

Open littleMatch03 opened 1 year ago

littleMatch03 commented 1 year ago

I've converted a model to TensorRT format. At inference time, I need to initialize the model's state with a custom value. I implemented this with PyCUDA in a Python inference script, but it was not possible with Triton Server, which only supports static state initialization, i.e. zero values or fixed predefined values.

So what I want is the ability to reset the state of a stateful model to a custom value at runtime. Thanks!
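For reference, the static initialization described above is what Triton's implicit state management offers today: an `initial_state` block inside `sequence_batching` in the model's `config.pbtxt`, which can either zero-fill the state or load it from a file. A minimal sketch (tensor names, data type, and dims here are placeholders, not from this issue):

```
sequence_batching {
  state [
    {
      input_name: "INPUT_STATE"     # state fed to the model each step
      output_name: "OUTPUT_STATE"   # state produced by the model each step
      data_type: TYPE_FP32
      dims: [ -1 ]
      initial_state: {
        data_type: TYPE_FP32
        dims: [ 1 ]
        zero_data: true             # static init only: zeros, or data_file
        name: "initial state"
      }
    }
  ]
}
```

Note that `zero_data: true` and `data_file` are mutually exclusive options, and both are fixed at model-load time; there is no field here for supplying a per-request custom initial value, which is exactly the gap this issue is about.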

krishung5 commented 1 year ago

Hi @littleMatch03, I think you can use the ResetSequence boolean input tensor to reset the values of state tensors to their initial values. I don't think we support custom values at the moment. @Tabrizian Please feel free to correct me if I'm wrong. I can create a feature request ticket to track this.