triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Ability to do casting between datatypes within backend #7680

Open kronoker opened 1 month ago

kronoker commented 1 month ago

Is your feature request related to a problem? Please describe. I used a model on the OnnxRuntime backend that accepted all requests from clients directly, as the first step of an ensemble. The model had an input with the int16 datatype (TYPE_INT16).

Clients have to send arrays of int16 elements to the server.
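For illustration, the original ONNX Runtime model configuration looked roughly like this (model, tensor names, and shapes here are placeholders, not the real deployment):

```
# config.pbtxt for the ONNX Runtime model that fronts the ensemble
name: "preprocess_onnx"
backend: "onnxruntime"
max_batch_size: 8
input [
  {
    name: "INPUT"
    data_type: TYPE_INT16   # clients send int16 arrays directly
    dims: [ -1 ]
  }
]
output [
  {
    name: "OUTPUT"
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
```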

Then I changed the backend of this model from OnnxRuntime to TensorRT. But TensorRT itself doesn't support the int16 datatype, so I had to add an auxiliary model on the Python backend as the first step of the ensemble that casts the int16 elements from the client to int32 for the TensorRT model.
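A minimal sketch of that casting step on the Python backend, assuming hypothetical tensor names INPUT_INT16 / OUTPUT_INT32 (the real ensemble uses its own names):

```python
# model.py for the auxiliary casting model (Python backend)
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Read the int16 tensor sent by the client
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT_INT16")
            int16_data = in_tensor.as_numpy()

            # Cast to int32 so the downstream TensorRT model can consume it
            int32_data = int16_data.astype(np.int32)

            out_tensor = pb_utils.Tensor("OUTPUT_INT32", int32_data)
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out_tensor])
            )
        return responses
```

This extra hop exists only to change the datatype, which is what the feature request below would make unnecessary.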

Describe the solution you'd like Support data type casting of inputs within any backend (or at least within the TensorRT backend), specified in config.pbtxt.
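For example, a per-input cast could hypothetically be expressed in config.pbtxt like this (the `cast_to` field does not exist today; it only illustrates the request):

```
input [
  {
    name: "INPUT"
    data_type: TYPE_INT16   # datatype accepted from clients on the wire
    cast_to: TYPE_INT32     # hypothetical field: datatype handed to the TensorRT engine
    dims: [ -1 ]
  }
]
```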

Describe alternatives you've considered I haven't come up with any alternatives.