Closed: jeisinge closed this issue 3 years ago
Same issue for me. Can anyone provide a workaround?
More recent versions of Triton support variable-length tensors. Please see https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#inputs-and-outputs for more information.
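For anyone landing here, a rough client-side sketch of what that looks like; the model name `text_model` and input name `TEXT` are made up, and the model's config is assumed to declare the input with a variable dimension (`dims: [ -1 ]`):

```python
# Rough sketch only: "text_model" and "TEXT" are invented names. Assumes the
# model's config.pbtxt declares the input with dims: [ -1 ], so each request
# may send a different length for that dimension.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

for tokens in (["the", "cat"], ["a", "much", "longer", "sentence"]):
    data = np.array([t.encode("utf-8") for t in tokens], dtype=np.object_)
    # The shape comes from this request's data, not from a fixed padded size.
    inp = httpclient.InferInput("TEXT", list(data.shape), "BYTES")
    inp.set_data_from_numpy(data)
    result = client.infer(model_name="text_model", inputs=[inp])
```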
Is your feature request related to a problem? Please describe.
Our inference requests are a bit different from traditional image inference requests. In particular, they carry variable-length text features as well as context features that are shared across all items in a request. This leads to a couple of complexities.
Describe the solution you'd like
TF-Serving provides this type of support in the form of a row-based input. This comes in two forms:
- Example
- ExampleListWithContext
For Estimator SavedModels, TensorFlow allows exporting with a parsing serving receiver that takes an Example as input. This allows requests to carry row-based data, and because the data is row-based, the rows can have variable lengths. As a result, text fields of different sizes can use this format efficiently, both for bandwidth and for parsing.
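For illustration, exporting such a SavedModel looks roughly like the following; the feature name `query_tokens` and the estimator are placeholders:

```python
# Illustrative sketch only; "query_tokens" and the estimator are placeholders.
import tensorflow as tf

feature_spec = {
    # VarLenFeature lets each serialized Example carry a different number
    # of string values, so the client never has to pad.
    "query_tokens": tf.io.VarLenFeature(tf.string),
}
serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
    feature_spec
)
# Assuming `estimator` is an already-trained tf.estimator.Estimator:
# estimator.export_saved_model("/tmp/export", serving_input_fn)
```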
Further, TensorFlow Serving allows for providing common features via ExampleListWithContext. This separates the tensors into context and items[]. On the server, the context is broadcast/repeated to each item, and inference is then run.
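For example, such a request body could be assembled roughly like this; the feature names `user_id` and `doc_text` are invented, while the ExampleListWithContext message itself comes from tensorflow_serving.apis.input_pb2:

```python
# Sketch of assembling an ExampleListWithContext; the feature names
# ("user_id", "doc_text") are placeholders.
import tensorflow as tf
from tensorflow_serving.apis import input_pb2

def bytes_feature(values):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))

# The context features are sent once; the server broadcasts them to
# every item before running inference.
context = tf.train.Example(
    features=tf.train.Features(feature={"user_id": bytes_feature([b"u123"])})
)
items = [
    tf.train.Example(
        features=tf.train.Features(feature={"doc_text": bytes_feature(text)})
    )
    for text in ([b"short", b"doc"], [b"a", b"longer", b"document", b"here"])
]
elwc = input_pb2.ExampleListWithContext(context=context, examples=items)
```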
Describe alternatives you've considered
For the variable-length text tensors, since the tensor input is a fixed rectangular shape, we would have to choose the largest text size for this tensor and pad every request to that size. This would increase the memory and bandwidth costs for both the client and the server, and we would have to add extra pre-processing code to handle the padded input.
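Roughly, that workaround looks like the following sketch; the pad value `""` and the token lists are illustrative:

```python
# Sketch of the padding workaround described above; the pad value "" and
# the token lists are illustrative.
import numpy as np

rows = [["the", "cat"], ["a", "much", "longer", "sentence"]]
max_len = max(len(r) for r in rows)  # every row pays for the longest one
padded = np.array(
    [r + [""] * (max_len - len(r)) for r in rows], dtype=np.object_
)
# padded.shape == (2, 4): the short row carries two wasted slots, and the
# server needs extra pre-processing to strip the pad values back out.
```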
Additional context
If there are alternative solutions, please enlighten us!