triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

TRTIS should support variable-sized input and output tensor dimensions #8

Closed. deadeyegoodwin closed this issue 5 years ago.

deadeyegoodwin commented 5 years ago

Currently TRTIS only allows the first dimension of an input/output tensor to be variable-sized, and only when that dimension represents batching. TRTIS should allow variable-sized dimensions in other cases, since they are supported by some of the frameworks (e.g. TensorFlow), and not supporting them limits which models can easily run on TRTIS.
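
For context, a minimal sketch of a model configuration under that restriction (the model and tensor names are illustrative): the only dimension that can vary is the implicit leading batch dimension governed by max_batch_size, and every listed dimension must be a fixed size.

    name: "example_model"              # illustrative name
    platform: "tensorflow_graphdef"
    max_batch_size: 8                  # only this implicit leading batch dimension may vary
    input [
      {
        name: "input"
        data_type: TYPE_FP32
        dims: [ 224, 224, 3 ]          # all listed dimensions must currently be fixed
      }
    ]
    output [
      {
        name: "probabilities"
        data_type: TYPE_FP32
        dims: [ 1000 ]
      }
    ]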

dcyoung commented 5 years ago

The described limitation is stalling our team's migration from tf-serving to tensorrt-inference-server. Glad to hear the need is understood and we will be following the progress eagerly!

xinli94 commented 5 years ago

Thanks for taking this into consideration! This has blocked me from switching to the TensorRT Inference Server for quite a while, and it also makes deploying detection models a pain. Hope it gets supported soon!

tilaba commented 5 years ago

Hi, can TRTIS support models with multiple outputs?

bezero commented 5 years ago

Hi @tilaba. Yes, it supports multiple outputs. For example, in the case of the TensorFlow Object Detection API, you set the outputs in the config file as follows:

    output [
      {
        name: "detection_boxes"
        data_type: TYPE_FP32
        dims: [ 100, 4 ]
      },
      {
        name: "detection_scores"
        data_type: TYPE_FP32
        dims: [ 100 ]
      },
      {
        name: "detection_classes"
        data_type: TYPE_FP32
        dims: [ 100 ]
      }
    ]

It is an array of outputs. Later, when sending a request you can use:

    results = ctx.run(
        {input_name: input_data},
        {output: InferContext.ResultFormat.RAW for output in output_names},
        batch_size)

where output_names = ["detection_boxes", "detection_scores", "detection_classes"].
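
For reference, a more complete sketch of that client call, written against the Python client API from that era (tensorrtserver.api). The server URL, model name, input tensor name, and image shape below are illustrative assumptions, not values from this thread:

    import numpy as np
    from tensorrtserver.api import InferContext, ProtocolType

    # Illustrative values: adjust to your own deployment and model configuration.
    url = "localhost:8000"
    model_name = "detection_model"
    output_names = ["detection_boxes", "detection_scores", "detection_classes"]

    ctx = InferContext(url, ProtocolType.HTTP, model_name)

    # One dummy image; the real input name and shape come from your model configuration.
    image = np.zeros((300, 300, 3), dtype=np.float32)

    results = ctx.run(
        {"image_tensor": [image]},  # map input name -> list of per-image arrays
        {output: InferContext.ResultFormat.RAW for output in output_names},
        batch_size=1)

    # Results are keyed by output name; each value holds one entry per batch item.
    boxes = results["detection_boxes"][0]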

tilaba commented 5 years ago

it works, thanks @bezero

tilaba commented 5 years ago

Has this issue been fixed?

blackarrow3542 commented 5 years ago

This will be really useful for CTPN and CRNN models for OCR.

deadeyegoodwin commented 5 years ago

The inference server now supports variable-sized input and output tensor dimensions for backends that support them. As of now that is TensorFlow, Caffe2, and custom backends (assuming your custom backend handles them correctly). You specify such a dimension by setting it to -1 in the model configuration.

This support is on the master branch and will be in the 19.02 release. Please give it a try and report any issues.
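
For illustration, a minimal model configuration sketch using -1 for the variable-sized dimensions (the model name, tensor names, and shapes are assumptions, not values from this thread):

    name: "variable_size_model"        # illustrative name
    platform: "tensorflow_savedmodel"
    max_batch_size: 8
    input [
      {
        name: "image"
        data_type: TYPE_FP32
        dims: [ -1, -1, 3 ]            # variable height and width, fixed channel count
      }
    ]
    output [
      {
        name: "scores"
        data_type: TYPE_FP32
        dims: [ -1 ]                   # variable-length output
      }
    ]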

dcyoung commented 5 years ago

@deadeyegoodwin With this feature, our team is excited to explore a migration from tf-serving to TRTIS. Thank you for responding to the community feedback. It is much appreciated.

bezero commented 5 years ago

@deadeyegoodwin When will the TRTIS container for the 19.02 release be available?

deadeyegoodwin commented 5 years ago

The monthly container releases are typically available around the 25th. So, following typical practice, 19.02 would be available around Monday 2/25. But this month I think it may be delayed till the end of that week.

ziyuang commented 5 years ago

> The inference server now supports variable-sized input and output tensor dimensions for backends that support them. As of now that is TensorFlow, Caffe2, and custom backends (assuming your custom backend handles them correctly). You specify such a dimension by setting it to -1 in the model configuration.
>
> This support is on the master branch and will be in the 19.02 release. Please give it a try and report any issues.

How come TRTIS supports dynamic input size while TensorRT itself doesn't?

bezero commented 5 years ago

@ziyuang TRTIS does not only support TensorRT models. It supports other frameworks as well (tensorrt_plan, tensorflow_graphdef, tensorflow_savedmodel, caffe2_netdef, or custom), and those platforms do support dynamic input sizes. To sum up, TRTIS allows you to specify a dynamic input size for any model that is able to handle such inputs.
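
On the client side, a variable dimension simply means each request can carry a different concrete shape; the -1 dims are resolved from the array that is actually sent. A minimal sketch against the Python client of that era, with the model and tensor names assumed from the configuration sketch above:

    import numpy as np
    from tensorrtserver.api import InferContext, ProtocolType

    ctx = InferContext("localhost:8000", ProtocolType.HTTP, "variable_size_model")

    # Two requests with differently sized images; the -1 dimensions in the model
    # configuration are filled in per request from the shape of the array sent.
    for height, width in [(480, 640), (720, 1280)]:
        image = np.zeros((height, width, 3), dtype=np.float32)
        results = ctx.run(
            {"image": [image]},
            {"scores": InferContext.ResultFormat.RAW},
            batch_size=1)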

ziyuang commented 5 years ago

> @ziyuang TRTIS does not only support TensorRT models. It supports other frameworks as well (tensorrt_plan, tensorflow_graphdef, tensorflow_savedmodel, caffe2_netdef, or custom), and those platforms do support dynamic input sizes. To sum up, TRTIS allows you to specify a dynamic input size for any model that is able to handle such inputs.

Good. Would the computation graph still be optimized if I use models other than a TensorRT PLAN?

deadeyegoodwin commented 5 years ago

Each backend/framework (TensorRT, TensorFlow, Caffe2) has its own optimization techniques that it applies to the model before execution. Typically, the optimizations performed by TensorRT provide significant speedups relative to the other frameworks, but TensorFlow does some optimization as well. There is also the TensorRT-TensorFlow integration, which lets you get many of the benefits of TensorRT while still using TensorFlow. TRTIS fully supports TensorFlow models that have been optimized with TensorRT. https://github.com/tensorflow/tensorrt
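
For reference, a sketch of that TensorRT-TensorFlow integration as it existed in TF 1.x (tensorflow.contrib.tensorrt); the frozen graph path and output node names are illustrative assumptions:

    import tensorflow as tf
    from tensorflow.contrib import tensorrt as trt

    # Load an already-frozen TensorFlow GraphDef (illustrative path).
    with tf.gfile.GFile("frozen_model.pb", "rb") as f:
        frozen_graph = tf.GraphDef()
        frozen_graph.ParseFromString(f.read())

    # Replace TensorRT-compatible subgraphs with optimized TensorRT engine ops.
    trt_graph = trt.create_inference_graph(
        input_graph_def=frozen_graph,
        outputs=["detection_boxes", "detection_scores", "detection_classes"],
        max_batch_size=8,
        max_workspace_size_bytes=1 << 30,
        precision_mode="FP16")

    # The optimized GraphDef can then be served by TRTIS as a tensorflow_graphdef model.
    with tf.gfile.GFile("model.graphdef", "wb") as f:
        f.write(trt_graph.SerializeToString())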