triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Assertion `batchSize > 0' failed when deploying the TF-TRT INT8 optimized model #4559

Closed: helenHlz closed this issue 2 years ago

helenHlz commented 2 years ago

Description
I use TF Serving to deploy a TF-TRT INT8 optimized model on an NVIDIA T4 card, and I get this error: "Assertion `batchSize > 0' failed".
This is the log:

When I run the same TF-TRT INT8 optimized model for offline prediction, it works fine.

The strange thing is that deploying the TF-TRT FP16 optimized model works fine.
Here is the log:

Triton Information
What version of Triton are you using?
TensorFlow 1.15.0, TensorRT 5.1.5

To Reproduce
I can upload some code if needed.
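In the meantime, below is a sketch of the kind of TF 1.15 TrtGraphConverter flow that produces such an INT8 SavedModel; the paths, tensor names, shapes, and calibration data are placeholders for illustration, not the actual conversion script used here:

```python
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Placeholder paths; the real model and export locations are not part of this issue.
INPUT_SAVED_MODEL = "./model_fp32_saved_model"
OUTPUT_SAVED_MODEL = "./model_trt_int8_saved_model"

converter = trt.TrtGraphConverter(
    input_saved_model_dir=INPUT_SAVED_MODEL,
    precision_mode="INT8",   # INT8 needs a calibration pass before saving
    max_batch_size=8,        # implicit-batch TF-TRT engines serve batch sizes 1..max_batch_size
    is_dynamic_op=True,      # build engines at runtime, needed for the calibration step
    use_calibration=True,
)
converter.convert()

# Feed a few batches of representative data so TensorRT can pick INT8 scales.
# Tensor names and shapes below are hypothetical.
def feed_dict_fn():
    return {"input_1:0": np.random.rand(8, 224, 224, 3).astype(np.float32)}

converter.calibrate(
    fetch_names=["predictions/Softmax:0"],  # hypothetical output tensor name
    num_runs=10,
    feed_dict_fn=feed_dict_fn,
)
converter.save(OUTPUT_SAVED_MODEL)
```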

Expected behavior
I'm confused about why this problem only occurs when deploying, since offline inference with the INT8 model works and deploying the FP16 model causes no problem. I found someone else who hit this problem too, but the answer there didn't help me solve it: https://github.com/triton-inference-server/server/issues/550
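To make the "deploying" vs. "offline prediction" distinction concrete, here is a minimal sketch of the online path, assuming the model is queried through TF Serving's REST API; the model name, port, and input shape are illustrative placeholders, and the request carries an explicit nonzero batch dimension:

```python
import json

import numpy as np
import requests

# Placeholders: the model name, port, and input shape are not taken from this issue.
URL = "http://localhost:8501/v1/models/trt_int8_model:predict"

# Two samples in the batch, so the batch dimension sent to the server is explicit and nonzero.
batch = np.random.rand(2, 224, 224, 3).astype(np.float32)

resp = requests.post(URL, data=json.dumps({"instances": batch.tolist()}))
resp.raise_for_status()
print(np.array(resp.json()["predictions"]).shape)
```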

rmccorm4 commented 2 years ago

Hi @helenHlz,

If this is a TF Serving issue, please reach out to the TF Serving folks; it looks like you already have here: https://github.com/tensorflow/serving/issues/2021

If you can reproduce a similar issue serving your model with Triton, then please open a new issue here.
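For anyone who wants to try the same model under Triton before opening a new issue, a minimal model-repository sketch might look like the following; the model name, tensor names, and dims are placeholders, and the TF-TRT SavedModel itself would go under models/trt_int8_model/1/model.savedmodel/:

```
# models/trt_int8_model/config.pbtxt (placeholder names and shapes)
name: "trt_int8_model"
platform: "tensorflow_savedmodel"
max_batch_size: 8
input [
  {
    name: "input_1"
    data_type: TYPE_FP32
    dims: [ 224, 224, 3 ]
  }
]
output [
  {
    name: "predictions"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```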