triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Assertion `batchSize > 0' failed when deploying the TF-TRT INT8 optimized model #4559

Closed: helenHlz closed this issue 2 years ago

helenHlz commented 2 years ago

Description
I use TF Serving to deploy a TF-TRT INT8 optimized model on an NVIDIA T4 card, and I get this error: "Assertion `batchSize > 0' failed".
This is the log:

When I run the same TF-TRT INT8 optimized model for offline prediction, it works fine.

The strange thing is that deploying the TF-TRT FP16 optimized model works fine.
Here is the log:

Triton Information
What version of Triton are you using?
TensorFlow 1.15.0, TensorRT 5.1.5

To Reproduce
I can upload some code if needed.
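In the meantime, below is a sketch of the kind of TF 1.15 TrtGraphConverter flow that produces such an INT8 SavedModel; the paths, tensor names, shapes, and calibration data are placeholders for illustration, not the actual conversion script used here:

```python
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Placeholder paths; the real model and export locations are not part of this issue.
INPUT_SAVED_MODEL = "./model_fp32_saved_model"
OUTPUT_SAVED_MODEL = "./model_trt_int8_saved_model"

converter = trt.TrtGraphConverter(
    input_saved_model_dir=INPUT_SAVED_MODEL,
    precision_mode="INT8",   # INT8 needs a calibration pass before saving
    max_batch_size=8,        # implicit-batch TF-TRT engines serve batch sizes 1..max_batch_size
    is_dynamic_op=True,      # build engines at runtime, needed for the calibration step
    use_calibration=True,
)
converter.convert()

# Feed a few batches of representative data so TensorRT can pick INT8 scales.
# Tensor names and shapes below are hypothetical.
def feed_dict_fn():
    return {"input_1:0": np.random.rand(8, 224, 224, 3).astype(np.float32)}

converter.calibrate(
    fetch_names=["predictions/Softmax:0"],  # hypothetical output tensor name
    num_runs=10,
    feed_dict_fn=feed_dict_fn,
)
converter.save(OUTPUT_SAVED_MODEL)
```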

Expected behavior
I'm confused about why this problem only occurs when deploying, since offline inference with the INT8 model works and deploying the FP16 model causes no problem. I found someone else who hit this problem too, but the answer there didn't help me solve it: https://github.com/triton-inference-server/server/issues/550
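To make the "deploying" vs. "offline prediction" distinction concrete, here is a minimal sketch of the online path, assuming the model is queried through TF Serving's REST API; the model name, port, and input shape are illustrative placeholders, and the request carries an explicit nonzero batch dimension:

```python
import json

import numpy as np
import requests

# Placeholders: the model name, port, and input shape are not taken from this issue.
URL = "http://localhost:8501/v1/models/trt_int8_model:predict"

# Two samples in the batch, so the batch dimension sent to the server is explicit and nonzero.
batch = np.random.rand(2, 224, 224, 3).astype(np.float32)

resp = requests.post(URL, data=json.dumps({"instances": batch.tolist()}))
resp.raise_for_status()
print(np.array(resp.json()["predictions"]).shape)
```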

rmccorm4 commented 2 years ago

Hi @helenHlz,

If this is a TF Serving issue, please reach out to the TF Serving folks; it looks like you already have here: https://github.com/tensorflow/serving/issues/2021

If you can reproduce a similar issue serving your model with Triton, then please open a new issue here.
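For anyone who wants to try the same model under Triton before opening a new issue, a minimal model-repository sketch might look like the following; the model name, tensor names, and dims are placeholders, and the TF-TRT SavedModel itself would go under models/trt_int8_model/1/model.savedmodel/:

```
# models/trt_int8_model/config.pbtxt (placeholder names and shapes)
name: "trt_int8_model"
platform: "tensorflow_savedmodel"
max_batch_size: 8
input [
  {
    name: "input_1"
    data_type: TYPE_FP32
    dims: [ 224, 224, 3 ]
  }
]
output [
  {
    name: "predictions"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```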