Open jxhekang opened 4 years ago
After some tests, I found out why the TF-TRT model is unstable: the input image I used was a synthetic image (`im_list = [128 * np.ones([576, 1024, 3]).astype(np.float32)]`) in which every pixel has the constant value 128. From such an image the TF-TRT model cannot extract any valid information to pass to certain TRT ops, and it mistakenly produces an invalid tensor (batch size 0).
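For anyone reproducing this, here is a minimal sketch of the degenerate input described above versus a random synthetic input. The random image is an assumption on my part as a workaround for testing; only the constant-128 image is from the original report:

```python
import numpy as np

# Constant synthetic input: every pixel is 128, as in the failing test.
# An all-constant image carries no structure for the network to detect.
constant_img = 128 * np.ones([576, 1024, 3], dtype=np.float32)

# A random synthetic input of the same shape avoids the all-constant
# degenerate case (hypothetical alternative, not from the original issue).
rng = np.random.default_rng(seed=0)
random_img = rng.uniform(0.0, 255.0, size=(576, 1024, 3)).astype(np.float32)

# The constant image has zero variance; the random one does not.
assert constant_img.std() == 0.0
assert random_img.std() > 0.0
```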
I think this is a bug that needs to be fixed in TF-TRT, because it is hard to guarantee that input images always contain valid objects or information. The 'Batch size was: 0' error aborts the program, which makes it a hidden danger in TF-TRT.
@pooyadavoodi I noticed your reply about the 'Batch size was: 0' error in this link: https://github.com/tensorflow/tensorflow/issues/33184#issuecomment-567605513 But I still can't get enough information from the TF-TRT log to handle my 'Batch size was: 0' error. From the log I know the error happened in TRTEngineOp_0; however, in tf2.1.0 there is no black-node-list option. The only thing I could do was set --minimum_segment_size 40. That works, and the 'Batch size was: 0' error no longer happens, but it may also make TF-TRT inefficient. I hope your team can handle this error in the next TF-TRT version.
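For reference, a minimal sketch of applying the same workaround through the TF 2.1 Python API rather than a CLI flag, assuming a saved model directory (the paths here are placeholders). This is environment-dependent (it needs a TensorRT-enabled TF build), so treat it as a configuration sketch:

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Raise minimum_segment_size so that small subgraphs (such as the one
# that produced the batch-size-0 tensor) are left to native TensorFlow
# instead of being converted into TRT engines.
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(minimum_segment_size=40)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="my_saved_model",  # placeholder path
    conversion_params=params,
)
converter.convert()
converter.save("my_trt_saved_model")  # placeholder path
```

The trade-off, as noted above, is that fewer subgraphs are offloaded to TensorRT, which can cost some of the speedup.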
@bixia1
I tried to generate a TF-TRT model using tf2.1.0's TrtGraphConverterV2(xxxx) interface. In tf2.1.0's TrtGraphConverterV2, is_dynamic_op can only be True, which means the TF-TRT model can handle input images of different sizes dynamically. At first I got a TF-TRT model (model_A) successfully, and it seemed to work well and fast. However, when I changed a few parameters in my net and re-generated the TF-TRT model (model_B), the new model became unstable. For example: it can do inference when I feed images (batch=1, H=1000, W=600, C=3) to the TF-TRT model, but when I feed other images (such as batch=1, H=1024, W=600, C=3, or batch=1, H=512, W=512, C=3), I get the error below.
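To make the repro concrete, here is a hedged sketch of the conversion and variable-shape inference flow described above. The paths and the signature input name (`input_1`) are placeholders I made up; also this needs a TensorRT-enabled TF build, so it is a sketch rather than a runnable repro:

```python
import numpy as np
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert the saved model (model_B in my case; path is a placeholder).
converter = trt.TrtGraphConverterV2(input_saved_model_dir="model_B_saved")
converter.convert()
converter.save("model_B_trt")

# With is_dynamic_op=True (the only mode in tf2.1.0), a TRT engine is
# built lazily for each new input shape at inference time.
loaded = tf.saved_model.load("model_B_trt")
infer = loaded.signatures["serving_default"]

for h, w in [(1000, 600), (1024, 600), (512, 512)]:
    img = np.random.uniform(0, 255, (1, h, w, 3)).astype(np.float32)
    # 'input_1' is a hypothetical signature input name; check your model's
    # signature with: saved_model_cli show --dir model_B_trt --all
    out = infer(input_1=tf.constant(img))
```

In my runs, the first shape works but the other two trigger the 'Batch size was: 0' error shown below.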
I tried to convert my model_B in NVIDIA's docker (nvcr.io/nvidia/tensorflow:20.02-tf2-py3), but still got almost the same error. The error says Batch size was: 0, but engine max batch size was: 1, and I really don't know where this batch of 0 comes from. Has anyone else met an error like this?