
running time per image in 'bm' mode with TF-TRT in model_inspect.py is close to that without TF-TRT #331

Open Jupanlee opened 4 years ago

mingxingtan commented 4 years ago

I might not be using TensorRT in the right way. @itsliupeng seems to get better results with TensorRT: https://github.com/google/automl/pull/299
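For reference, a conversion along these lines would go through the TF 2.x TF-TRT API. A minimal sketch (the paths are hypothetical, and this is not necessarily the exact script used in #299):

import tensorflow as tf

# Convert a SavedModel into a TF-TRT optimized SavedModel (FP32).
params = tf.experimental.tensorrt.ConversionParams(precision_mode='FP32')
converter = tf.experimental.tensorrt.Converter(
    input_saved_model_dir='/tmp/efficientdet_savedmodel',  # hypothetical path
    conversion_params=params)
converter.convert()
converter.save('/tmp/efficientdet_trt')  # hypothetical output path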

fsx950223 commented 4 years ago

I found the TensorRT model is slower than the normal SavedModel when I use combined NMS.
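"Combined NMS" here presumably refers to tf.image.combined_non_max_suppression, which runs per-class NMS for a whole batch as a single fused op. A minimal usage sketch with illustrative shapes and thresholds:

import tensorflow as tf

# boxes: [batch, num_boxes, q, 4]; q=1 shares one box across all classes.
boxes = tf.random.uniform([1, 1000, 1, 4])
# scores: [batch, num_boxes, num_classes]
scores = tf.random.uniform([1, 1000, 90])

nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections = (
    tf.image.combined_non_max_suppression(
        boxes,
        scores,
        max_output_size_per_class=100,
        max_total_size=100,
        iou_threshold=0.5,
        score_threshold=0.001))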

mingxingtan commented 4 years ago

Looks like TensorRT is slightly faster on my Titan V GPU for full models (both using FP32).

I added instructions here: https://github.com/google/automl/tree/master/efficientdet#3-export-savedmodel-frozen-graph-tensort-models-or-tflite
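For reference, the export step from those instructions looks roughly like this; --runmode and --tensorrt appear in the benchmark commands below, while the other flag names are taken from the linked README and may differ across repo versions:

$ python model_inspect.py --runmode=saved_model \
    --model_name=efficientdet-d0 --ckpt_path=efficientdet-d0 \
    --saved_model_dir=/tmp/saved_model --tensorrt=FP32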

itsliupeng commented 4 years ago

Sorry, there are some bugs in the conversion.


mingxingtan commented 4 years ago

It looks like TensorRT indeed speeds things up a bit. Here is a run on a V100:

$ python model_inspect.py --runmode=bm
Per batch inference time:  0.010065060993656515
FPS:  99.3535956344674

$ python model_inspect.py --runmode=bm --tensorrt=FP32
Per batch inference time:  0.007458997704088688
FPS:  134.0662699831433
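Note that FPS here is simply 1 / (per-batch time), so these numbers are for batch size 1; the FP32 TensorRT run works out to roughly a 1.35x speedup (0.01007 / 0.00746 ≈ 134.1 / 99.4 ≈ 1.35).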