google / automl

Google Brain AutoML

Different Inference time on bm and saved_model_benchmark #1168

Closed: wonkyoc closed this issue 2 years ago

wonkyoc commented 2 years ago

tensorflow==2.9.1 cuda==11.7 gpu==GTX1080Ti

Exp 1

python model_inspect.py --runmode=saved_model_benchmark --model_name=efficientdet-d0 --saved_model_dir=/path/to/efficientdet-d0 --input_image=img.png --output_image_dir=/tmp

Per batch inference time:  0.07774268870707601
FPS:  12.862945913381841

Exp 2

python model_inspect.py --runmode=bm --model_name=efficientdet-d0 --saved_model_dir=/path/to/efficientdet-d0

Per batch inference time:  0.011359048599842936
FPS:  88.03554199194352
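
(Side note: in both experiments the reported FPS is simply the reciprocal of the per-batch time, which suggests a batch size of 1. A quick sanity check using the numbers above:)

```python
# Sanity check: FPS = batch_size / per_batch_time, assuming batch size 1.
for t in (0.07774268870707601, 0.011359048599842936):
    print(f"{t:.4f} s/batch -> {1.0 / t:.2f} FPS")
# 0.0777 s/batch -> 12.86 FPS
# 0.0114 s/batch -> 88.04 FPS
```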

Is the latency of EfficientDet determined by the complexity of the input image? Since the first experiment uses a real image, I assumed it might take longer than the second, but the gap is still quite large. Is this a normal result?

wonkyoc commented 2 years ago

The difference mainly comes from the warmup stage: the untimed warmup runs drastically reduce the measured processing time. https://github.com/google/automl/blob/3bdff765d63113de7e5934868d2a1ef630e2b3d2/efficientdet/model_inspect.py#L397-L400
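
For context, the linked snippet follows the standard warmup-then-measure pattern. A minimal generic sketch of that pattern (not the repo's exact code):

```python
import time

def benchmark(fn, warmup=3, iters=10):
    # Untimed warmup calls absorb one-time costs such as graph tracing,
    # kernel autotuning, and GPU memory allocation.
    for _ in range(warmup):
        fn()
    # Only the steady-state iterations are timed.
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    per_batch = (time.perf_counter() - start) / iters
    print("Per batch inference time:", per_batch)
    print("FPS:", 1.0 / per_batch)
```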

wonkyoc commented 2 years ago

It is true that --runmode=bm benefits from warmup, but there is no such benefit in --runmode=saved_model_benchmark. I suspect that running the actual pre-trained model in --runmode=saved_model_benchmark increases latency, while --runmode=bm does not pay that cost.

wonkyoc commented 2 years ago

Okay, I missed Section 4 in README.md. The bm mode measures only network latency, while saved_model_benchmark measures the end-to-end pipeline, so the two numbers naturally differ.
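
A rough illustration of what each mode times. The stage split and all costs below are placeholder sleeps, not real measurements, and the input shape is just an example:

```python
import time
import numpy as np

def preprocess(path):
    time.sleep(0.03)  # placeholder for decode/resize/normalize
    return np.zeros((1, 512, 512, 3), np.float32)

def network(x):
    time.sleep(0.01)  # placeholder for the detector forward pass
    return x

def postprocess(y):
    time.sleep(0.03)  # placeholder for box decoding/NMS/drawing
    return y

# "bm"-style: the input tensor is prepared outside the timed region,
# so only the network forward pass is measured.
x = preprocess("img.png")
t0 = time.perf_counter()
network(x)
print("network-only:", time.perf_counter() - t0)

# "saved_model_benchmark"-style: pre- and post-processing sit inside
# the timed region, so the per-image time is much larger.
t0 = time.perf_counter()
postprocess(network(preprocess("img.png")))
print("end-to-end:", time.perf_counter() - t0)
```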