Hi;
There is a glitch in the timing code: it sets
torch.backends.cudnn.benchmark=True, which runs an internal autotuner on the GPU to find the fastest algorithm for the given input shape. However, since the input shape changes slightly from image to image, it keeps re-tuning for every unseen shape, and this tuning dominates the inference time of the main model.
This is also mentioned in https://discuss.pytorch.org/t/model-inference-very-slow-when-batch-size-changes-for-the-first-time/44911
and I have seen it in the NVIDIA Visual Profiler as well.
If you try it with torch.backends.cudnn.benchmark=False, the code runs faster. Alternatively,
if you would like to keep torch.backends.cudnn.benchmark=True, the inputs should be cropped to a fixed size to get a better feeling for how fast the model runs, because for a given size the autotuner runs once and caches the result internally. Or sweep the dataset twice, so that every image size has already been seen, and use the second sweep for the timing results.
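The two-sweep idea can be illustrated with a toy model of the per-shape autotuner cache. This is only a sketch of the caching behavior described above, not PyTorch's or cuDNN's actual implementation; the class name, costs, and shapes are made up for illustration:

```python
import time

# Toy stand-in for cuDNN benchmark mode (illustrative only, not the real
# implementation): the chosen algorithm is cached per input shape, so an
# unseen shape pays a tuning cost while a previously seen shape does not.
class ToyAutotuner:
    def __init__(self, tuning_cost=0.01, run_cost=0.001):
        self.cache = {}                 # shape -> "best algorithm"
        self.tuning_cost = tuning_cost  # cost of tuning a new shape
        self.run_cost = run_cost        # cost of one actual inference

    def infer(self, shape):
        if shape not in self.cache:
            time.sleep(self.tuning_cost)   # expensive tuning for a new shape
            self.cache[shape] = "algo"
        time.sleep(self.run_cost)          # the inference itself
        return self.cache[shape]

def timed_sweep(tuner, shapes):
    start = time.perf_counter()
    for s in shapes:
        tuner.infer(s)
    return time.perf_counter() - start

# Slightly varying image sizes, as in the dataset described above.
shapes = [(3, 224, 224 + i) for i in range(20)]
tuner = ToyAutotuner()
first = timed_sweep(tuner, shapes)   # every shape is new: tuning dominates
second = timed_sweep(tuner, shapes)  # all shapes cached: true inference time
print(f"first sweep:  {first:.3f}s")
print(f"second sweep: {second:.3f}s")
```

The second sweep hits only cached shapes, so it reflects the model's real inference speed, which is why timing should use the second pass over the dataset.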
Thanks