
Query regarding timings under ONNXRT profiler #12150

Open · Darshvino opened this issue 2 years ago

Darshvino commented 2 years ago

Hi ONNXRT team,

I was running a model with four ops (Conv-Mul-BatchNorm-ReLU) using ONNXRT and had a few questions about benchmarking:

1.) I was trying to get timing results for the four ops using ONNXRT profiling. If I run the same model multiple times using the C++ API, the first 2-3 runs seem to take a bit more time than the subsequent runs. What could be the reason for this? Is it memory-loading overhead in the first 2-3 runs, or something else?

2.) And does the ONNXRT profiler use libraries like Google Benchmark for its timings?

3.) Also, does the first op always take more time than the subsequent ops in every model inference?

Looking forward to your response.

Thank you

chenfucn commented 2 years ago

Modern computers are engineering wonders with a complex memory hierarchy. Chances are the first couple of runs are hit by cold caches or something of that nature: both code and data must be loaded from disk into memory, and from memory into the different cache levels. Or there may be other background threads running on your system.
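
To keep those cold-start effects out of your numbers, the usual approach is a few untimed warmup runs before the timed loop. A rough Python sketch (the model path and input shape are placeholders for your own model):

```python
import statistics
import time

import numpy as np
import onnxruntime as ort

# Placeholder model path and input shape; substitute your own.
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
data = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Warmup runs: let code and data reach the caches and any lazy
# initialization finish before measuring.
for _ in range(5):
    sess.run(None, {input_name: data})

# Timed runs: report the median to dampen outliers caused by
# background threads.
timings = []
for _ in range(100):
    start = time.perf_counter()
    sess.run(None, {input_name: data})
    timings.append(time.perf_counter() - start)

print(f"median latency: {statistics.median(timings) * 1e3:.3f} ms")
```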

chenfucn commented 2 years ago

One way to profile is described here: https://onnxruntime.ai/docs/api/python/auto_examples/plot_profiling.html.
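
For reference, turning the profiler on from Python is just a session option, and end_profiling() returns the path of the JSON trace that was written. A minimal sketch (model path and input shape are placeholders again):

```python
import numpy as np
import onnxruntime as ort

opts = ort.SessionOptions()
opts.enable_profiling = True  # ask the runtime to record per-op timings

sess = ort.InferenceSession("model.onnx", sess_options=opts,
                            providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
data = np.random.rand(1, 3, 224, 224).astype(np.float32)

for _ in range(10):
    sess.run(None, {input_name: data})

profile_path = sess.end_profiling()  # flushes and closes the trace file
print("profile written to:", profile_path)
```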

If you build the C++ library, the onnxruntime_perf_test executable provides a profiling option as well. Run `onnxruntime_perf_test --help` and look for the documentation of the `-p` option.
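
The profiler output is in Chrome tracing format, so you can either load it into chrome://tracing for a visual timeline or read it directly. A rough sketch of pulling out per-node timings, assuming the "cat"/"name"/"dur" event fields written by recent releases (durations are in microseconds; the file name below is a placeholder):

```python
import json

# Use the path returned by end_profiling(); placeholder name here.
with open("onnxruntime_profile_2022-07-13.json") as f:
    events = json.load(f)

# Node-category events carry the per-op timings; "dur" is in microseconds.
for ev in events:
    if ev.get("cat") == "Node":
        print(f'{ev["name"]}: {ev["dur"]} us')
```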