We've been a bit lazy about how we use RDTSC. The original code (probably written about 10 years ago) had this comment:
Intel actually recommends calling CPUID to serialize the execution flow
and reduce variance in measurement due to out-of-order execution.
We don't do that here yet.
see §3.2.1 http://www.intel.com/content/www/us/en/embedded/training/ia-32-ia-64-benchmark-code-execution-paper.html
That link is dead, but the paper can still be found on mirrors. It's a good resource, and we should probably just follow its advice:
The test programs use the serializing instruction CPUID before and after reading the time stamp counter in order to prevent out-of-order execution to interfere with the measurements.
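A minimal sketch of what following that advice could look like, assuming GCC or Clang on x86-64. The helper names `serialize_cpuid`, `rdtsc_serialized`, and `measure_cycles` are made up for illustration and are not from our code or from the paper, whose own listings may differ in detail:

```c
#include <stdint.h>

/* Hypothetical helper: run CPUID (leaf 0) only for its serializing effect.
 * The results are discarded; the "memory" clobber also keeps the compiler
 * from moving loads and stores across this point. */
static inline void serialize_cpuid(void) {
    uint32_t eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)edx;
}

/* Hypothetical helper: read the time stamp counter with CPUID before and
 * after the read, as the quoted advice describes. */
static inline uint64_t rdtsc_serialized(void) {
    uint32_t lo, hi;
    serialize_cpuid();
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    serialize_cpuid();
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical usage: TSC ticks elapsed while running fn(),
 * including the overhead of the serializing instructions. */
uint64_t measure_cycles(void (*fn)(void)) {
    uint64_t start = rdtsc_serialized();
    fn();
    uint64_t end = rdtsc_serialized();
    return end - start;
}
```

CPUID is itself expensive, so for short code sections the cost of an empty measurement would need to be measured and subtracted from the result.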
Resources: