Benchmarking on x86_64 instances using perf seems less accurate than using rdtsc. This PR (a) adds support for PMU_CYCLES on x86_64 systems, and (b) changes the CI to use rdtsc-based cycle counting for all x86_64-based benchmarks. See the commit messages for a bit more detail.
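
For readers unfamiliar with rdtsc-based timing, the following is a minimal sketch of the general technique, not the code from this PR. The helper names `read_tsc`, `example_workload`, and `bench_min_ticks` are hypothetical, and how `PMU_CYCLES` is actually wired up is described only in the commits. It assumes GCC or Clang on x86_64, where `<x86intrin.h>` provides `__rdtsc` and `_mm_lfence`.

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc, _mm_lfence (GCC/Clang, x86 only) */

/* Hypothetical helper: read the time-stamp counter. The fences keep
 * earlier instructions from drifting past the read and later ones
 * from starting before it. */
static inline uint64_t read_tsc(void) {
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* Hypothetical stand-in for the code under test. */
static void example_workload(void) {
    volatile uint64_t sink = 0;
    for (int i = 0; i < 1000; i++) {
        sink += (uint64_t)i;
    }
}

/* Hypothetical helper: run fn `iters` times and return the smallest
 * observed tick count, which filters out interrupts and other noise. */
static uint64_t bench_min_ticks(void (*fn)(void), int iters) {
    assert(iters > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < iters; i++) {
        uint64_t t0 = read_tsc();
        fn();
        uint64_t t1 = read_tsc();
        if (t1 - t0 < best) {
            best = t1 - t0;
        }
    }
    return best;
}
```

Calling `bench_min_ticks(example_workload, 100)` returns a tick count for the workload. On most modern x86_64 CPUs the TSC advances at a constant reference rate rather than the current core clock, so the result matches core cycles only when frequency scaling is disabled or otherwise accounted for.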