Open JAkutenshi opened 6 years ago
Also, this seems to be the same as #185, but more informative and with a solution approach.
The paper that you refer to is kinda old. Currently the Intel manual prefers lfence before the first rdtsc, as opposed to cpuid, and rdtscp if that is supported. Personally I prefer lfence; rdtsc anyway and use rdtscp; cpuid to measure the closing timestamp (as in that paper). But all of that seems rather futile unless you do your measurement in a kernel module, on a CPU that isn't otherwise used (turned off during boot), and with a tweaked BIOS. Otherwise you get so much noise from all kinds of things that you might as well not bother with the serializing.
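For reference, a minimal sketch of the sequence described above (lfence; rdtsc to open, rdtscp; cpuid to close). It assumes x86-64 with GCC or Clang and a CPU that supports rdtscp. The names tsc_begin, tsc_end and time_ticks are made up for illustration and are not part of any library.

```cpp
#include <cstdint>
#include <x86intrin.h>  // __rdtsc, __rdtscp, _mm_lfence (GCC/Clang, x86 only)

// Opening timestamp: lfence waits for earlier instructions to complete
// before rdtsc reads the counter.
static inline std::uint64_t tsc_begin() {
  _mm_lfence();
  return __rdtsc();
}

// Closing timestamp: rdtscp waits for the timed code to finish, and the
// cpuid after it keeps later instructions from starting before the read.
static inline std::uint64_t tsc_end() {
  unsigned aux;  // receives IA32_TSC_AUX; not used in this sketch
  const std::uint64_t t = __rdtscp(&aux);
  unsigned a = 0, b, c = 0, d;
  asm volatile("cpuid" : "+a"(a), "=b"(b), "+c"(c), "=d"(d) : : "memory");
  return t;
}

// Hypothetical helper: time one call of fn in TSC ticks.
template <typename Fn>
std::uint64_t time_ticks(Fn&& fn) {
  const std::uint64_t start = tsc_begin();
  fn();
  const std::uint64_t stop = tsc_end();
  return stop - start;
}
```

Calling time_ticks with a lambda returns the tick difference for that one run. In user space the result still includes the noise mentioned above, so it is only meaningful over many repetitions.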
@CarloWood
... else you get so much noise from all kinds of things that you might as well not bother with the serializing.
Would booting the OS in 'safe mode (minimal)' help in mitigating that problem in your opinion?
Hi,
I looked at some benchmarks and learned about the rdtsc instruction. The problem is that modern architectures have multiple CPUs whose cycle counters may be unsynchronized, and on top of that there is the instruction reordering problem. Both cause inaccuracy in the timing results. The problem and a solution approach are described in more detail in Intel's whitepaper on benchmarking, Section 3: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf
I'm not sure that I will personally improve it with the cpuid and rdtscp instructions and create a pull request, since I'm working on another problem right now. But maybe you could improve your cycle counter approach for amd64 (the same as x86_64) based on the whitepaper above? As far as I can see, other frameworks are equal to or worse than Google's, and this improvement in counting quality may be really important.
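To illustrate the multi-CPU part of the problem, here is a rough sketch that at least detects a sample spanning two CPUs. This is not the whitepaper's method. It relies on rdtscp also returning the IA32_TSC_AUX value, and on the assumption that the OS stores a per-CPU id there (Linux does, as far as I know). TscSample and sample_ticks are hypothetical names, and the sketch assumes x86-64 with GCC or Clang.

```cpp
#include <cstdint>
#include <x86intrin.h>  // __rdtscp, _mm_lfence (GCC/Clang, x86 only)

// One measurement plus a flag saying whether both reads saw the same
// IA32_TSC_AUX value.
struct TscSample {
  std::uint64_t ticks;
  bool same_cpu;
};

// Hypothetical helper: time fn once and report whether the thread appears
// to have stayed on one CPU. Assumes the OS stores a per-CPU id in
// IA32_TSC_AUX; where it does not, the flag carries no information.
template <typename Fn>
TscSample sample_ticks(Fn&& fn) {
  unsigned aux_begin = 0, aux_end = 0;
  const std::uint64_t start = __rdtscp(&aux_begin);
  _mm_lfence();  // keep fn from starting before the first read completes
  fn();
  const std::uint64_t stop = __rdtscp(&aux_end);
  _mm_lfence();  // keep later code from moving ahead of the second read
  return {stop - start, aux_begin == aux_end};
}
```

A caller would discard samples where same_cpu is false and measure again. A thread that migrates away and back between the two reads goes unnoticed, so pinning the thread to one CPU is still the more robust option.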