Closed: Ok23 closed this issue 1 year ago
I'd up the minEpochTime, e.g. like so:
using namespace std::literals;
Bench().minEpochTime(1s).run(...)
Did not help. I measured which cores it runs on and found no correlation between cores and results. Every time I execute the program the result is either 27 ns or 0.5 ns.
Maybe the compiler optimizes part of your computation away? Try to modify the input in the loop, e.g. add a number each time, and make sure to keep each result, e.g. sum up the results and use doNotOptimizeAway.
It seems related to the AVX2 mul (_mm256_mul_ps) and add (_mm256_add_ps) instructions and not to compiler optimizations, because the results are completely different between runs of the same binary.
Just to be safe I'd do something like this: modify the input, make sure the output is not optimized away, and increase the minimum epoch time: https://godbolt.org/z/eEb78vP5n
No, it doesn't work; it still jumps between 5 ns and 30 ns. I already tried to defeat the optimizer.
Most likely due to https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html. Google Benchmark probably runs for much longer, so the frequency change gets "hidden" in the accumulated timings.
It can give hugely different results (about 100 times slower) when running AVX2 code.
Sometimes when I rebuild it reports about 0.5 ns/op, and when I relaunch it reports about 29 ns/op. I think it's related to the Windows 11 thread scheduler and/or to my processor being an Alder Lake i5-12600k with E-cores.
Google Benchmark seems to give more consistent results, about 0.23 ns.