Open statementreply opened 4 years ago
What does the perf look like now that we've merged #4740?
I overhauled the benchmark (dropping unnecessary code and especially the non-deterministic seeding) and got fresh numbers now that we've merged #4740.
I used VS 2022 17.12 Preview 2 on my 5950X. Table:
Benchmark | Time |
---|---|
`BM_Generator<std::mt19937_64>` | 4.08 ns |
`BM_Generator<b_r::mt19937_64>` | 2.65 ns |
`BM_Distribution<std::mt19937_64, std::normal_distribution<double>>` | 13.1 ns |
`BM_Distribution<std::mt19937_64, b_r::normal_distribution<double>>` | 9.19 ns |
`BM_Distribution<std::mt19937_64, std::uniform_real_distribution<double>>` | 5.81 ns |
`BM_Distribution<std::mt19937_64, b_r::uniform_real_distribution<double>>` | 9.53 ns |
`BM_Distribution<b_r::mt19937_64, std::normal_distribution<double>>` | 12.6 ns |
`BM_Distribution<b_r::mt19937_64, b_r::normal_distribution<double>>` | 8.41 ns |
`BM_Distribution<b_r::mt19937_64, std::uniform_real_distribution<double>>` | 5.23 ns |
`BM_Distribution<b_r::mt19937_64, b_r::uniform_real_distribution<double>>` | 8.79 ns |
With `std::mt19937_64` as the generator, Boost's `normal_distribution` is only 1.43x faster than ours (9.19 ns vs. 13.1 ns).
And now our `uniform_real_distribution` is 1.64x faster than Boost's (5.81 ns vs. 9.53 ns), so the new `generate_canonical` is indeed awesome.
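For anyone who wants to re-derive the ratios quoted above from the table, here is a minimal sketch of the arithmetic. The helper name `speedup` is hypothetical and is not part of the benchmark; the timings are copied from the table.

```cpp
#include <cassert>

// Hypothetical helper (not part of the benchmark): returns how many times
// faster the run taking `fast_ns` is compared to the run taking `slow_ns`.
inline double speedup(double slow_ns, double fast_ns) {
    assert(fast_ns > 0.0); // guard against division by zero
    return slow_ns / fast_ns;
}

// Timings copied from the table, all with std::mt19937_64 as the generator.
inline double normal_boost_over_std() {
    return speedup(13.1, 9.19); // std::normal_distribution vs. b_r::normal_distribution
}

inline double uniform_std_over_boost() {
    return speedup(9.53, 5.81); // b_r::uniform_real_distribution vs. std::uniform_real_distribution
}
```

Rounded to two decimals, `normal_boost_over_std()` gives the 1.43x figure and `uniform_std_over_boost()` gives the 1.64x figure.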
I conclude that our underlying algorithm for `normal_distribution` is still suboptimal, but the `generate_canonical` improvement has substantially narrowed the overall perf gap. If we improved `normal_distribution`, we would likely outperform Boost, as `uniform_real_distribution` already does.
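To illustrate the kind of measurement behind these numbers, here is a rough sketch of a deterministically seeded timing loop. It is not the actual benchmark: the `BM_` names in the table suggest a benchmark framework, whereas this uses plain `std::chrono`. The function name `average_ns_per_draw`, the `sink` parameter, and the seed are all assumptions made for the example.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical helper (not the real benchmark): draws `iterations` values from
// a default-constructed Dist using a Gen seeded with the fixed `seed`, and
// returns the average wall-clock time per draw in nanoseconds.
template <class Gen, class Dist>
double average_ns_per_draw(std::uint64_t seed, std::size_t iterations, double& sink) {
    assert(iterations > 0);
    Gen gen(seed); // fixed seed: every run sees the same random sequence
    Dist dist;
    double sum = 0.0;

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        sum += dist(gen);
    }
    const auto stop = std::chrono::steady_clock::now();

    sink = sum; // expose the result so the loop cannot be optimized away entirely
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

A call such as `average_ns_per_draw<std::mt19937_64, std::normal_distribution<double>>(1729, 10'000'000, sink)` would give a rough per-draw time. A hand-rolled loop like this is noisier than a proper benchmark harness, so treat it as a sanity check rather than a replacement for the numbers in the table.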
Describe the bug
A benchmark by Alexander Neumann (original issue reporter) showed that `std::normal_distribution` with `std::mt19937_64` was 4 times slower than `boost::normal_distribution` with `std::mt19937_64`.
Additional context
Part of the performance deficiency is due to #1000.
Also tracked by DevCom-86909 and Microsoft-internal VSO-486661 / AB#486661.