Closed raehik closed 1 year ago
I added a semi-coherent explanation in the commit. I'm fairly certain this is a consistent improvement across any GHC target. Interesting case where doing less work results in faster code.
Running `cabal bench --benchmark-options=bWord32/4` on master (x86_64, GHC 9.4):
time 25.75 ns (25.26 ns .. 26.72 ns)
0.997 R² (0.994 R² .. 1.000 R²)
mean 25.94 ns (25.77 ns .. 26.28 ns)
std dev 784.5 ps (446.1 ps .. 1.306 ns)
variance introduced by outliers: 13% (moderately inflated)
And on this branch:
time 22.90 ns (22.48 ns .. 23.44 ns)
0.997 R² (0.994 R² .. 1.000 R²)
mean 22.66 ns (22.57 ns .. 22.92 ns)
std dev 525.9 ps (327.0 ps .. 844.1 ps)
This is an 11% improvement, which is surprising. My earlier benchmarks were smaller, so the change may be a little faster at scale than I thought. (Then again, this benchmark only covers `Word32`s.)
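For reference, the 11% figure can be reproduced from the `time` estimates quoted above. This is only a rough check on the two runs shown here; the same calculation on the `mean` rows gives a slightly larger number.

```math
\frac{25.75 - 22.90}{25.75} = \frac{2.85}{25.75} \approx 0.111 \quad (\text{about } 11\%)
\qquad
\frac{25.94 - 22.66}{25.94} = \frac{3.28}{25.94} \approx 0.126 \quad (\text{about } 13\%)
```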
@nikita-volkov See #13 for the original discussion of this change.