poke machine integers with opposite host endianness via byte-swapping

raehik commented 1 year ago

See #13 for the original discussion for this change.

raehik commented 1 year ago

I added a semi-coherent explanation in the commit. I'm fairly certain this is a consistent improvement across any GHC target. Interesting case where doing less work results in faster code.
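
For readers without the commit at hand, here is a minimal sketch of the general technique named in the title. It is not the code from this PR. The names `pokeWord32BE` and `bigEndianBytes` are hypothetical, and the sketch assumes the target tolerates a `Word32` store at the given address (true on x86_64; other targets may need alignment care). The idea is to swap the bytes in a register when the host byte order differs from the wanted one, then write the whole word in one store.

```haskell
import Data.Word (Word8, Word32, byteSwap32)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (peekByteOff, poke)
import GHC.ByteOrder (ByteOrder (..), targetByteOrder)

-- Hypothetical helper: write a Word32 in big-endian order with one store.
-- On a little-endian host the value is byte-swapped first; on a
-- big-endian host it is written unchanged.
pokeWord32BE :: Ptr Word8 -> Word32 -> IO ()
pokeWord32BE p w = poke (castPtr p) (toBE w)
  where
    toBE :: Word32 -> Word32
    toBE = case targetByteOrder of
      BigEndian    -> id
      LittleEndian -> byteSwap32

-- Hypothetical usage helper: poke into a scratch buffer and read the
-- four bytes back in memory order.
bigEndianBytes :: Word32 -> IO [Word8]
bigEndianBytes w = allocaBytes 4 $ \p -> do
  pokeWord32BE p w
  mapM (peekByteOff p) [0 .. 3]
```

Since `targetByteOrder` is a compile-time constant, the `case` should be resolved statically by GHC, leaving a byte swap plus a single store on little-endian hosts.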

raehik commented 1 year ago

Running cabal bench --benchmark-options=bWord32/4 on master (x86_64, GHC 9.4):

time                 25.75 ns   (25.26 ns .. 26.72 ns)
                     0.997 R²   (0.994 R² .. 1.000 R²)
mean                 25.94 ns   (25.77 ns .. 26.28 ns)
std dev              784.5 ps   (446.1 ps .. 1.306 ns)
variance introduced by outliers: 13% (moderately inflated)

And on this branch:

time                 22.90 ns   (22.48 ns .. 23.44 ns)
                     0.997 R²   (0.994 R² .. 1.000 R²)
mean                 22.66 ns   (22.57 ns .. 22.92 ns)
std dev              525.9 ps   (327.0 ps .. 844.1 ps)

This is an 11% improvement in the time estimate (25.75 ns down to 22.90 ns), which is surprising. I was doing smaller benchmarks, so maybe it's a little faster than I thought in the large. (Then again, this benchmark is only Word32s.)

raehik commented 1 year ago

@nikita-volkov

nikita-volkov/ptr-poker, pull request #14