This pure Go implementation of Mul32 is more than twice as fast as the
assembly Mul implementation, and four times faster than the pure Go Mul.
    Mul32           7.91ns ± 1%
    Mul             18.6ns ± 1%
    Mul [purego]    33.4ns ± 0%
Before Go 1.13, where we can't use math/bits because its fallbacks might
not be constant time, Mul32 is a little slower, but still nowhere near as
slow as the pure Go Mul.
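
For readers unfamiliar with the math/bits approach, here is a minimal sketch of the kind of 64x32-bit limb multiplication it enables. The names `mul51` and `maskLow51Bits` and the 51-bit limb width are assumptions made for illustration, not necessarily what this change uses. Since Go 1.13, `bits.Mul64` is documented as having an execution time that does not depend on its inputs.

```go
package main

import (
	"fmt"
	"math/bits"
)

// maskLow51Bits selects the low 51 bits of a limb (assumed limb width).
const maskLow51Bits = (1 << 51) - 1

// mul51 is a hypothetical helper that returns lo and hi such that
// lo + hi*2^51 = a*b, with lo < 2^51. It assumes the product fits in
// 115 bits, which holds when a is a (loosely) reduced 51-bit limb.
func mul51(a uint64, b uint32) (lo, hi uint64) {
	// Full 128-bit product, split into high and low 64-bit words.
	mh, ml := bits.Mul64(a, uint64(b))
	lo = ml & maskLow51Bits
	hi = (mh << 13) | (ml >> 51)
	return lo, hi
}

func main() {
	// 2^50 * 6 = 3 * 2^51, so lo is 0 and hi is 3.
	lo, hi := mul51(1<<50, 6)
	fmt.Println(lo, hi)
}
```

Because the scalar is only 32 bits wide, each limb needs a single wide multiplication followed by a shift and a mask, which is one plausible reason a scalar multiplication can be much cheaper than a full field multiplication.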
/cc @hdevalence