Closed: gsauthof closed this issue 3 years ago
Well spotted! It's on purpose: for N-bit division, libdivide only tries N- and (N+1)-bit magic numbers, skipping N-1, N-2, etc. This makes the magic-number computation faster (no loops) and doesn't hurt division performance: an N-bit multiplication takes the same number of clock cycles regardless of how large the operands are.
gcc generates the smallest magic number by looping over possible shifts. The only real benefit here is code size: if you generate a smaller literal it may take up less space in the instruction stream. This makes sense for an offline compiler like gcc, not for libdivide.
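For illustration, here is a minimal Python sketch (not libdivide's or gcc's actual code) of the gcc-style search described above: loop over shifts from smallest upward and take the first magic/shift pair that reproduces unsigned division exactly for every N-bit input. It assumes unsigned division and uses a tiny 8-bit width so the correctness check can be exhaustive.

```python
def smallest_magic(d, bits=8):
    """gcc-style search (sketch): try shifts from small to large and
    return the first (magic, shift) pair such that
    (n * magic) >> (bits + shift) == n // d for every bits-wide
    unsigned n.  Checked by brute force over all 2**bits inputs."""
    for shift in range(2 * bits + 1):
        # Candidate magic number: ceil(2**(bits+shift) / d)
        magic = ((1 << (bits + shift)) + d - 1) // d
        if all((n * magic) >> (bits + shift) == n // d
               for n in range(1 << bits)):
            return magic, shift
    raise ValueError("no exact magic/shift pair found")

# For d = 7 the smallest workable magic is still (N+1) = 9 bits wide,
# so skipping the shift loop costs libdivide nothing at runtime; the
# smaller literal gcc may find for other divisors only saves code size.
magic, shift = smallest_magic(7)
print(magic, shift, magic.bit_length())
```

The key point the sketch makes concrete: the loop over shifts buys a smaller literal at best, never a faster multiply, which is why an offline compiler is the right place for it.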
Closing this; computing a smaller magic number has no real benefit for libdivide.
Probably doesn't make much of a difference for most uses - and I don't know if it's a libdivide goal to minimize returned magic/shift values.
I noticed that libdivide sometimes returns larger magic/shift values than what gcc comes up with (when using a static divisor).
Example: