The multiplier $10^{\kappa}$ does not need to be a power of 10

An alternative would be to use $2^{q-p-2}$ as the multiplier, i.e., $128$ for binary32 and $1024$ for binary64. This has the advantage of simplifying the division and the divisibility check by the multiplier, but it reduces the error margin for the approximate multiplication by $10^{k}$ to a very small value, which has a serious impact on compressed cache handling.
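As a quick sanity check of the numbers above, here is a minimal sketch that computes $2^{q-p-2}$ from the format parameters. It assumes $q$ is the total bit width and $p$ is the number of explicitly stored significand bits (23 for binary32, 52 for binary64). The helper name `power_of_two_multiplier` is hypothetical and is not part of the library.

```cpp
#include <climits>
#include <cstdint>
#include <limits>

// Hypothetical helper: computes 2^(q-p-2) for an IEEE-754 binary format.
// Assumption: q = total number of bits, p = explicitly stored significand bits.
template <class Float>
constexpr std::uint32_t power_of_two_multiplier() {
    static_assert(std::numeric_limits<Float>::is_iec559, "IEEE-754 format expected");
    // numeric_limits::digits counts the implicit leading bit, so subtract 1.
    return std::uint32_t(1)
           << (int(sizeof(Float) * CHAR_BIT) - (std::numeric_limits<Float>::digits - 1) - 2);
}

static_assert(power_of_two_multiplier<float>() == 128, "binary32: 2^(32-23-2)");
static_assert(power_of_two_multiplier<double>() == 1024, "binary64: 2^(64-52-2)");
```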

Currently, clang translates this division + divisibility check into `imul` + `movzx` + `shr` + `cmp`. If we use $2^{q-p-2}$, then this would become `test` + `sete` + `shr`.
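To make the comparison concrete, here is a rough sketch of the two variants written as plain C++. It is not the library's actual code. The divisor `10` is only an illustrative decimal multiplier, the divisor `128` is the binary32 value of $2^{q-p-2}$, and the names `div_result`, `divide_by_10` and `divide_by_128` are hypothetical. The instructions a compiler actually emits depend on its version and flags.

```cpp
#include <cstdint>

// Hypothetical result type: the quotient plus whether the division was exact.
struct div_result {
    std::uint32_t quotient;
    bool divisible;
};

// Decimal multiplier (illustrative value 10): the compiler has to turn the
// division into a multiply-and-shift and then compare to test divisibility.
constexpr div_result divide_by_10(std::uint32_t n) {
    return {n / 10, n % 10 == 0};
}

// Power-of-two multiplier 2^7 = 128: the quotient is a plain shift and the
// divisibility check is a mask test on the low 7 bits.
constexpr div_result divide_by_128(std::uint32_t n) {
    return {n >> 7, (n & 127u) == 0};
}
```

Both functions return the same kind of result, so the choice of multiplier only changes how cheaply the quotient and the exactness flag can be obtained.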

Another alternative would be to consider a mixed power of $2$ and $5$, but that doesn't seem worth the trouble of rewriting a large part of the paper and the implementation.

On the other hand, the possible elimination of the shorter interval branch is worth investigating, but it seems to be a completely independent issue.
