jk-jeon / dragonbox

Reference implementation of Dragonbox in C++
Apache License 2.0
607 stars 39 forks source link

The multipllier $10^{\kappa}$ does not need to be a power of 10 #63

Open jk-jeon opened 5 months ago

jk-jeon commented 5 months ago

The only place where it is a power of 10 is used is that powers of 10 are stored in the precomputed cache. By allowing more general integers other than $10^{\kappa}$, we may further reduce the chance of having to investigate the fractional part, thus boosting performance. It may also opens up a possibility of eliminating the needs for treating the shorter interval case specially.

jk-jeon commented 5 months ago

An alternative would be to use $2^{q-p-2}$ as the multiplier, i.e., $128$ for binary32 and $1024$ for binary64. This has an advantage of simplifying the division and divisibility check by the multiplier, but it reduces the error margin for the approximate multiplication by $10^{k}$ into very small value, which has a serious impact on compressed cache handling.

Currently, clang translates this division + divisibility check into imul + movzx + shr + cmp. If we use $2^{q-p-2}$, then this would become test + sete + shr.

Another alternative would be to consider a mixed power of $2$ and $5$, but it doesn't sound that worth the trouble of rewriting a lot of the paper/implementation.

On the other hand possible elimination of the shorter interval branch is worth investigating, but it seems like a completely independent issue.