jk-jeon / dragonbox

Reference implementation of Dragonbox in C++
Apache License 2.0
588 stars 37 forks source link

Do you have an algorithm to deal with ratio and percentage for integers? #46

Closed trcrsired closed 1 year ago

trcrsired commented 1 year ago

I find situations like percentages very common, but using floating pointers to print them out would be very costly.

Do you have an algorithm to deal with that?

One without precision:

For example, giving input (3,10) Output: 30% or 0.3 For example, giving input (3,9) Output: 33.(3)% or 0.(3) For example, giving input (2,14) Output: 14.(285714)% or 0.(142857)

Also, with precision version (which is very common in statistics, like keeping two digits decimals):

For example, giving input (2,14), precision 2 Output: 14.26% or 0.14

trcrsired commented 1 year ago

Well. I just realized precision is only possible with a few possibilities.

Maybe a roundtrip here is needed too?

jk-jeon commented 1 year ago

For example, giving input (3,10) Output: 30% or 0.3 For example, giving input (3,9) Output: 33.(3)% or 0.(3) For example, giving input (2,14) Output: 14.(285714)% or 0.(142857)

If you are looking for finding the exact decimal representation in terms of the non-repeating and the repeating digits, it's nothing but Euler's theorem. Basically, what it says is that given a pair of coprime integers $a$ and $n$, there is an exponent $e$ such that $n$ divides $a^{e}-1$, and it also gives a formula for computing this exponent in terms of the so-called Euler totient function $\varphi(n)$.

Here is the explanation on why that's relevant here. Let's say we want to compute $p/q$ for a pair of coprime integers $p$ and $q$. We want to find all the digits appearing in evaluating $p/q$ in decimal. If $q$ has $2$ or $5$ as its prime factors, we pull them out and rewrite it as

$$ \frac{p}{q} = \frac{p}{q'}\cdot \frac{1}{2^{a}5^{b}}. $$

Clearly $1/(2^{a}5^{b})$ can be rewritten as $c/10^{k}$ for some integer $c$ and $k:=\max(a,b)$. Now, since $q'$ is coprime to $10$, Euler's theorem tells us that there exists the smallest exponent $e$ such that $q'$ divides $10^{e}-1$. In other words,

$$ m := \frac{10^{e}-1}{q'} $$

is an integer and we have

$$ \frac{p}{q} = \frac{1}{10^{k}}\cdot \frac{mpc}{10^{e}-1}. $$

Next, the factor $\frac{mpc}{10^{e}-1}$ can be divided into the integer part and the fractional part, i.e., we can write

$$ \frac{p}{q} = \frac{1}{10^{k}}\left(g + \frac{h}{10^{e}-1}\right), $$

where $g$ is an integer and $h$ is an integer in the range $0 \leq h < 10^{e}-1$. What does it mean is that, let's say the decimal expansion of $h/(10^{e}-1)$ is like

$$ 0.x{1}x{2}x{3}x{4}x{5}x{6}\ \cdots\ , $$

then if you multiply $10^{e}$ to it, that is, to move the decimal point to right by $e$ digits,

$$ x{1}x{2}\ \cdots\ x{e}.x{e+1}x{e+2}x{e+3}\ \cdots\ , $$

and then subtract $h/(10^{e}-1)$ from it,

$$\begin{align} && x{1}x{2}\ \cdots\ x{e}&.x{e+1}x{e+2}x{e+3}\ \cdots\ \ -&& 0& .x{1}x{2}x_{3}\ \cdots\ \hline \end{align}$$

then the result must be an integer because

$$ 10^{e}\cdot \frac{h}{10^{e}-1} - \frac{h}{10^{e}-1} = h, $$

which means all the digits in the fractional parts cancel each other. Hence, you should have $x{1}=x{e+1}$, $x{2}=x{e+2}$, $x{3}=x{e+3}$, and so on, i.e., $x{1}\ \cdots\ x{e}$ is the repeated digits you are looking for, and this is is precisely the value of the integer $h$. In summary,

Well, it doesn't sound simple, right? Especially, I don't think there is a blazingly fast method for finding the smallest exponent $e$. The "obvious" method based on Euler's theorem involves two prime factorizations, one for $q$ to compute the Euler totient function, and another for the totient function $\varphi(q')$ itself in order to figure out the smallest exponent. (The smallest exponent $e$ must be a divisor of $\varphi(q')$, but that's all we know about $e$ as far as I know.) And I can imagine that typically the exponent $e$ will be way larger than $19$ so that $10^{e}$ will be much larger than what a machine word can hold.

My conclusion is this: I haven't thought about this subject more than I wrote here and also don't know about any existing literature on it, but it sounds like a decent research question. It sounds like at least as hard as the arbitrary-precision formatting of floating-point numbers, if not harder.

jk-jeon commented 1 year ago

Also, with precision version (which is very common in statistics, like keeping two digits decimals):

For example, giving input (2,14), precision 2 Output: 14.26% or 0.14

Again, if the divisor is not an invariant constant, I do not think there are lots of things we can do with that. Floating-point case was easier because the divisor is indeed a known constant, powers of $2$ or $10$.

I'm closing this as it sounds like an out of scope issue, but feel free to reopen if you think it's relevant.

trcrsired commented 1 year ago

Also, with precision version (which is very common in statistics, like keeping two digits decimals): For example, giving input (2,14), precision 2 Output: 14.26% or 0.14

Again, if the divisor is not an invariant constant, I do not think there are lots of things we can do with that. Floating-point case was easier because the divisor is indeed a known constant, powers of 2 or 10.

I'm closing this as it sounds like an out of scope issue, but feel free to reopen if you think it's relevant.

what about just 2 decimal places for percentages for example? It is very common in academic research.

jk-jeon commented 1 year ago

what about just 2 decimal places for percentages for example? It is very common in academic research.

Just multiply 100 to the numerator and divide by the denominator then.

EDIT: Well, actually, if you want the correct round-to-nearest, tie-to-even behavior, then you may need to do a little bit more, like multiply 200 and add 1 and things like that. But basically that's that.