jk-jeon / dragonbox

Reference implementation of Dragonbox in C++
Apache License 2.0
588 stars 37 forks source link

Consider adding a section on README elaborating on what roundtrip precisely means #25

Closed rpopescu closed 2 years ago

rpopescu commented 2 years ago

Hi @jk-jeon, and thank you for your amazing work on this! This isn't exactly an issue, but rather a question or a request for clarification of the documentation. The documentation says the algorithm is round-trip. To check this my understand of this, I have written a simple loop whose index "i" iterates all uint64_t values and:

Here's a sample of its output; formatting is done with an old version of fmtlib (5.3.0):

i=4716101740380815687 orig=1.98801e+07 dec=(1988012400000122, -8, +) back=1.98801e+07 back_i=4716101740380815688
i=4716101740380815688 orig=1.98801e+07 dec=(19880124000001222, -9, +) back=1.98801e+07 back_i=4716101740380815689
i=4716101740380815699 orig=1.98801e+07 dec=(19880124000001263, -9, +) back=1.98801e+07 back_i=4716101740380815700
i=4716101740380815700 orig=1.98801e+07 dec=(19880124000001267, -9, +) back=1.98801e+07 back_i=4716101740380815701
i=4716101740380815703 orig=1.98801e+07 dec=(19880124000001278, -9, +) back=1.98801e+07 back_i=4716101740380815704
i=4716101740380815713 orig=1.98801e+07 dec=(19880124000001315, -9, +) back=1.98801e+07 back_i=4716101740380815714

I haven't verified this programatically but I notice that the back integer representation is one off the original integer from which the double was created. Note that the exact code computing "back" from "dec" is:

double back = static_cast<double>(dec.significand) * pow(10.0, static_cast<double>(decimal.exponent));

My questions are:

Thank you kindly in advance for your time.

Alcaro commented 2 years ago

Converting double to shortest accurate string is a difficult task.

Converting string to double is just as difficult. 19880124000001263 is not exactly representable as double, and neither is 10^-9; 19880124000001263.0 * 1.0e-9 rounds to 19880124.000001267, not 19880124.000001263.

You'll get it a bit more accurate if you divide by 10^9 rather than multiply by 10^-9, but significand accuracy will still be a problem for some inputs.

If you're on Linux, glibc strtod/sscanf is perfectly accurate for every input; if on Windows, try MSVC std::from_chars.

rpopescu commented 2 years ago

Thanks for your reply @Alcaro . Note that my question doesn't involve strings; instead it's about going from the decimal struct returned by jkj::dragonbox::to_decimal to a double. I'll try the division. (later edit): I have, computing an additional double called back2 and printing it out (and its integer bit_cast value) if neither back nor back2 equal orig. The code is double back2 = static_cast<double>(dec.significand) / pow(10.0, static_cast<double>(-dec.exponent)); It looks like this:

i=4716101740380817013 orig=1.98801e+07 dec=(19880124000006158, -9, +) back=1.98801e+07 back2=1.98801e+07 back_i=4716101740380817014 back2_i=4716101740380817014
i=4716101740380817028 orig=1.98801e+07 dec=(19880124000006214, -9, +) back=1.98801e+07 back2=1.98801e+07 back_i=4716101740380817029 back2_i=4716101740380817029
i=4716101740380817056 orig=1.98801e+07 dec=(19880124000006318, -9, +) back=1.98801e+07 back2=1.98801e+07 back_i=4716101740380817057 back2_i=4716101740380817057
i=4716101740380817058 orig=1.98801e+07 dec=(19880124000006326, -9, +) back=1.98801e+07 back2=1.98801e+07 back_i=4716101740380817059 back2_i=4716101740380817059
Alcaro commented 2 years ago

Your question doesn't involve character strings, but it does involve converting digits+exponent to double, which is the tricky part of string conversion. This correction doesn't affect the rest of my answer.

For those values, your issue is significand accuracy. 19880124000006158 is not representable as double; if you try, you get 19880124000006160.0, and 19880124000006160.0 / 1e+9 is indeed 19880124.00000616 and not 19880124.000006158.

rpopescu commented 2 years ago

Unfortunately, the part concerning division as an improvement of accuracy does not seem to be accurate.

While I get the point about accuracy, the question of achieving roundtrip hasn't been answered. Roundtrip is mentioned in the documentation, and there are policies regarding rounding for in either direction, yet I have been unable to understand how to convert back to a double in a way that is roundtrip. That's the gist of the question here.

Alcaro commented 2 years ago

I may have been unclear about division; it will give better results for negative exponents, but worse results for positive exponents. For best results, you'll need a branch.

And you'll need a third branch, containing a slower algorithm, if the exponent is outside the range ±23, and/or if the significand is >= 2**53, such that one or both of the mul/div inputs is not exactly representable as double. Prepare for several hundred or thousand lines of numerically tricky code.

Alternatively, you can use someone else's digits to double function. If you're on Linux, I recommend glibc strtod; if on Windows, try MSVC std::from_chars. If you're on something else, David Gay's dtoa.c is quite popular.

Floating point math is complicated.

rpopescu commented 2 years ago

Looking at the strtod source code, I see that it uses a multi-precision number for the integer part, and multiplies that with the relevant power of ten; below the relevant bits of code and comments:

 /* Read the integer part as a multi-precision number to NUM.  */
startp = str_to_mpn (startp, int_no, num, &numsize, &exponent ...
// ...
/* We now multiply the gained number by the given power of ten.  */
jk-jeon commented 2 years ago

Thanks @rpopescu for your interest on Dragonbox. What @Alcaro is saying is correct.

The problem is that decimal.significand * pow(10.0, decimal.exponent) doesn't give you the correct result, because

As @Alcaro suggests, you should use either a correct string-to-double parser or a big number library to compute the result with no loss of precision.

Or, you can manually check the results, for example by asking Wolfram Alpha to compute the correct results, like (1) typing something like binary(1988012400000122 * 10^-8), (2) obtain the binary significand and the binary exponent, and (3) compare them to your original number.

Or, you could just write 1988012400000122e-8 to your source code to let your compiler to do the conversion. If the compiler is not buggy, it will give you the correct double.

rpopescu commented 2 years ago

Right, I got it finally - as I noted above, the precision is lost sometimes (because the reasons you've enumerated), but this can be avoided by using e.g. gmplib, as strtod does. Thank you everyone. @jk-jeon perhaps a note of caution regarding achieving the roundtrip in the documentation would be useful, to keep others as naive as me from assuming the basic multiplication would work.

jk-jeon commented 2 years ago

See this for example:

https://godbolt.org/z/heoGqaGrT

jk-jeon commented 2 years ago

@jk-jeon perhaps a note of caution regarding achieving the roundtrip in the documentation would be useful, to keep other from assuming the basic multiplication would work.

Okay I'll consider adding it.

rpopescu commented 2 years ago

Thanks again. Actually, one more question @jk-jeon - what is the source of error when inverting the result when the exponent is negative? From what I can see, this results in packed xor; below is the code for the function double f(double d) { return -d; }

.LCPI0_0:
        .quad   0x8000000000000000              # double -0
        .quad   0x8000000000000000              # double -0
f(double):                                  # @f(double)
        xorps   xmm0, xmmword ptr [rip + .LCPI0_0]
        ret
jk-jeon commented 2 years ago

I thought by "inverting" you mean taking the reciprocal. Did you mean inverting the sign?

rpopescu commented 2 years ago

Yes, that’s what I was doing, negating the value if is_negative, which from my understanding of the representation would simply flip a bit, right? Anyway thanks again for the clarification!

Alcaro commented 2 years ago

Yes, as you can see in that assembly snippet, negating a float or double is just a bit flip.

rpopescu commented 2 years ago

Yes, as I said it’s a bit flip and hence my confusion about the possible precision loss, but it’s been clarified. Thanks for the heads up - replied by email and signature got added :-)

jk-jeon commented 2 years ago

Addressed in 1c7596b1c77eeeb3e656b8a8ba7356e013668e0e.