janet-lang / janet

A dynamic language and bytecode vm
https://janet-lang.org
MIT License
3.43k stars 221 forks source link

Use `snprintf(..., "%.17g", x)` to pretty print numbers. #1306

Closed otrho closed 10 months ago

otrho commented 10 months ago

Issue.

I was a bit confused when I came across the following error:

Janet 1.31.0-meson freebsd/x64/clang - '(doc)' for help
repl:1:> (put (array/new 1000000) 123456.5 1)
error: expected integer key for array in range [0, 2147483646), got 123456
  in _thunk [repl] (tailcall) on line 1, column 1

I didn't know that my index (123456.5 in the repro) was a double, and the error is saying it expects an integer, and it got 123456 which looks very much like an integer.

Solution?

The problem seems to be the choice in using %g in number_to_string() in pp.c. If we use %.17g instead it will always print all the significant digits for a double with a 53-bit mantissa, and never round to what looks like an integer.

I noticed below in pp.c, in print_jdn_one() there is already a %.17g in use.

Here's a handy little snippet to help illustrate:

#include <stdio.h>
#include <math.h>

int main() {
    for (int i = 3; i < 8; i++) {
        double a = pow(10.0, (double)i);
        double b = a + 0.5;
        printf("%f -- %%.0f %.0f %%g %g %%.17g %.17g\n", a, a, a, a);
        printf("%f -- %%.0f %.0f %%g %g %%.17g %.17g %s\n", b, b, b, b, i == 5 ? " <--- HERE" : "");
    }

    // c is 792594609605189126649, too large to fit in a 53-bit double mantissa.
    double c = pow(123.0, 10.0);
    printf("%f -- %.0f %g %.17g\n", c, c, c, c);
    return 0;
}
: cc testing.c -lm && ./a.out
1000.000000 -- %.0f 1000 %g 1000 %.17g 1000
1000.500000 -- %.0f 1000 %g 1000.5 %.17g 1000.5
10000.000000 -- %.0f 10000 %g 10000 %.17g 10000
10000.500000 -- %.0f 10000 %g 10000.5 %.17g 10000.5
100000.000000 -- %.0f 100000 %g 100000 %.17g 100000
100000.500000 -- %.0f 100000 %g 100000 %.17g 100000.5  <--- HERE
1000000.000000 -- %.0f 1000000 %g 1e+06 %.17g 1000000
1000000.500000 -- %.0f 1000000 %g 1e+06 %.17g 1000000.5
10000000.000000 -- %.0f 10000000 %g 1e+07 %.17g 10000000
10000000.500000 -- %.0f 10000000 %g 1e+07 %.17g 10000000.5
792594609605189173248.000000 -- 792594609605189173248 7.92595e+20 7.9259460960518917e+20

At the line marked 'HERE' you can see the %g is rounding to 6 significant digits and dropping the fractional part. Annoying.

Caveat

I didn't add any new testing for it, not sure it needs it. I also don't have access to a Windows box and couldn't test it myself, to check if %.17g works as expected there.

bakpakin commented 10 months ago

I do remember there was a reason I had it set to 6 decimal places by default instead of the usual hack for round tripping with %.17g, but I admit I don't recall. There is also the %j format specifier that will use %.17g as suggested here.

bakpakin commented 10 months ago

Anyway, lgtm. I think the reason was that there were issues in the accuracy of code in older versions of glibc with snprintf - atof and snprintf were not perfect inverses of each other resulting in entering 1.1 resulting in extra garbage in the repl.

EDIT: Languages like python I think package their own snprintf. Janet has its own strtod (for custom bases besides 10), but not printf.

crose@bettybox:~$ janet
Janet 1.31.0-4139e426 linux/x64/gcc - '(doc)' for help
repl:1:> (setdyn *pretty-format* "%j")
"%j"
repl:2:> 1.1
1.1000000000000001
repl:3:>

Which is annoying to say the least. I have checked with a few tools and the "error" seems to be in snprintf. Error in quotes because I think the inaccuracy is probably allowed by the C standard.

otrho commented 10 months ago

Anyway, lgtm. I think the reason was that there were issues in the accuracy of code in older versions of glibc with snprintf - atof and snprintf were not perfect inverses of each other resulting in entering 1.1 resulting in extra garbage in the repl.

Yar, the pretty printing seems to be used quite a bit all over the place, including the repl of course, so I wasn't sure if it was going to break some assumptions. But either way, via some other edge case handling if need be, IMO saying '123456' isn't an integer is confusing.

otrho commented 10 months ago

Fixed by f18ad36b1bd22a3b2a0888472b46bed9c252e65e instead.