ulfjack / ryu

Converts floating point numbers to decimal strings
Apache License 2.0

Improve control over the output format #27

Open HeroicKatora opened 6 years ago

HeroicKatora commented 6 years ago

Would it be possible to control whether the number is printed in exponent or decimal notation? Or, if more convenient, return the chosen location for the decimal point (if present), the number of digits, and the exponent? It would also help to be able to get the chosen exponent as an integer.

I'm trying to integrate this into a formatting library and as such would like to transform the output into all of the forms available for printf(). In particular, decimal formatting with a user-provided precision makes it necessary to reformat the number heavily. While some forms are trivial to implement, reparsing the exponent to decide on the automatic formatting and moving the decimal point feels suboptimal.

The double-conversion interfaces could be a guideline here but the defaults and best options are quite specialized to ECMAScript.

StephanTLavavej commented 6 years ago

Note that Ryu is inherently incapable of emitting arbitrary precision. You could modify the algorithm to emit additional digits in scientific or fixed notation, but after running out of the digits that are currently trimmed to achieve the shortest round-trip representation, you'd have to fill with zeroes. While it would still round-trip, it would be mathematically incorrect. Consider this example:

C:\Temp>type precision.cpp
#include <stdio.h>

int main() {
    const double two_40 = 1LL << 40;
    const double two_minus40 = 1 / two_40;

    printf("%.27e from printf\n", two_minus40);
    puts("9.094947017729282379150390625e-13 expected for %.27e");
    printf("%.40f from printf\n", two_minus40);
    puts("0.0000000000009094947017729282379150390625 expected for %.40f");
}

C:\Temp>cl /EHsc /nologo /W4 /MTd precision.cpp
precision.cpp

C:\Temp>precision
9.094947017729282379150390625e-13 from printf
9.094947017729282379150390625e-13 expected for %.27e
0.0000000000009094947017729282379150390625 from printf
0.0000000000009094947017729282379150390625 expected for %.40f

Ryu emits 9.094947017729282E-13 for 2^-40.

As part of implementing C++17 <charconv>, I'll need to modify Ryu to emit shortest round-trip output in fixed notation, but I am not yet sure how to contribute that upstream. (I need to implement two more variants that switch between fixed and scientific depending on either printf's rules or an overall shortest criterion, so I need to figure out what the interface will be.) I might end up separating the algorithm into two parts: the core part that generates the digits in a uint32_t/uint64_t and the exponent, and then a formatting part, which should allow all four charconv formats to be cleanly implemented.
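The two-part split described above could look roughly like the following. This is only a sketch under stated assumptions: `decimal_parts` and `format_scientific` are hypothetical names, not Ryu's actual interface, and the core stage that would fill in the digits and exponent is not shown. The formatting stage only needs the integer digits and the power of ten.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result of the core stage: value = digits * 10^exponent.
   For 2^-40 the shortest round-trip digits are 9094947017729282
   with exponent -28. */
typedef struct {
    uint64_t digits;  /* decimal significand as an integer */
    int32_t exponent; /* power of ten applied to digits */
    int negative;     /* nonzero for a negative value */
} decimal_parts;

/* Hypothetical formatting stage: scientific notation in the "d.dddE-x"
   style that the thread shows for Ryu. Returns the snprintf result. */
static int format_scientific(decimal_parts d, char *out, size_t cap) {
    char tmp[21]; /* UINT64_MAX has 20 decimal digits, plus the NUL */
    assert(out != NULL && cap > 0);
    int n = snprintf(tmp, sizeof tmp, "%" PRIu64, d.digits);
    /* Moving the point to just after the first digit adds n - 1
       to the power of ten. */
    int sci_exp = d.exponent + (n - 1);
    const char *sign = d.negative ? "-" : "";
    if (n == 1) {
        /* Single digit: no decimal point is printed. */
        return snprintf(out, cap, "%s%cE%d", sign, tmp[0], sci_exp);
    }
    return snprintf(out, cap, "%s%c.%sE%d", sign, tmp[0], tmp + 1, sci_exp);
}
```

Because the exponent is available as an integer, a caller never has to reparse it from the output string. Other formats can be added as further formatting functions over the same `decimal_parts` value.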

HeroicKatora commented 6 years ago

Outputting trailing zeroes is fully within the specification of printf as far as I am aware; guaranteeing round-trip is basically the only necessity. [As far as I am concerned this is acceptable for my use case.] Padding with zeroes is also the current strategy. But generating a decimal representation from scientific notation can involve a memory move (at most a few bytes, but still) of the complete suffix to make space for preceding zero digits. Together with the necessity of reevaluating the exponent, this is quite an overhead compared with what I imagine would be more straightforward to do in the internal representation.
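To illustrate the point about the internal representation: if the digits and the decimal exponent are available as integers, the fixed form can be written in one pass, with the leading zeros emitted before the digits rather than shifted in afterwards. The following is a minimal sketch, assuming a hypothetical helper `format_fixed` that is not part of Ryu; it does no rounding and no precision handling.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Ryu's API.
   Writes digits * 10^exponent in fixed notation into out (NUL-terminated).
   Returns the number of characters written, or -1 if cap is too small. */
static int format_fixed(uint64_t digits, int32_t exponent, char *out, size_t cap) {
    char tmp[21]; /* UINT64_MAX has 20 decimal digits, plus the NUL */
    assert(out != NULL);
    int n = snprintf(tmp, sizeof tmp, "%" PRIu64, digits);
    /* Number of digits in front of the decimal point; a value <= 0
       means that many zeros follow "0." before the first digit. */
    int point = n + exponent;
    int len;

    if (exponent >= 0) {
        len = n + exponent;       /* integer: digits, then trailing zeros */
    } else if (point > 0) {
        len = n + 1;              /* point falls inside the digits */
    } else {
        len = 2 + (-point) + n;   /* "0.", leading zeros, digits */
    }
    if ((size_t)len + 1 > cap) {
        return -1;
    }

    if (exponent >= 0) {
        memcpy(out, tmp, (size_t)n);
        memset(out + n, '0', (size_t)exponent);
    } else if (point > 0) {
        memcpy(out, tmp, (size_t)point);
        out[point] = '.';
        memcpy(out + point + 1, tmp + point, (size_t)(n - point));
    } else {
        out[0] = '0';
        out[1] = '.';
        memset(out + 2, '0', (size_t)(-point));
        memcpy(out + 2 - point, tmp, (size_t)n);
    }
    out[len] = '\0';
    return len;
}
```

For the 2^-40 example from earlier in the thread, `format_fixed(9094947017729282, -28, buf, sizeof buf)` writes `0.0000000000009094947017729282`. Padding to a user-provided precision would then only need zeros appended at the end.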

As part of implementing C++17 <charconv>,

That is awesome. Finally hope for efficient string conversions in the standard.

Edit:

Although the wording is not extremely precise, the standard appears to imply that more digits may be printed than there are significant digits available in the source floating point value.

The value is rounded to the appropriate number of digits.

  • 7.21.6.1.8 if the number of significant decimal digits is at most DECIMAL_DIG, then the result should be correctly rounded. If the number of significant decimal digits is more than DECIMAL_DIG but the source value is exactly representable with DECIMAL_DIG digits, then the result should be an exact representation with trailing zeros
  • 7.21.6.1.13

ulfjack commented 6 years ago

Additional output formats would definitely be welcome, with the mentioned caveat.

ulfjack commented 6 years ago

Marking this as specific to the C implementation. If someone is interested in special output formats for Java, please file a separate issue.

StephanTLavavej commented 6 years ago

I have code for fixed notation and I should have time to polish it up and submit a pull request in October.

StephanTLavavej commented 5 years ago

Here's the fixed notation code that I wrote for VS 2017 15.9 (slightly revised): https://github.com/StephanTLavavej/ryu/blob/msvc-2018.10.22/ryu/d2s.c#L388

It isn't ready for a pull request yet. Outstanding issues:

If this is interesting, I can continue to work on it after dealing with other things on my plate.

ulfjack commented 5 years ago

I'm definitely interested in seeing some progress here. Unfortunately, I haven't had any time to work on this.

Artoria2e5 commented 3 years ago

Just to fix a dead link: https://github.com/StephanTLavavej/ryu/blob/bb357f7/ryu/d2s.c#L317.

See https://reviews.llvm.org/D70631 for Steph's LLVM PR based on it (but updated with ryu printf); the PR implements %g precision by post-processing. A lookup table is used to skip the reformatting.

Personally I would like to see %g become a thing too.
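For reference, the rule the C standard gives for %g can be sketched as a small decision function. With P the effective precision and X the decimal exponent that an %e conversion would produce, %g uses fixed style with precision P - (X + 1) when P > X >= -4, and scientific style with precision P - 1 otherwise. The names `g_choice` and `choose_g_style` below are hypothetical and are not taken from Ryu or the LLVM review. The sketch also leaves out the trailing-zero removal that %g performs unless the # flag is given.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result type: which printf style %g falls back to,
   and the precision to use for that style. */
typedef struct {
    char style;    /* 'f' for fixed, 'e' for scientific */
    int precision; /* digits after the decimal point in that style */
} g_choice;

/* Hypothetical helper applying the %g selection rule.
   p: requested precision (negative means "omitted").
   x: decimal exponent an %e conversion of the value would have,
      after rounding to the effective precision. */
static g_choice choose_g_style(int p, int x) {
    g_choice c;
    if (p < 0) {
        p = 6; /* omitted precision defaults to 6 */
    } else if (p == 0) {
        p = 1; /* zero precision is treated as 1 */
    }
    if (p > x && x >= -4) {
        c.style = 'f';
        c.precision = p - (x + 1);
    } else {
        c.style = 'e';
        c.precision = p - 1;
    }
    assert(c.precision >= 0);
    return c;
}
```

For example, 2^-40 has X = -13, so with the default precision the rule selects scientific style with precision 5, while a value such as 123.456 (X = 2) selects fixed style with precision 3.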