Improve control over the output format

HeroicKatora commented 6 years ago

Would it be possible to control whether the number is printed in exponent or decimal notation? Or, if that is more convenient, could the API return the chosen position of the decimal point (if present), the number of digits, and the exponent? It should also be possible to get the chosen exponent as an integer.

I'm trying to integrate this into a formatting library and as such would like to transform the output into all of the forms available for printf(). In particular, the decimal formatting with user-provided precision makes it necessary to reformat the number heavily. While some forms are trivial to implement, reparsing the exponent to decide on the automatic formatting and moving the decimal separator feels suboptimal.

The double-conversion interfaces could be a guideline here but the defaults and best options are quite specialized to ECMAScript.

StephanTLavavej commented 6 years ago

Note that Ryu is inherently incapable of emitting arbitrary precision. You could modify the algorithm to emit additional digits in scientific or fixed notation, but after running out of the digits that are currently trimmed to achieve the shortest round-trip representation, you'd have to fill with zeroes. While it would still round-trip, it would be mathematically incorrect. Consider this example:

C:\Temp>type precision.cpp
#include <stdio.h>

int main() {
    const double two_40 = 1LL << 40;
    const double two_minus40 = 1 / two_40;

    printf("%.27e from printf\n", two_minus40);
    puts("9.094947017729282379150390625e-13 expected for %.27e");
    printf("%.40f from printf\n", two_minus40);
    puts("0.0000000000009094947017729282379150390625 expected for %.40f");
}

C:\Temp>cl /EHsc /nologo /W4 /MTd precision.cpp
precision.cpp

C:\Temp>precision
9.094947017729282379150390625e-13 from printf
9.094947017729282379150390625e-13 expected for %.27e
0.0000000000009094947017729282379150390625 from printf
0.0000000000009094947017729282379150390625 expected for %.40f

Ryu emits 9.094947017729282E-13 for 2^-40.
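To make the zero-filling caveat concrete, here is a minimal sketch that pads the 16 shortest round-trip digits shown above out to the 27 fractional digits requested by %.27e. The helper name `zero_fill_scientific` and its signature are hypothetical and are not part of Ryu. It assumes the digits arrive as a plain string and that the exponent has at most three digits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a Ryu API): writes `digits` (significand digits
 * without a decimal point) in printf-style scientific notation with
 * `precision` digits after the point, padding with '0' once the real digits
 * run out. Returns the length written, or -1 if `out` is too small. */
static int zero_fill_scientific(char *out, size_t cap, const char *digits,
                                int exp10, int precision) {
    const size_t n = strlen(digits);
    assert(n >= 1 && precision >= 1 && n <= (size_t)precision + 1);
    assert(exp10 > -1000 && exp10 < 1000);
    /* leading digit + '.' + precision digits + "e-NNN" + NUL */
    if (cap < (size_t)precision + 2 + 5 + 1) return -1;

    size_t pos = 0;
    out[pos++] = digits[0];
    out[pos++] = '.';
    memcpy(out + pos, digits + 1, n - 1);      /* the digits we really have */
    pos += n - 1;
    const size_t pad = (size_t)precision - (n - 1);
    memset(out + pos, '0', pad);               /* zero fill, not exact digits */
    pos += pad;
    /* sign plus at least two exponent digits, as printf does */
    const int w = snprintf(out + pos, cap - pos, "e%+03d", exp10);
    return (int)pos + w;
}
```

Calling it with the digits "9094947017729282", exponent -13 and precision 27 yields a string that parses back to the same double but ends in twelve zeros, whereas the exact expansion printed by printf continues with 379150390625.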

As part of implementing C++17 <charconv>, I'll need to modify Ryu to emit shortest round-trip in fixed notation, but I am not yet sure how to contribute that upstream (I need to implement two more variants that switch between fixed and scientific depending on either printf's rules or an overall shortest criterion, so I need to figure out what the interface will be). I might end up separating the algorithm into two parts - the core part that generates the digits in a uint32_t/uint64_t and the exponent, and then a formatting part, which should allow all four charconv formats to be cleanly implemented.
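The two-part split described here could look roughly like the sketch below. It covers only the formatting half and takes the digits and exponent as given. The type `decimal_fp` and the functions `format_scientific` and `format_fixed` are hypothetical names chosen for illustration, not an existing Ryu or charconv interface.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result of the "core" step: value == mantissa * 10^exponent. */
typedef struct {
    uint64_t mantissa;  /* shortest round-trip digits as an integer */
    int32_t exponent;   /* power of ten applied to mantissa */
} decimal_fp;

/* Scientific notation in the style Ryu prints, e.g. "1.5E3" or "1E0". */
static int format_scientific(char *out, size_t cap, decimal_fp v) {
    char digits[21];  /* UINT64_MAX has 20 digits */
    const int n = snprintf(digits, sizeof digits, "%" PRIu64, v.mantissa);
    const int sci_exp = v.exponent + n - 1;
    int len;
    if (n == 1) {
        len = snprintf(out, cap, "%cE%d", digits[0], sci_exp);
    } else {
        len = snprintf(out, cap, "%c.%sE%d", digits[0], digits + 1, sci_exp);
    }
    return (len >= 0 && (size_t)len < cap) ? len : -1;
}

/* Fixed notation built from the same digits, with no string reparsing.
 * Caveat: for exponent >= 0 the trailing zeros are filler, so very large
 * values are not printed as their exact integer value. */
static int format_fixed(char *out, size_t cap, decimal_fp v) {
    char digits[21];
    const int n = snprintf(digits, sizeof digits, "%" PRIu64, v.mantissa);
    const int32_t e = v.exponent;
    size_t need;
    if (e >= 0)       need = (size_t)n + (size_t)e;  /* digits, then zeros */
    else if (-e < n)  need = (size_t)n + 1;          /* point inside digits */
    else              need = 2 + (size_t)(-e);       /* "0.", zeros, digits */
    if (need + 1 > cap) return -1;

    if (e >= 0) {
        memcpy(out, digits, (size_t)n);
        memset(out + n, '0', (size_t)e);
    } else if (-e < n) {
        const int int_len = n + e;                   /* digits before the point */
        memcpy(out, digits, (size_t)int_len);
        out[int_len] = '.';
        memcpy(out + int_len + 1, digits + int_len, (size_t)(-e));
    } else {
        const size_t zeros = (size_t)(-e - n);       /* zeros after "0." */
        out[0] = '0';
        out[1] = '.';
        memset(out + 2, '0', zeros);
        memcpy(out + 2 + zeros, digits, (size_t)n);
    }
    out[need] = '\0';
    return (int)need;
}
```

For 2^-40 the core step would produce mantissa 9094947017729282 and exponent -28. Both formatters then work from the same pair, which is what would let further formats be added without touching the digit generation.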

HeroicKatora commented 6 years ago

~~Outputting trailing zeroes is fully within the specifications of printf as far as I am aware, basically guaranteeing roundtrip is the only necessity.~~ [As far as I am concerned this is acceptable for my use case] Padding with zeroes is also the current strategy. But generating decimal representation from scientific notation can involve a memory move (at most a few bytes but still) of the complete suffix to make space for preceding zero digits. Together with the necessity of reevaluating the exponent, this is quite an overhead over what I imagine would be more straightforward to do in the internal representation.
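The post-processing cost mentioned here can be sketched as follows. This is an illustration of the workaround, not code from any library: the function name `sci_to_fixed_inplace` is hypothetical, and it assumes a non-negative value in the "d.dddE-n" form that Ryu prints, handling only negative exponents.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical post-processing step: rewrites s (for example "9.5E-3") in
 * place as fixed notation ("0.0095"). Returns the new length, or -1 if the
 * input is not handled or the buffer of `cap` bytes is too small. On failure
 * the string is left unchanged. */
static int sci_to_fixed_inplace(char *s, size_t cap) {
    char *e = strchr(s, 'E');
    if (e == NULL || e == s) return -1;
    const long exp10 = strtol(e + 1, NULL, 10);  /* reparse the exponent */
    if (exp10 >= 0) return -1;

    const size_t mant_len = (size_t)(e - s);
    const int has_point = mant_len >= 2 && s[1] == '.';
    const size_t n = mant_len - (size_t)has_point;   /* number of digits */
    const size_t zeros = (size_t)(-exp10 - 1);       /* zeros after "0." */
    if (2 + zeros + n + 1 > cap) return -1;

    /* First move: close the gap left by the decimal point. */
    if (has_point) memmove(s + 1, s + 2, n - 1);
    /* Second move: shift every digit right to make room for "0.000...". */
    memmove(s + 2 + zeros, s, n);
    s[2 + zeros + n] = '\0';
    s[0] = '0';
    s[1] = '.';
    memset(s + 2, '0', zeros);
    return (int)(2 + zeros + n);
}
```

The exponent is parsed back from text and the whole digit string is moved, which is the overhead that access to the digits and exponent before formatting would avoid.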

> As part of implementing C++17 <charconv>,

That is awesome. Finally hope for efficient string conversions in the standard.

Edit:

Although the wording is not extremely precise, the standard appears to imply that more digits are printed than there are significant digits available in the source floating-point value.

> The value is rounded to the appropriate number of digits.

7.21.6.1.8

> if the number of significant decimal digits is at most DECIMAL_DIG, then the result should be correctly rounded. If the number of significant decimal digits is more than DECIMAL_DIG but the source value is exactly representable with DECIMAL_DIG digits, then the result should be an exact representation with trailing zeros

7.21.6.1.13

ulfjack commented 6 years ago

Additional output formats would definitely be welcome, with the mentioned caveat.

ulfjack commented 6 years ago

Marking this as specific to the C implementation. If someone is interested in special output formats for Java, please file a separate issue.

StephanTLavavej commented 6 years ago

I have code for fixed notation and I should have time to polish it up and submit a pull request in October.

StephanTLavavej commented 5 years ago

Here's the fixed notation code that I wrote for VS 2017 15.9 (slightly revised): https://github.com/StephanTLavavej/ryu/blob/msvc-2018.10.22/ryu/d2s.c#L388

It isn't ready for a pull request yet. Outstanding issues:

- Mechanical: I wrote this after __uglifying Ryu's identifiers. (New identifiers are MSVC STL _Ugly, making them easy to distinguish.) De-uglifying isn't a problem, it just takes a bit of time.
- This uses C++17 charconv's interface: the [_First, _Last) range, chars_format requesting a format, and the bounds-checked to_chars_result. To upstream this, we'll probably need a different interface. (I am very interested in keeping my code closely aligned with upstream, but I still need to ship the charconv interface.)
- Also regarding the interface, the fixed notation codepath is currently intertwined with the Ryu scientific notation codepath. Always running Ryu allows us to use its output (suitably decimal/zero filled) for most cases, and even the "large exact integer" case benefits from Ryu's output to determine the output length. I think I could be far less invasive here, and extract this into separate functions. (I wrote this under a deadline, hence the hastily designed structure, although I have a high level of confidence in the logic itself.)
- I am currently using the MSVC intrinsics _BitScanForward and _BitScanForward64; replacing them with Clang/GCC intrinsics or portable code shouldn't be difficult.
- None of this applies to the generic128 codepaths which I haven't worked with at all.
- The digit-printing code is a copy-pasted mess; it may be worth centralizing now.
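For the intrinsics item, a portable wrapper might look like the sketch below. The name `ctz64` is hypothetical. It assumes the intrinsic is used to find the index of the lowest set bit, which is what _BitScanForward64 reports, and it falls back to a plain loop when neither compiler family is detected.

```c
#include <assert.h>
#include <stdint.h>
#if defined(_MSC_VER) && !defined(__clang__)
#include <intrin.h>
#endif

/* Hypothetical wrapper: index of the lowest set bit of v (count of trailing
 * zero bits). v must be nonzero; the result is undefined for zero with both
 * the GCC/Clang builtin and the MSVC intrinsic. */
static inline uint32_t ctz64(uint64_t v) {
    assert(v != 0);
#if defined(__GNUC__) || defined(__clang__)
    return (uint32_t)__builtin_ctzll(v);
#elif defined(_MSC_VER) && (defined(_M_X64) || defined(_M_ARM64))
    unsigned long index;
    _BitScanForward64(&index, v);
    return (uint32_t)index;
#else
    /* Portable fallback: shift until the lowest bit is set. */
    uint32_t n = 0;
    while ((v & 1) == 0) {
        v >>= 1;
        ++n;
    }
    return n;
#endif
}
```

A 32-bit MSVC build takes the loop here; splitting the value into two halves and using _BitScanForward on each would be an alternative.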

If this is interesting, I can continue to work on it after dealing with other things on my plate.

ulfjack commented 5 years ago

I'm definitely interested in seeing some progress here. Unfortunately, I haven't had any time to work on this.

Artoria2e5 commented 3 years ago

Just to fix a dead link: https://github.com/StephanTLavavej/ryu/blob/bb357f7/ryu/d2s.c#L317.

See https://reviews.llvm.org/D70631 for Steph's LLVM PR based on it (but updated with ryu printf); the PR implements %g precision by post-processing. A lookup table is used to skip the reformatting.

Personally I would like to see %g become a thing too.

ulfjack / ryu

Improve control over the output format #27