Open alugowski opened 1 year ago
Thank you so much for your kind words!
Is this intentional?
Yes. The intention is to match what the Ryu library spits out (which is always in the scientific form with E as the exponent marker).
I know this is probably not what people usually want. It might be more useful if to_chars output the string with the smallest number of characters (while preferring the fixed-point form on a tie), but there are some reasons why I didn't do so.
- Initially I took the to_chars implementation from the Ryu repo, as it was faster than the one I could write by myself at that time, and also by doing so I could more fairly compare the two algorithms. At some point, though, I completely rewrote the to_chars implementation from scratch and it's now very different from Ryu's.
- The focus of this library is on to_decimal rather than on to_chars. The reason is that I don't believe there is a single right answer on how the output string should look. For instance, you suggested that 1.0 is a better output than 1E0, but why not just 1? I'm pretty sure some people will prefer 1.0, while some others will prefer 1. Maybe some people want 1,0, or maybe 1e0, or maybe 1.0e0. There are just too many possible ways to do it and I don't have the ability to accommodate every scenario I can imagine and optimize all of them. The main purpose of providing to_chars is merely to prove that a fast to_chars implementation is possible with the provided to_decimal, and also to demonstrate how that might be done. My goal was to provide a to_decimal implementation, so that anyone who needs a fast to_chars can leverage it to write their own to_chars optimized for their own use. To my understanding, that is indeed the way this library is being used by several other projects.

I also want to say that writing your own to_chars is not a devastatingly difficult job, if your goal is not to deliver absolutely amazing performance. I can also help you if you want to write one. FYI, the implementation here is based on the idea explained here. A more refined analysis is done in the appendix of another post.
Nevertheless, it is very much welcome if anybody comes up with a generic mechanism for specifying the formatting details, and I would be even happier if anyone opens a PR for that, but right now I have no plan for doing so by myself.
@alugowski If it helps, the C++ fmt library uses the Dragonbox algorithm under the hood, and you can control its output exactly the way you want.
Ah right. I forgot to mention that. Thanks @ecorm.
Thank you for the detailed explanation!
My question came from the sentence in the README that says the "output is of the shortest length". So I was expecting to see "1" instead of "1E0".
I actually came to find this project by looking for alternatives to std::to_chars. As I'm sure you're aware, compiler support for the floating-point versions of both std::from_chars and std::to_chars is spotty at best. AFAIK Visual Studio has it, but the current versions of GCC and Clang do not. The alternative for std::from_chars is the excellent fast_float library. They have implemented a fast parser and exposed it as something that implements the C++17 spec (in another namespace of course). Looks like the GCC 12 version of std::from_chars will simply be fast_float.
Since I need both writing and reading, I found Dragonbox as a usable analog of fast_float on the writing side. I don't have any ambitions of writing my own methods. fmt would work, though it's larger than my entire project for only this one method.
I don't know what your ambitions are, but I think if you're interested then both Clang and GCC could use your work.
Regarding your "everyone wants something else" point, that's true. The std::to_chars standard has options for everything you've mentioned. If you're interested I think it would be easy to implement on top of what you already have, and would make Dragonbox a proper substitute for the large number of folks looking for alternatives not for performance reasons but just for basic support.
https://en.cppreference.com/w/cpp/utility/to_chars
Now don't get me started on long double support :P You're largely stuck with the C methods if you need that.
My question came from the sentence in the README that says the "output is of the shortest length". So I was expecting to see "1" instead of "1E0".
In fact, if you look carefully at what it says:
- The output is of the shortest length; that is, no other output strings that are interpreted as the input number can contain less number of significand digits than the output of Dragonbox.
So the shortness is in terms of the number of (significand) digits, not in terms of the number of characters. I mean, it's confusing I admit, but the exact number of characters is not the most interesting detail from the point of view of developing a conversion algorithm.
Since I need both writing and reading, I found Dragonbox as a usable analog of fast_float on the writing side. I don't have any ambitions of writing my own methods. fmt would work, though it's larger than my entire project for only this one method.
If fmt is too bulky, then there is nanofmt, which also uses Dragonbox under the hood. I presume this one is lighter than fmt. Also there is Alexander Bolz's implementation (https://github.com/abolz/Drachennest) which IIRC produces prettier outputs. A small problem with these two is that their Dragonbox is a bit outdated, because the algorithm has been improved since they copied the implementation from this repo. But that should not be a serious issue if your goal is not to win a competitive benchmark. (Also IIRC Alexander's implementation only supports double; but he has a Schubfach implementation for float instead.)
I don't know what your ambitions are, but I think if you're interested then both Clang and GCC could use your work.
As far as I know, Dragonbox was considered for the libc++ implementation of std::to_chars, but was eventually dropped in favor of Ryu, which already had a working adoption thanks to Mr. STL's hard work.
Adoption into the standard library is something I ultimately want, and this was an attempt at preparing for that. But I was way too ambitious and could not really afford the required amount of time and effort, so that project is "dead" at this point. I'm very slowly making some progress (e.g. developing this) though.
Now don't get me started on long double support :P You're largely stuck with the C methods if you need that.
This is also something in my TODO list. May take long to be realized.
Regarding your "everyone wants something else" point, that's true. The std::to_chars standard has options for everything you've mentioned.
IIRC std::to_chars is not that versatile. I don't think it has, e.g., an option for the mandatory trailing zero .0. But yeah, probably your point is that std::to_chars would be enough for many people who are not happy with the ugly 1E0.
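To make the comparison concrete, here is a minimal sketch of what std::to_chars produces for 1.0 under its different std::chars_format options. It assumes a standard library that actually ships the floating-point overloads (which, as noted above, was not a given at the time of this thread). The helper names to_chars_plain and to_chars_with are made up for this example.

```cpp
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper: shortest round-trip output via the plain overload.
// Returns an empty string if the conversion fails.
std::string to_chars_plain(double value) {
    char buf[512];
    auto res = std::to_chars(buf, buf + sizeof buf, value);
    return res.ec == std::errc{} ? std::string(buf, res.ptr) : std::string{};
}

// Hypothetical helper: shortest round-trip output in an explicitly chosen format.
// Returns an empty string if the conversion fails.
std::string to_chars_with(double value, std::chars_format fmt) {
    char buf[512];
    auto res = std::to_chars(buf, buf + sizeof buf, value, fmt);
    return res.ec == std::errc{} ? std::string(buf, res.ptr) : std::string{};
}
```

With a conforming implementation, to_chars_plain(1.0) gives 1 and to_chars_with(1.0, std::chars_format::scientific) gives 1e+00. None of the available formats yields 1.0.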
If you're interested I think it would be easy to implement on top of what you already have, and would make Dragonbox a proper substitute for the large number of folks looking for alternatives not for performance reasons but just for basic support.
Simply put, the reason why I'm sort of hesitant to write a more useful version of to_chars is that I'm not comfortable with providing a very suboptimal implementation in this "supposed to be fast" library. Optimizing and testing require a lot of effort and currently I don't have enough resources to put into them.
I would say this again: it's not super daunting to implement your own to_chars given that to_decimal already does all the hard work. A not-so-fast-but-working implementation might not take more than 80 lines, I guess. You may refer to https://en.cppreference.com/w/cpp/io/c/fprintf, the row for the general format (g and G), to get an idea of how to mimic that behavior. (It doesn't give the shortest output I guess, but I think you may not need the absolutely shortest string either.)
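As a rough illustration of the point above, the following sketch formats a decimal significand/exponent pair with a rule loosely modeled on the printf general format: fixed notation when the scientific exponent X satisfies -4 <= X < limit, scientific notation otherwise. The function format_general and its max_fixed_exponent parameter are hypothetical and not part of Dragonbox. In real use the three inputs would presumably come from the significand, exponent, and sign of the to_decimal result, with zero, infinity, and NaN handled by the caller. It makes no attempt at speed.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Dragonbox): formats significand * 10^exponent.
// Assumes the significand carries no trailing zeros, so the output stays short.
std::string format_general(std::uint64_t significand, int exponent,
                           bool is_negative, int max_fixed_exponent = 16) {
    std::string out = is_negative ? "-" : "";
    if (significand == 0) return out + "0";  // defensive; callers usually special-case zero

    const std::string digits = std::to_string(significand);
    const int n = static_cast<int>(digits.size());
    const int sci_exp = exponent + n - 1;  // decimal exponent of the leading digit

    if (sci_exp < -4 || sci_exp >= max_fixed_exponent) {
        // Scientific form: d[.ddd]e<exp>
        out += digits[0];
        if (n > 1) {
            out += '.';
            out.append(digits, 1, std::string::npos);
        }
        out += 'e';
        out += std::to_string(sci_exp);
    } else if (exponent >= 0) {
        // Integer: all digits followed by `exponent` zeros
        out += digits;
        out.append(static_cast<std::size_t>(exponent), '0');
    } else if (sci_exp >= 0) {
        // Decimal point falls inside the digit string
        const std::size_t point = static_cast<std::size_t>(sci_exp) + 1;
        out.append(digits, 0, point);
        out += '.';
        out.append(digits, point, std::string::npos);
    } else {
        // Value below 1: "0." then leading zeros then the digits
        out += "0.";
        out.append(static_cast<std::size_t>(-sci_exp - 1), '0');
        out += digits;
    }
    return out;
}
```

For example, format_general(1, 0, false) returns 1 rather than 1E0, and format_general(12345, -9, false) returns 1.2345e-5. Whether integers should print as 1 or 1.0, and where the switch to scientific notation happens, are exactly the kind of choices the thread says differ between users.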
Well I don't know. I may write a shitty one if I get some time and post it here.
Hmm. I guess I somewhat sounded like a jerk 😅 Sorry about that.
I think your suggestion about providing an alternative interface doing std::to_chars-like formatting would be a nice addition. I'll consider that for the next release. Thanks for the input!
By the way, it won't be a drop-in replacement for std::to_chars, because this repo has no plan for supporting printf-style fixed-precision formatting or hexfloat formatting. Both have nothing to do with what Dragonbox does afaict, so they are out of scope, and the first one in particular is very difficult to do correctly.
Great package! Thank you!
I've noticed that to_chars sometimes emits an extra E0 suffix when it's not needed. For example, the number 1.0 is emitted as 1E0.
Is this intentional?