Floating point std::to_chars has a pretty large code size impact

smeenai commented 1 year ago

I've measured Ryu to increase the size of libc++ by 163 KiB even when optimizing for size. A lot of that comes from the precomputed tables. That's a substantial size increase for mobile apps.

Some ideas that could help (that I haven't measured yet):

Ryu supports RYU_OPTIMIZE_SIZE to shrink the size of the tables, which we should be able to port over to libc++'s implementation.
This would be a much larger change, but fmtlib uses dragonbox instead of Ryu, which claims to be faster and seems to be smaller as well.

efriedma-quic commented 1 year ago

If you're on a target where you have to link against an external libc anyway (which I think includes most mobile apps), the conversion method with the smallest codesize is going to be just "call snprintf".

mordante commented 1 year ago

@efriedma-quic The code is in the libc++ dylib and currently there is no way to avoid building that. So when using libc's printf you still get these tables. The other option was to add these tables in the headers but that has a compilation overhead.

Looking into to_char (and from_chars) for floating point is on my todo list. Another issue is that currently long double is handled like a double. (The current code has been provided by MSVC STL and on Windows long double and double both are a 64-bit floating-point value.)

Looking at dragonbox is indeed on my radar. I saw a post by @jk-jeon a few weeks ago where there were more table size optimizations. I have other things higher on my priority list so it might take some time before I can look into this.

efriedma-quic commented 1 year ago

I meant, you could reimplement to_chars on top of snprintf.

jk-jeon commented 1 year ago

@mordante Thanks for having an interest in my work. Indeed Dragonbox uses a smaller table than (the reference implementation of) Ryu when both are configured to optimize the table size, and there are two reasons:

Ryu has a bigger full table to begin with, which is because the library also provides some parsing functionality that requires some additional entries. These entries can be removed if only the formatting part is needed, but I do not know exactly what should be modified in the Ryu source repo to make that work correctly.
As far as I understand, the way Ryu compresses its full table is to store only one table entry per each twenty-something entries, and interpolate the missing entries in runtime by multiplying an appropriate power of 5. But this multiplication itself is not enough to perfectly reproduce the missing entries, and so it simply stores the resulting error (which is guaranteed to be within 1 bit or something like that) into another table and then add them back in runtime. Dragonbox is no different from it, except that I proved that this last step of adding the error back is in fact not needed, and there is no problem even if we use a slightly off reproduced entries. So I removed this "error offset" table. Probably Ryu also doesn't need it I guess, but so far apparently no one bothered to prove that and make a PR.

@efriedma-quic But can you? How do you find the shortest roundtrippable output using snprintf?

philnik777 commented 1 year ago

@jk-jeon thanks for the information! You don't happen to be willing to contribute your algorithm to libc++? I know it's a lot to ask, but you have probably a way better understanding of these kinds of algorithms than any of the regular contributors. We are also still missing to_chars for long double, as @mordante already pointed out.

I think it could make sense to have a configure-time option to change whether we want a compressed table or not. Though that would mean another configuration for potentially not that much benefit, depending on how much size decrease there actually is when switching to a compressed table.

jk-jeon commented 1 year ago

First of all, there was a misunderstanding from my side which should be corrected.

The vast majority (102kb) of the mentioned 163kb is the table used by Ryu-printf, which is an algorithm for fixed-precision formatting, and is a very different algorithm from Ryu, which is for shortest roundtrippable formatting. Their reference implementations happen to be in the same repo though.

I thought the OP was talking about Ryu, but Ryu only uses 10kb-ish table. Unfortunately, the size optimization option provided in Ryu, as far as I know, only compresses this 10kb-ish table so if Ryu-printf is the problem then it's useless. Honestly, I expect that compressing Ryu-printf table will be much harder than Ryu due to how it's designed.

The Dragonbox repo does not contain an implementation of any algorithm for fixed-precision formatting, and now I realize that probably @mordante was talking about this one rather than Dragonbox, and I also now realize that, yes, you can replace Ryu-printf with snprintf if the size is a concern. Sorry about the confusion @efriedma-quic.

@philnik777 Now, regarding the contribution, it would be my great pleasure, but maybe not in the near future. I deeply appreciate your offer though.

I think these floating-point formatting/parsing things are best done with coherently designed set of algorithms to avoid pointless duplication of giant tables and other implementation details. So my plan is to first work on a more full-fledged library that basically covers all options for std::to_chars/std::from_chars, and then try to advertise its adoption into stdlib impl's. And due to other life issues I cannot be sure when that can be materialized.

In case anyone is interested, there is an ongoing effort by @mborland and others of making such a library: https://github.com/cppalliance/charconv

As far as I know, the floating-point formatting part of this library is based on my Dragonbox and the other algorithm mentioned above.

mordante commented 1 year ago

@efriedma-quic as mentioned by others, printf doesn't have a shortest roundtrippable output. Writing this is not trivial. This feature will be used in more library facilities, for example P2587R3 to_string or not to_string.

@jk-jeon I've read multiple of your posts.

I think these floating-point formatting/parsing things are best done with coherently designed set of algorithms to avoid pointless duplication of giant tables and other implementation details. So my plan is to first work on a more full-fledged library that basically covers all options for std::to_chars/std::from_chars, and then try to advertise its adoption into stdlib impl's. And due to other life issues I cannot be sure when that can be materialized.

I love this idea! libc++ doesn't have from_chars for floating-point values. If you can make this code also available under the LLVM license I would be very interested to use this in libc++. If there are ways to assist with this work I'd be happy to help.

Do you know which algorithm you want to use for from_chars? One of your own or something based on an existing algorithm like Daniel Lemire's work?

efriedma-quic commented 1 year ago

printf doesn't have a shortest roundtrippable output. Writing this is not trivial

I'll admit I wasn't really thinking about that part.

That said, I think I remember reading somewhere about an algorithm for truncating a number to the correct precision using decimal arithmetic... don't remember where I saw it, though.

smeenai commented 1 year ago

First of all, there was a misunderstanding from my side which should be corrected.

The vast majority (102kb) of the mentioned 163kb is the table used by Ryu-printf, which is an algorithm for fixed-precision formatting, and is a very different algorithm from Ryu, which is for shortest roundtrippable formatting. Their reference implementations happen to be in the same repo though.

I thought the OP was talking about Ryu, but Ryu only uses 10kb-ish table. Unfortunately, the size optimization option provided in Ryu, as far as I know, only compresses this 10kb-ish table so if Ryu-printf is the problem then it's useless. Honestly, I expect that compressing Ryu-printf table will be much harder than Ryu due to how it's designed.

Ah, this was a misunderstanding on my side. You're right; the large tables are POW10_SPLIT_2 and POW10_SPLIT, and there's no compressed equivalents.

jk-jeon commented 1 year ago

I love this idea! libc++ doesn't have from_chars for floating-point values. If you can make this code also available under the LLVM license I would be very interested to use this in libc++. If there are ways to assist with this work I'd be happy to help.

Thanks for your kind offer. I'll let you know if I got anything you can help with.

Do you know which algorithm you want to use for from_chars? One of your own or something based on an existing algorithm like Daniel Lemire's work?

Obviously, I cannot just copy-and-paste Eisel-Lemire, because then I cannot recycle the Dragonbox table. Also I feel like I once upon a time figured that some aspects of Eisel-Lemire is not quite optimal and maybe I can make it better, though I forgot about the details. So probably I will just write my own algorithm based on my understanding of the stuffs, and will not try to strictly mimic Eisel-Lemire. I do not know if the resulting algorithm will be just a minor variation of Eisel-Lemire, or a bit more novel than that.

mborland commented 1 year ago

I love this idea! libc++ doesn't have from_chars for floating-point values. If you can make this code also available under the LLVM license I would be very interested to use this in libc++. If there are ways to assist with this work I'd be happy to help.

As @jk-jeon said I am nearing completion on the development of a charconv library for boost here: https://github.com/cppalliance/charconv. It uses a number of his algorithms, and Lemire's. It is all licensed under the Boost Software License which should be compatible with the LLVM license. If I need to dual-license it in order for you to use it that should be straightforward as well.

llvm / llvm-project

Floating point std::to_chars has a pretty large code size impact #64180