Open BlamKiwi opened 3 months ago
__format_escaped_char
and__format_escaped_string
use abasic_string
instead of just passing through the output iterator. Is there a reason this is the case? Directly passing in the iterator would preventbasic_string
usage where it would otherwise not be used.
See https://reviews.llvm.org/D134036 although I didn't found why a basic_string
buffer is wanted. I guess we can use a much simpler buffer type. CC @mordante.
__format_escaped_char
and__format_escaped_string
use abasic_string
instead of just passing through the output iterator. Is there a reason this is the case? Directly passing in the iterator would preventbasic_string
usage where it would otherwise not be used.See https://reviews.llvm.org/D134036 although I didn't found why a
basic_string
buffer is wanted. I guess we can use a much simpler buffer type. CC @mordante.
I don't recall either and maybe it's no longer/not needed.
I feel this request has merits, however I think we should look at the bigger picture. If no floating-point is available we should not have to_chars
for floating-point either. This code contains several large lookup tables that is not needed.
At the moment I've quite a long list of items I want to work on, so I don't expect to have time to look at this soon.
I feel this request has merits, however I think we should look at the bigger picture. If no floating-point is available we should not have
to_chars
for floating-point either. This code contains several large lookup tables that is not needed.
I agree about the big picture. I think a lot of the requests in this issue make sense. IMO if we want to make progress on this, we would need someone to write a short RFC explaining how we're going to tackle this, what is going to be affected and how, etc.
I'm going to un-assign you for now since you don't think you'll have time to work on this. @BlamKiwi if you have interest for pursuing this, I think that is the most likely way for this issue to make progress.
@mordante Are you suggesting that you would prefer a compile flag to just disable float support entirely in libc++?
I agree that would probably be the useful baseline. The question would then be should disabling just float <format>
be a supported use case?
The pain point is paying for float formatting when you're not actually formatting floats. It is still nice to be able to prototype math with floating point values before moving to something like fixed point arithmetic. I can understand not wanting to support this edge-case though since its pretty niche.
Or to state it more plainly, implement a solution in such a way that supporting floats is orthogonal to picking different code-paths to optimize size.
@ldionne Regarding to what the consequences are to hard disabling float support, I think the C world provides the most useful example. printf
, scanf
etc just don't support float anymore. Things like <cmath>
leads to compile/linker errors. The user experience is a bit poor, since any use of float breaks. Given only embedded/kernel development has this use-case though it's probably fine.
On the C++ side, there's annoying edge cases like std::unordered_map
using floats in its API. These would probably also be disabled or replaced with an extension to the standard. e.g., a fixed point arithmetic value.
Regarding the other minor optimizations, I assume explicit build flags for modifying behaviour are preferred over compiler flags. e.g., Adding something like LIBCXX_FORMAT_OPTIMIZE_SIZE
over using __OPTIMIZE_SIZE__
@BlamKiwi If there is a way to "delay" the instantiation of format for floating point until when we actually need it, then this could be a change we implement while being conforming today, without considering the larger change about floating points. But I'm not certain that's feasible.
Another option worth considering is stuff like link-time optimization. If you want to reduce code size, perhaps this can help dead-strip code paths in format that are never used?
@BlamKiwi If there is a way to "delay" the instantiation of format for floating point until when we actually need it, then this could be a change we implement while being conforming today, without considering the larger change about floating points. But I'm not certain that's feasible.
Another option worth considering is stuff like link-time optimization. If you want to reduce code size, perhaps this can help dead-strip code paths in format that are never used?
We already use LTO where possible. I will take a second look though. Given how simple the other fixes are, it would be good to get an answer for this.
@mordante Are you suggesting that you would prefer a compile flag to just disable float support entirely in libc++?
I agree that would probably be the useful baseline. The question would then be should disabling just float
<format>
be a supported use case?
IMO no. For the question "should we allow libc++ to work without floating-point support" my answer would be yes.
The pain point is paying for float formatting when you're not actually formatting floats. It is still nice to be able to prototype math with floating point values before moving to something like fixed point arithmetic. I can understand not wanting to support this edge-case though since its pretty niche.
Only disabling it for <format>
still means you pay the price for to_chars
and stream operations. (The latter can be disabled by disabling locale support.) Both to_chars
and stream operations have their implementation in the dylib so even using C++11 will add the overhead of the floating-point tables of to_chars
. So my objection is purely about adding a special case for <format>
and not disable floating-point in general or at least all "floating-point conversion functions" feels wrong to me.
dylib
I think this is where the misunderstanding is coming from. Embedded work largely uses LTO, -ffunction-sections
and static linking. We are not paying these costs because we are not manually invoking those functions, and they're discarded by the linker. Our codebases don't use streams or std::to_chars<float>
already, precisely because of their costs.
<format>
by its current implementation pulls in a bunch of stuff it doesn't actually need under a static linking environment. One of those things happens to be float formatting because of the implementation of std::basic_format_arg
.
We are not paying these costs because we are not manually invoking those functions, and they're discarded by the linker. Our codebases don't use streams or
std::to_chars<float>
already, precisely because of their costs.
Here you describe your use-case and the feature you describe perfectly fits your use-case. However in general we try to add features that are usable for a larger set of users. That's why I don't want to add a special case for your use-case. The next user might want to use shared objects and ostreams but without floating-point support.
That's fine. What I'm wanting to pursue is an improvement to <format>
code-gen that will help all users using libc++ in a space constrained environment. Floating point formatting is just our particular pain point. Users on 32-bit space constrained systems might also balk at the fact 64-bit integer formatting is also pulled in. We actually do have some targets (Xtensa) with this constraint. You already have flags to do similar things like Unicode support. I just want buy-in before I go put together a merge request.
The main issue with your current implementation is you defer std::basic_format_arg
format codegen until you're actually inside __handle_replacement_field
. This seems to prevent the compiler/linker from knowing that particular code paths aren't actually used, especially at lower optimization levels.
I need to investigate some of the code output a bit more, but it should be possible (in a standards compliance sense) to add a private field to std::basic_format_arg
which is just a type erased format function of the current argument type. This eliminates the visitor pattern usage inside __handle_replacement_field
and should optimize the code-gen for minimizing size. This would fix our specific floating-point gripe, but this optimization would apply to all types listed in the standard.
This change would trade-off some runtime overhead for minimizing code size, so I would guard this implementation with a build flag. Something along the lines of LIBCXX_FORMAT_MINIMIZE_SIZE
. The main issue with this change though is the resulting binary wouldn't be ABI compatible with a library built without this flag.
EDIT: I'm not saying that removing float support from libc++ as an explicit flag is a bad idea. More that improving codegen for <format>
is worth doing on its own.
I agree if you have a way to improve the codegen of <format>
I'd be happy to review it. There are some benchmarks for std::format
which allows to determine what the performance overhead of such a solution would be.
See the associated merge request for the main code size optimization. The overhead of using pointer dispatch is ~5% on my laptop. 5% could be significant for some users, so it's probably worth guarding this with a build flag. The benefit is that only formatters actually used get instantiated.
I still need to look at the escape functions to get rid of the basic_string usage. Unfortunately the only practical way to remove the use of a temporary buffer would be to iterate over the string twice. Once to count the escaped printable string width and the second time to actually escape and pad the string.
We currently use an internal format library that's a "lean" implementation of the standard for embedded devices. I've been trying to test libc++
<format>
on embedded systems with very little RAM. There's some tweaks that can be made to make libc++ a bit leaner.The C++ standard specifies that
float
,double
andlong double
are explicitly listed types instd::basic_format_arg
. This causes the compiler to pull in all float formatting code and associated lookup tables. This can cause binaries to bloat by >200KB even though it's never used. It is a common feature of embedded libc implementations to provide a compile option to disable float support in printf/scanf. It would be nice if libc++ provided a build flag to disable float types in<format>
or somehow map floats to thehandle
type to implement a compiler firewall.<charconv>
uses lookup tables or dedicated code paths to efficiently convert base 10, base 2, etc. It would be good if optimized codepaths were guarded by__OPTIMIZE_SIZE__
(or equivalent libc++ define) to prevent bringing in this code.__throw_invalid_option_format_error
and__throw_invalid_type_format_error
usebasic_string
and causes the contents ofstring.cpp
to be pulled in. If exceptions are disabled, it would be nice if this message was simplified to a simple c string.__format_escaped_char
and__format_escaped_string
use abasic_string
instead of just passing through the output iterator. Is there a reason this is the case? Directly passing in the iterator would preventbasic_string
usage where it would otherwise not be used.