Closed CrustyAuklet closed 2 months ago
Interesting discovery, the std::to_char implementation (at least in GCC) is extremely large! I started working on the tests I am adding to our production pipeline and all of a sudden the tests blew up in size and wouldn't link. I had to erase most tests to get the size down so I could use bloaty and found this:
37.5% 104Ki [1344 Others]
26.4% 73.4Ki (anonymous namespace)::ryu::POW10_SPLIT_2
10.3% 28.7Ki (anonymous namespace)::ryu::POW10_SPLIT
8.6% 24.0Ki [section .stack]
4.7% 13.0Ki [section .text]
2.0% 5.47Ki test_fun_1()
1.8% 5.11Ki (anonymous namespace)::pool_buffer_
1.3% 3.75Ki snitch::tests
0.7% 2.03Ki snitch::impl::(anonymous namespace)::parse_color_options(int, char const* const*)
0.6% 1.80Ki __aeabi_dadd
0.5% 1.50Ki snitch::impl::(anonymous namespace)::expected_args
0.5% 1.40Ki __aeabi_dmul
0.5% 1.33Ki snitch::impl::(anonymous namespace)::parse_arguments(int, char const* const*, snitch::small_vector<snitch::impl::(anonymous namespace)::expected_argument, 32u> const&, snitch::impl::(anonymous namespace)::parser_settings const&)
0.4% 1.09Ki board::i2c_init()
0.4% 1.04Ki (anonymous namespace)::ryu::d2exp_buffered_n(double, unsigned long, char*, int*)
0.4% 1024 seal::core::host_stack_
0.3% 972 (anonymous namespace)::ryu::d2fixed_buffered_n(double, unsigned long, char*)
All the (anonymous namespace)::ryu::*
symbols I didn't recognize so I searched and found this issue.
I've been using the charconv functions and been happy so far, but I never really deal with floating point since all our values are in fixed point representations. So maybe this implies that the flag should be over-ridable.
So if floating point std::to_char
isn't available would you prefer to fall back to your constexpr function, snprintf, or something else?
Attention: Patch coverage is 97.50000%
with 1 lines
in your changes are missing coverage. Please review.
Project coverage is 93.91%. Comparing base (
ccd3afa
) to head (4096a6c
).
So if floating point std::to_char isn't available would you prefer to fall back to your constexpr function, snprintf, or something else?
I think falling back to the constexpr functions is simplest and most portable. The issue with the large size of std::to_chars
is interesting, and makes me wonder if it isn't simpler to use the constexpr functions all the time. That might negatively impact the benchmarks a little, but serialisation is unlikely to be the main bottleneck of test execution.
I'll see if I can rerun the benchmarks with that solution, to test the impact. Thanks for looking into this.
Otherwise, it seems the Windows CI is broken, I don't think that's related to your PRs. I'll investigate separately. Edit: Yes, it's https://github.com/snitch-org/snitch/issues/140 again, with a newer version of MSVC. We need to disable the offending test with that version. I'll make a separate PR.
The CI failures weren't related to this PR and have been fixed in main
.
Otherwise, I have benchmarked using append_constexpr()
to implement append_fast()
, the changes are in the append branch.
Debug: | Before | After | |
---|---|---|---|
Build framework | 3.7s | 3.7s | |
Build tests | 67s | 67s | |
Build all | 70s | 70s | |
Run tests | 42ms | 40ms | |
Library size | 7.6MB | 7.7MB | |
Executable size | 35.7MB | 35.7MB |
Release: | Before | After | |
---|---|---|---|
Build framework | 5.0s | 5.2s | |
Build tests | 142s | 142s | |
Build all | 147s | 147s | |
Run tests | 26ms | 26ms | |
Library size | 1.3MB | 1.3MB | |
Executable size | 10.1MB | 10.1MB |
--> no significant change, so that would be a viable solution.
I think in the end it is your decision but since there is minimal performance difference I would lean towards using the constexpr version so that those functions get more use and implicit testing. At least until C++23 wide support and you could just use to_char for everything integral if you wanted. to_char for floating point seems like a non-starter for embedded at least.
I think in the end it is your decision but since there is minimal performance difference I would lean towards using the constexpr version so that those functions get more use and implicit testing. At least until C++23 wide support and you could just use to_char for everything integral if you wanted. to_char for floating point seems like a non-starter for embedded at least.
I think I'd be happy with either of these two solutions:
append_constexpr
all the time -- simple, small, but not optimal runtime performance.std::to_chars
or append_constexpr
conditionally, using a config toggle like SNITCH_APPEND_TO_CHARS
-- a bit more complex, but lets people opt-out of the cost of std::to_chars
if they don't need the optimal speed (or simply if their compiler/STL can't handle it). I think it's probably better to use one or the other, but not mix the two, so the option is simply "use std::to_chars
or not", rather than "use it for integers all the time but not always for floats".Looking at the performance measurements in #18, I went with option 2. I don't think it's reasonable to give up the 5x performance increase of std::to_chars
in the basic case. For embedded the SNITCH_APPEND_TO_CHARS
option can be set to OFF
to avoid the huge space cost.
I moved the standard library detection macro logic to the config header, and disable the setting if a library version is detected that doesn't support std::to_char
for floating point, or std::to_char
at all, SNITCH_APPEND_TO_CHARS
is undefined and defined as 0. similar to the SNITCH_CONSTEXPR_FLOAT_USE_BITCAST
setting.
I like to restrict macro logic to one location so I re-named the functions in the anonymous namespace to append_to
since they don't necessarily use std::to_char
. I pulled in the append_constexpr changes from the append branch, and made the conversion base a template parameter in append_to
to match those signatures.
The single MacOS failure is a little baffling to me. I think apple clang, in Release mode, is printing out an extra character of precision and therefore the acceptance test regex isn't converting the time to a "*" as expected.
Not sure why this wouldn't happen in debug, or normal clang.. it does look like libc++ doesn't define the feature test macro for std::to_char
even though they have it implemented.
so format will fall back to the constexpr versions.
Not sure why this wouldn't happen in debug, or normal clang.
The offending number that fails the regex (1e-6, it seems) is the duration of a test (in seconds), as measured in the CI environment. That number is likely to be highly variable, so it would not be a surprise if the other runs (e.g. in Debug, or with other OS/compilers) ended up with a different value. If the problem is real (i.e. not a compiler bug) and only affects specific values, then it's possible that the other runs were just lucky and didn't trigger the issue.
And it seems indeed the issue is real, and I can reproduce it on my local branch. Adding the following line to the "append floats"
test triggers a failure if using append_contexpr
:
CONSTEXPR_CHECK(a(1e-6f) == ae{"1.000000e-06"sv, true});
So that's a bug in append_constexpr
. I traced it down to set_precision()
, which didn't account for the number of digits potentially staying the same when doing a divide-and-round (in this case, rounding 99999999/10
to 10000000
). Fix is in 52ff3e8fc1d210dec28a7d829dd435577e73a603.
Looking good! I'm happy to approve it once the small changes requested above are made.
As discussed in #158 , this is a merge request exploring the best option to eliminate the dependence on snprintf, since the capability of printf family of functions may vary depending on implementation. Specifically in the most common embedded ARM compiler newlib-nano won't print large integers.
This change currently goes with option 2, falling back on snitch
append_constexpr()
function for floating point based on standard library version. standard library detection is using macros for libstdc++, libc++, and MSVC. A "cleaner" solution in CMake could be to set a configuration based on a test compile. Not sure how to do the same in meson.Should this have an option to override the detected capability?