microsoft / STL

MSVC's implementation of the C++ Standard Library.
Other
10.25k stars 1.51k forks source link

<charconv>: instead of memcpy'ing the result you can calculate necessary buffer size ahead of time for integers #1024

Open toughengineer opened 4 years ago

toughengineer commented 4 years ago

For integers instead of filling the buffer from right to left and memcpy'ing the result back so the most significant digit (that is written last) ends up in the leftmost position, you can calculate the necessary buffer size ahead of time and put all the digits exactly at their final destination place.

See example in libc++: https://github.com/llvm/llvm-project/blob/f54402b63a4f5b0b4b15e0f82ce8ff8501b206e6/libcxx/include/charconv#L164-L168 straight from the Bit Twiddling Hacks: https://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10

memcpy usage here: https://github.com/microsoft/STL/blob/550713eba27c5803addb6f615d3526400cb2df37/stl/inc/charconv#L137-L146

miscco commented 4 years ago

As far as I know libc++ charconv implementation is derived from the STLs one.

The reason that code is not using the builtins is most likely that their support in MSVC was only merged recently (See #795)

StephanTLavavej commented 4 years ago

Thanks for the performance idea! MSVC's and libc++'s integer <charconv> implementations were independently written, which is why we're lacking this technique.

toughengineer commented 4 years ago

I'm glad to help.

After I submitted this issue I realized that writing to a buffer on the stack and then memcpy'ing to destination buffer may still be faster than calculating the needed length in advance.

The calculation uses log2 as counting bits instruction (which is several clocks on modern CPUs) and some multiplication+shifting+table lookup and fixup.

I tried to ask people who compared different implementations (e.g. to_chars/to_string and that of fmtlib, maybe @vitaut can comment on this) and got no definitive answer. So, I guess, one should investigate if it's really beneficial to spend time calculating the size in advance, and possibly take other considerations into account.

vitaut commented 4 years ago

I ran some benchmarks in http://www.zverovich.net/2020/06/13/fast-int-to-string-revisited.html. Size computation (https://github.com/fmtlib/fmt/blob/60c43e870387affb07de32ed4fc2794ac7a9f60d/include/fmt/format.h#L787) is a relatively insignificant factor compared to other optimizations (see the difference between format_to and format_int).

StephanTLavavej commented 4 years ago

Also, I have considered having different codepaths for "definitely enough space for the worst case" and "maybe not enough space, need to do precise bounds checking".

(Edit: @BillyONeal reminded me that we generate digits backwards, so even when there is plenty of space, we don't immediately know where to begin writing.)

(Edit x2: @BillyONeal also reminded me that Ryu's integer printing generates 2 digits at once, but the integer printing outside of Ryu generates only 1 digit at a time.)