llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.23k stars 11.65k forks source link

[libc++] int-to-string conversions significantly slower on mingw #56202

Open alvinhochun opened 2 years ago

alvinhochun commented 2 years ago

This is the same issue as https://github.com/llvm/llvm-project/issues/40476 but specifically for mingw. The prior issue was fixed for MSVC with https://reviews.llvm.org/D59525 and the follow-up https://reviews.llvm.org/D59727, but mingw still uses the slow path requiring __libcpp_locale_guard.

I don't have proper benchmark data now, but converting numbers to strings using std::ostringstream is apparently quite slow on mingw. Especially when the current locale is not "C", the setlocale call takes the majority of time (see image below). When the locale is already "C" it seems about a magnitude faster but setlocale is still quite significant in profiling data.

profiling

(This is the code in use.)

The toolchain we use is llvm-mingw 20220323 (LLVM 14.0.0).

alvinhochun commented 2 years ago

Tried making a quick benchmark with nanobench:

Benchmark code ```c++ #include #include #include #include #define ANKERL_NANOBENCH_IMPLEMENT #include "nanobench.h" #if defined(_MSC_VER) # define NOINLINE __declspec(noinline) #else # define NOINLINE __attribute__(( noinline )) #endif NOINLINE int getNumber() { return 1234; } NOINLINE double getDouble() { return 1234.56; } int main() { ankerl::nanobench::Bench b; b.minEpochTime(std::chrono::milliseconds(200)); b.maxEpochTime(std::chrono::milliseconds(1000)); b.performanceCounters(true); b.run("[ref] osstream with filler string", [&] { std::ostringstream os; os << "xxxxxxxxxx"; ankerl::nanobench::doNotOptimizeAway(os.str()); }); setlocale(LC_ALL, "en_US.utf8"); b.run("osstream << int (not C locale)", [&] { int x = getNumber(); std::ostringstream os; os << x; ankerl::nanobench::doNotOptimizeAway(os.str()); }); setlocale(LC_ALL, "C"); b.run("osstream << int (is C locale)", [&] { int x = getNumber(); std::ostringstream os; os << x; ankerl::nanobench::doNotOptimizeAway(os.str()); }); b.run("std::to_string(int)", [&] { int x = getNumber(); auto s = std::to_string(x); ankerl::nanobench::doNotOptimizeAway(s); }); b.run("std::to_chars(int)", [&] { int x = getNumber(); std::array buf; auto r = std::to_chars(buf.data(), buf.data() + buf.size(), x); ankerl::nanobench::doNotOptimizeAway(buf); }); setlocale(LC_ALL, "en_US.utf8"); b.run("osstream << double (not C locale)", [&] { double x = getDouble(); std::ostringstream os; os << x; ankerl::nanobench::doNotOptimizeAway(os.str()); }); setlocale(LC_ALL, "C"); b.run("osstream << double (is C locale)", [&] { double x = getDouble(); std::ostringstream os; os << x; ankerl::nanobench::doNotOptimizeAway(os.str()); }); b.run("std::to_string(double)", [&] { double x = getDouble(); auto s = std::to_string(x); ankerl::nanobench::doNotOptimizeAway(s); }); b.run("std::to_chars(double)", [&] { double x = getDouble(); std::array buf; auto r = std::to_chars(buf.data(), buf.data() + buf.size(), x); ankerl::nanobench::doNotOptimizeAway(buf); }); return 0; } ```

libc++/mingw/ucrt (llvm-mingw-20220323-ucrt-x86_64)

clang++ locale-perf-test.cpp -o locale-perf-test-mingw.exe -O2 -std=c++17

ns/op op/s err% total benchmark
89.72 11,145,303.12 1.8% 2.40 [ref] osstream with filler string
28,449.35 35,150.19 0.6% 2.42 osstream << int (not C locale)
1,045.37 956,599.96 0.8% 2.41 osstream << int (is C locale)
8.12 123,093,432.05 1.6% 2.38 std::to_string(int)
3.18 314,807,231.25 1.6% 2.41 std::to_chars(int)
28,740.35 34,794.28 0.7% 2.42 osstream << double (not C locale)
1,344.94 743,525.48 0.5% 2.32 osstream << double (is C locale)
320.15 3,123,519.84 0.7% 2.42 std::to_string(double)
46.89 21,324,673.41 0.5% 2.43 std::to_chars(double)

libstdc++/mingw/msvcrt (x86_64-11.2.0-release-posix-seh-rt_v9-rev1

g++ locale-perf-test.cpp -o locale-perf-test-mingwgcc.exe -O2

ns/op op/s err% total benchmark
216.24 4,624,530.54 0.8% 2.39 [ref] osstream with filler string
237.27 4,214,571.27 0.9% 2.40 osstream << int (not C locale)
238.98 4,184,446.21 1.1% 2.45 osstream << int (is C locale)
9.60 104,189,551.72 2.3% 2.43 std::to_string(int)
7.76 128,815,417.84 0.4% 2.41 std::to_chars(int)
1,016.96 983,320.82 1.0% 2.42 osstream << double (not C locale)
1,026.09 974,573.86 0.7% 2.42 osstream << double (is C locale)
526.30 1,900,065.50 1.0% 2.45 std::to_string(double)
55.40 18,049,144.14 0.5% 2.42 std::to_chars(double)

MSVC STL/MSVC (19.28.29913 for x64)

cl locale-perf-test.cpp /Fe:locale-perf-test-msvc.exe /O2 /EHsc /std:c++17

ns/op op/s err% total benchmark
409.53 2,441,849.56 0.2% 2.42 [ref] osstream with filler string
722.66 1,383,768.22 0.3% 2.46 osstream << int (not C locale)
722.47 1,384,135.16 0.2% 2.42 osstream << int (is C locale)
9.91 100,885,320.90 0.6% 2.46 std::to_string(int)
9.35 106,907,753.86 0.3% 2.42 std::to_chars(int)
1,065.95 938,132.66 0.8% 2.42 osstream << double (not C locale)
1,065.88 938,188.30 0.8% 2.41 osstream << double (is C locale)
611.46 1,635,419.73 0.4% 2.42 std::to_string(double)
51.11 19,564,228.70 0.5% 2.42 std::to_chars(double)

(Edited to add double-to-string for comparison.)

mstorsjo commented 9 months ago

Just for future reference, @alvinhochun did post an initial patch to try to fix this, in https://reviews.llvm.org/D128600, by calling _vsnprintf_s_l in the mingw codepath, like this:

diff --git a/libcxx/src/support/win32/locale_win32.cpp b/libcxx/src/support/win32/locale_win32.cpp
index 57ef94932ba0..05b8ce2e1db3 100644
--- a/libcxx/src/support/win32/locale_win32.cpp
+++ b/libcxx/src/support/win32/locale_win32.cpp
@@ -90,11 +90,15 @@ int snprintf_l(char* ret, size_t n, locale_t loc, const char* format, ...) {
   int result = __stdio_common_vsprintf(
       _CRT_INTERNAL_LOCAL_PRINTF_OPTIONS | _CRT_INTERNAL_PRINTF_STANDARD_SNPRINTF_BEHAVIOR, ret, n, format, loc, ap);
 #else
-  __libcpp_locale_guard __current(loc);
-  _LIBCPP_DIAGNOSTIC_PUSH
-  _LIBCPP_CLANG_DIAGNOSTIC_IGNORED("-Wformat-nonliteral")
-  int result = vsnprintf(ret, n, format, ap);
-  _LIBCPP_DIAGNOSTIC_POP
+  int result;
+  if (n != 0) {
+    va_list ap_copy;
+    va_copy(ap_copy, ap);
+    result = _vsnprintf_s_l(ret, n, _TRUNCATE, format, loc, ap_copy);
+    va_end(ap_copy);
+  }
+  if (n == 0 || result < 0)
+    result = _vscprintf_l(format, loc, ap);
 #endif
   va_end(ap);
   return result;

This does build for both UCRT and msvcrt (_vsnprintf_s_l and _vscprintf_l do exist in msvcrt.dll, since Windows Vista), but formatting of long double fails. On x86 mingw, long double is the x87 80 bit floats, while MSVC environments (and msvcrt.dll and UCRT) have long double as 64 bit. Libcxx in the CI configuration is built with the define __USE_MINGW_ANSI_STDIO enabled, which redirects the common stdio functions to mingw reimplementations, which use the right expected form of long double.

To fix this, the mingw stdio function reimplementations, used with __USE_MINGW_ANSI_STDIO, would need to be extended with an implementation of _vsnprintf_s_l.

Alternatively, the codepath using _vsnprintf_s_l could be used conditionally, as long as sizeof(double) == sizeof(long double) (i.e. for mingw, on non-x86), or if __USE_MINGW_ANSI_STDIO isn't enabled. (llvm-mingw builds of libc++ aren't built with that enabled currently - so they don't format long doubles correctly in these cases anyway.)

(Alternatively, the whole formatting could be reimplemented in libc++, but that's probably a much larger undertaking.)