microsoft / STL

MSVC's implementation of the C++ Standard Library.
Other
10.2k stars 1.51k forks source link

<chrono>: Major performance issues when using zoned_time or time_zone #2842

Open doomlaur opened 2 years ago

doomlaur commented 2 years ago

Background information For a high-performance software we had 2 important requirements:

  1. Add support for changing the time zone instead of using the local time zone (at least on Windows 10 >= 1903/19H1 and Windows Server >= 2022).
  2. Add support for dates before 1970 (the old C functions are limited to dates between 1970-01-01 00:00:00 UTC and 3000-12-31 23:59:59 UTC).

This software previously used functions like std::localtime and std::mktime that only supported the local time zone and no dates before 1970 or after 3000.

After changing from time_t to std::chrono::zoned_time and using std::chrono::time_zone (with some workarounds for Windows 10 < 1903/19H1 and Windows Server < 2022), as well as using the other C++20 chrono features, we were able to use multiple time zones, dates before 1970 and after 3000 and were even able to simplify and improve a lot of our code due to the super nice C++20 chrono features. Thank you for implementing that, it's amazing! :)

Describe the bug After the change we started having some major performance issues. In short, the performance issues happen whenever calls to ICU.dll must be made in order to find something time zone related, such as a time zone itself or a UTC offset. Because this is something that must be done very often, it causes major performance issues.

To be specific, all calls to std_tzdb_get_sys_info() and std_tzdb_get_current_zone() require finding the time zone in the ICU library which seems to be very slow. To be precise, in extreme cases, __std_tzdb_get_sys_info() might use up to 98% (!) of the time of the whole CPU profiler in Visual Studio when benchmarking our software! We were able to write a wrapper and cache the current zone, but the sys_info is needed very often and is not as easy to cache, as the information in the sys_info (such as UTC offset) depends on the timestamp.

Test case I created a new Visual Studio project, set the C++ version to C++20, added _CRT_SECURE_NO_WARNINGS (for simplicity) and ran a benchmark using the following code (in Release mode without attached debugger):

#include <chrono>
#include <iostream>

int main()
{
    constexpr size_t num_benchmark_iterations = 5'000'000;
    const auto now_utc = std::chrono::system_clock::now();
    const auto now_utc_in_seconds = std::chrono::system_clock::to_time_t(now_utc);
    const std::chrono::time_zone* current_zone = std::chrono::current_zone();

    // STEP 1 - try converting using the old C functions.
    const auto begin_timer_localtime = std::chrono::high_resolution_clock::now();
    for (size_t n = 0; n < num_benchmark_iterations; ++n)
    {
        /*tm* ctime =*/ std::localtime(&now_utc_in_seconds);
    }
    const auto elapsed_time_localtime = std::chrono::high_resolution_clock::now() - begin_timer_localtime;
    const auto elapsed_time_localtime_in_seconds = std::chrono::duration_cast<std::chrono::seconds>(elapsed_time_localtime);
    const auto elapsed_time_localtime_remaining_nanoseconds = elapsed_time_localtime - elapsed_time_localtime_in_seconds;
    std::cout << "Elapsed time for localtime conversion:\t" << std::format("{} {} ({})", elapsed_time_localtime_in_seconds, elapsed_time_localtime_remaining_nanoseconds, elapsed_time_localtime) << std::endl;

    // STEP 2 - try converting using the modern C++ functions.
    const auto begin_timer_time_zone = std::chrono::high_resolution_clock::now();
    for (size_t n = 0; n < num_benchmark_iterations; ++n)
    {
        /*auto now_local =*/ current_zone->to_local(now_utc);
    }
    const auto elapsed_time_time_zone = std::chrono::high_resolution_clock::now() - begin_timer_time_zone;
    const auto elapsed_time_time_zone_in_seconds = std::chrono::duration_cast<std::chrono::seconds>(elapsed_time_time_zone);
    const auto elapsed_time_time_zone_remaining_nanoseconds = elapsed_time_time_zone - elapsed_time_time_zone_in_seconds;
    std::cout << "Elapsed time for time zone conversion:\t" << std::format("{} {} ({})", elapsed_time_time_zone_in_seconds, elapsed_time_time_zone_remaining_nanoseconds, elapsed_time_time_zone) << std::endl;
}

Results on my computer:

Elapsed time for localtime conversion:  0s 382884200ns (382884200ns)
Elapsed time for time zone conversion:  47s 964314100ns (47964314100ns)

Expected behavior The search for time zones and conversions between sys_time and local_time should be (ideally) as fast or minimally slower than std::localtime and std::mktime.

Workarounds

  1. We wrote wrappers around std::chrono::current_zone() and std::chrono::default_zone(), because for our use case it was enough to query them once at startup and reuse them. (For our use case we don't have to consider the user changing the time zone in Windows during our software's runtime.)

  2. Call the following functions as infrequently as possible, only when absolutely necesssary:

    • default constructing std::chrono::zoned_time (calls std::chrono::default_zone() each time)
    • constructing std::chrono::zoned_time using a string_view for the time zone name
    • constructing std::chrono::zoned_time using an std::chrono::local_time
    • std::chrono::zoned_time::get_local_time()
    • std::chrono::locate_zone()
    • std::chrono::time_zone::get_info()
    • std::chrono::time_zone::to_local()
    • std::chrono::time_zone::to_sys()
  3. Workaround for std::chrono::zoned_time::get_local_time(): we were able to speed things up by using std::localtime and std::mktime instead of std::chrono::time_zone whenever the local time zone is enough, so no other time zones are needed. This is something required for old versions of Windows (Windows 10 < 1903/19H1 and Windows Server < 2022), but can also be used on newer versions of Windows to speed things up. The disadvantage of this solution is that we can't use dates before 1970, so we have to choose between that and performance.

  4. The last possible workaround would be to either cache the local_time or the sys_info separately for each timestamp. However, when working with lots of data, this means using additional memory (which might not always be desired), and nevertheless the local_times/sys_info still have to be queried at least once, which means that the performance issues still appear, just not as often.

Currently, std::chrono::zoned_time::get_local_time() seems to be the issue that affects us the most, although the same issue appears in all the cases listed above.

STL version

Microsoft Visual Studio Enterprise 2022 (64-bit) - Current
Version 17.2.5
reder2000 commented 2 years ago

Another issue is the creation of vector of zoned_time. The following is very expensive since it performs a search for each ("default") element:

    template <>
    struct zoned_traits<const time_zone*> {
        _NODISCARD static const time_zone* default_zone() {
          return _CHRONO get_tzdb().locate_zone("UTC");
        }

Something like

    template <>
    struct zoned_traits<const time_zone*> {
        _NODISCARD static const time_zone* default_zone() {
            static auto utc_zone = _CHRONO get_tzdb().locate_zone("UTC");
            return utc_zone;
        }

improves drastically the creation of vectors (like x15 on debug, x3 on release).

doomlaur commented 2 years ago

Thanks @reder2000, good point 👍 I also corrected my first post, because default-constructing zoned_time actually calls default_zone instead of current_zone.

Your example is actually exactly what our wrappers around current_zone and default_zone do: we use static variables to cache calls to the expensive functions wherever possible. Then we additionally wrap zoned_time for 2 purposes:

  1. Call our cached functions instead of the Standard Library functions to speed up all operations wherever possible.
  2. Provide exception handling and workarounds (for things like get_local_time, see above) on Windows versions older than Windows 10 1903/19H1 and older than Windows Server 2022.

Extending std::chrono::zoned_time to make all of this work is not an ideal solution at all, but it's the solution that worked for now, both for performance and backward compatibility 😄

reder2000 commented 2 years ago

Follow-up regarding performance. Parsing a CSV file with one TZ column (and 9 millions rows) results in really bad results. It takes about 13 seconds, And I'm using a custom parser to decode the datetime and zone, using the "recommended" functions takes infinite time. Using date (Hinnant) instead halves the time to 6.5 seconds. And just for the record, mingw64 takes 4 seconds, and gcc/wsl 3.3. I think I'll stick to using date, even though it's not practical (the vcpkg port is somewhat broken for c++ above 11, but works for other platforms)

doomlaur commented 1 year ago

I added some details in issue #1911 about how our extended zoned_time works, if anyone is interested.

oliverportcnc commented 5 months ago

Definitely a major blocker for our code as well. Seeing the same as doomlaur in CSVs with tons of date/time fields; about 98% of profiler time is spent in __std_tzdb_get_sys_info().

doomlaur commented 5 months ago

Now that GCC 14 has full support for std::chrono, here are some interesting results from a Linux computer running Fedora 40 with GCC/g++ 14.1.1 (compiled with -O3):

Elapsed time for localtime conversion:  10s 771523326ns (10771523326ns)
Elapsed time for time zone conversion:  0s 95471986ns (95471986ns)

I was positively surprised that the conversion using time zones is super fast! However, I was also surprised that localtime was so slow. Since I know that localtime_s is minimally faster than localtime on Windows, I instead tried localtime_r on Linux:

std::tm ctime {};

// STEP 1 - try converting using the old C functions.
const auto begin_timer_localtime = std::chrono::high_resolution_clock::now();
for (size_t n = 0; n < num_benchmark_iterations; ++n)
{
    localtime_r(&now_utc_in_seconds, &ctime);
}

I then got the following results:

Elapsed time for localtime conversion:  0s 191415802ns (191415802ns)
Elapsed time for time zone conversion:  0s 95490083ns (95490083ns)

Interestingly, time zone conversions are still faster on Linux than localtime_r, but there's not much difference anymore.