Open doomlaur opened 2 years ago
Another issue is the creation of vector of zoned_time. The following is very expensive since it performs a search for each ("default") element:
template <>
struct zoned_traits<const time_zone*> {
_NODISCARD static const time_zone* default_zone() {
return _CHRONO get_tzdb().locate_zone("UTC");
}
Something like
template <>
struct zoned_traits<const time_zone*> {
_NODISCARD static const time_zone* default_zone() {
static auto utc_zone = _CHRONO get_tzdb().locate_zone("UTC");
return utc_zone;
}
improves drastically the creation of vectors (like x15 on debug, x3 on release).
Thanks @reder2000, good point 👍 I also corrected my first post, because default-constructing zoned_time actually calls default_zone instead of current_zone.
Your example is actually exactly what our wrappers around current_zone and default_zone do: we use static variables to cache calls to the expensive functions wherever possible. Then we additionally wrap zoned_time for 2 purposes:
Extending std::chrono::zoned_time to make all of this work is not an ideal solution at all, but it's the solution that worked for now, both for performance and backward compatibility 😄
Follow-up regarding performance. Parsing a CSV file with one TZ column (and 9 millions rows) results in really bad results. It takes about 13 seconds, And I'm using a custom parser to decode the datetime and zone, using the "recommended" functions takes infinite time. Using date (Hinnant) instead halves the time to 6.5 seconds. And just for the record, mingw64 takes 4 seconds, and gcc/wsl 3.3. I think I'll stick to using date, even though it's not practical (the vcpkg port is somewhat broken for c++ above 11, but works for other platforms)
I added some details in issue #1911 about how our extended zoned_time works, if anyone is interested.
Definitely a major blocker for our code as well. Seeing the same as doomlaur in CSVs with tons of date/time fields; about 98% of profiler time is spent in __std_tzdb_get_sys_info().
Now that GCC 14 has full support for std::chrono
, here are some interesting results from a Linux computer running Fedora 40 with GCC/g++ 14.1.1 (compiled with -O3
):
Elapsed time for localtime conversion: 10s 771523326ns (10771523326ns)
Elapsed time for time zone conversion: 0s 95471986ns (95471986ns)
I was positively surprised that the conversion using time zones is super fast! However, I was also surprised that localtime was so slow. Since I know that localtime_s
is minimally faster than localtime
on Windows, I instead tried localtime_r
on Linux:
std::tm ctime {};
// STEP 1 - try converting using the old C functions.
const auto begin_timer_localtime = std::chrono::high_resolution_clock::now();
for (size_t n = 0; n < num_benchmark_iterations; ++n)
{
localtime_r(&now_utc_in_seconds, &ctime);
}
I then got the following results:
Elapsed time for localtime conversion: 0s 191415802ns (191415802ns)
Elapsed time for time zone conversion: 0s 95490083ns (95490083ns)
Interestingly, time zone conversions are still faster on Linux than localtime_r
, but there's not much difference anymore.
Background information For a high-performance software we had 2 important requirements:
This software previously used functions like std::localtime and std::mktime that only supported the local time zone and no dates before 1970 or after 3000.
After changing from time_t to std::chrono::zoned_time and using std::chrono::time_zone (with some workarounds for Windows 10 < 1903/19H1 and Windows Server < 2022), as well as using the other C++20 chrono features, we were able to use multiple time zones, dates before 1970 and after 3000 and were even able to simplify and improve a lot of our code due to the super nice C++20 chrono features. Thank you for implementing that, it's amazing! :)
Describe the bug After the change we started having some major performance issues. In short, the performance issues happen whenever calls to ICU.dll must be made in order to find something time zone related, such as a time zone itself or a UTC offset. Because this is something that must be done very often, it causes major performance issues.
To be specific, all calls to std_tzdb_get_sys_info() and std_tzdb_get_current_zone() require finding the time zone in the ICU library which seems to be very slow. To be precise, in extreme cases, __std_tzdb_get_sys_info() might use up to 98% (!) of the time of the whole CPU profiler in Visual Studio when benchmarking our software! We were able to write a wrapper and cache the current zone, but the sys_info is needed very often and is not as easy to cache, as the information in the sys_info (such as UTC offset) depends on the timestamp.
Test case I created a new Visual Studio project, set the C++ version to C++20, added _CRT_SECURE_NO_WARNINGS (for simplicity) and ran a benchmark using the following code (in Release mode without attached debugger):
Results on my computer:
Expected behavior The search for time zones and conversions between sys_time and local_time should be (ideally) as fast or minimally slower than std::localtime and std::mktime.
Workarounds
We wrote wrappers around std::chrono::current_zone() and std::chrono::default_zone(), because for our use case it was enough to query them once at startup and reuse them. (For our use case we don't have to consider the user changing the time zone in Windows during our software's runtime.)
Call the following functions as infrequently as possible, only when absolutely necesssary:
Workaround for std::chrono::zoned_time::get_local_time(): we were able to speed things up by using std::localtime and std::mktime instead of std::chrono::time_zone whenever the local time zone is enough, so no other time zones are needed. This is something required for old versions of Windows (Windows 10 < 1903/19H1 and Windows Server < 2022), but can also be used on newer versions of Windows to speed things up. The disadvantage of this solution is that we can't use dates before 1970, so we have to choose between that and performance.
The last possible workaround would be to either cache the local_time or the sys_info separately for each timestamp. However, when working with lots of data, this means using additional memory (which might not always be desired), and nevertheless the local_times/sys_info still have to be queried at least once, which means that the performance issues still appear, just not as often.
Currently, std::chrono::zoned_time::get_local_time() seems to be the issue that affects us the most, although the same issue appears in all the cases listed above.
STL version