sdispater / pendulum

Python datetimes made easy
https://pendulum.eustace.io
MIT License
6.12k stars 372 forks

extreme slow down in `in_tz` after upgrading from 2.1.2 to 3.0.0 #818

Open adam006 opened 2 months ago

adam006 commented 2 months ago

Issue

Hello, we recently upgraded from pendulum 2.1.2 to pendulum 3 and have noticed an extreme slowdown in one of our processes, which has to compute a lot of times (a few hundred thousand). On version 2 it took about 2 minutes; after switching to the latest pendulum version, I end up killing the process after about 15 minutes. Doing some investigation, I can see the time is spent within the `in_tz` call, and when I change the code to use `astimezone` instead, the run drops to about 43 seconds. Creating a new project and running a simple pyinstrument test on the `in_tz` function with a single timezone also shows the speed change between the old version (pendulum 2.1.2) and the new one (pendulum 3.0).

We use pendulum heavily throughout the project and are concerned about this slowdown. We do not want to rewrite the project to remove pendulum, given how tightly integrated it is. What can we do about fixing the speed issues here?

adam006 commented 2 months ago

I was able to improve this slightly by patching the `timezone` function and caching the `Timezone` objects, but it is still very slow and demands a _lot_ of CPU.
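The patch itself is not shown in this thread. As a rough sketch of the caching idea only, assuming the expensive step is building a timezone object from its name, a name-to-object factory can be wrapped with `functools.lru_cache`. The names `cached_factory` and `slow_timezone` are hypothetical; with pendulum, the wrapped factory would presumably be `pendulum.timezone`.

```python
from functools import lru_cache
from typing import Callable, List, TypeVar

T = TypeVar("T")


def cached_factory(factory: Callable[[str], T], maxsize: int = 256) -> Callable[[str], T]:
    """Wrap a name -> timezone factory so each name is built at most once."""
    return lru_cache(maxsize=maxsize)(factory)


# Hypothetical stand-in for an expensive lookup (e.g. pendulum.timezone).
calls: List[str] = []


def slow_timezone(name: str) -> dict:
    calls.append(name)  # record every real construction
    return {"name": name}


get_tz = cached_factory(slow_timezone)
first = get_tz("America/New_York")
second = get_tz("America/New_York")  # served from the cache, no new construction
```

This only removes repeated object construction. As noted above, it does not address the cost of the conversion itself.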

adam006 commented 2 months ago

More investigation shows that the time is spent in the call to `convert`, specifically on the line with `dt.astimezone(self)`, which appears to be a builtin method. However, as I stated above, calling `astimezone` directly takes almost no time, which makes this all the more perplexing.
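For reference, the direct `astimezone` route mentioned above looks roughly like this using only the standard library. This is a sketch under assumptions: `convert_all` is a hypothetical helper, and fixed UTC offsets stand in for the named zones so the example is self-contained. In real code the target would be a `zoneinfo.ZoneInfo` or a pendulum timezone object.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Iterable, List


def convert_all(moments: Iterable[datetime], target: tzinfo) -> List[datetime]:
    """Convert aware datetimes to `target` with the builtin astimezone."""
    return [m.astimezone(target) for m in moments]


# Fixed offsets standing in for Europe/Amsterdam (CET) and
# America/New_York (EDT) on 2020-03-20; an assumption for illustration.
amsterdam_cet = timezone(timedelta(hours=1))
new_york_edt = timezone(timedelta(hours=-4))

source = [datetime(2020, 3, 20, 12, 30, 45, tzinfo=amsterdam_cet)]
converted = convert_all(source, new_york_edt)
print(converted[0].isoformat())
```

Whether the result keeps pendulum's own `DateTime` type when the input is a pendulum object is not confirmed in this thread, so callers relying on pendulum-specific methods should check that first.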

ariebovenberg commented 2 months ago

This must indeed be an inadvertent regression. The pendulum FAQ shows pendulum performing much faster than arrow, but now it seems to be the reverse:

...........
tz change (pendulum): Mean +- std dev: 58.0 us +- 0.6 us
...........
tz change (stdlib): Mean +- std dev: 233 ns +- 2 ns
...........
tz change (arrow): Mean +- std dev: 4.67 us +- 0.06 us

benchmark source:


import pyperf

runner = pyperf.Runner()
runner.timeit(
    "tz change (pendulum)",
    "dt.in_tz('America/New_York')",
    setup="from pendulum import datetime; dt = datetime(2020, 3, 20, 12, 30, 45, tz='Europe/Amsterdam')",
)
runner.timeit(
    "tz change (stdlib)",
    "dt.astimezone(target)",
    setup="from datetime import datetime; from zoneinfo import ZoneInfo; "
    "dt = datetime(2020, 3, 20, 12, 30, 45, tzinfo=ZoneInfo('Europe/Amsterdam')); "
    "target = ZoneInfo('America/New_York')",
)
runner.timeit(
    "tz change (arrow)",
    "dt.to('America/New_York')",
    setup="import arrow; dt = arrow.get(2020, 3, 20, 12, 30, 45, 0, tz='Europe/Amsterdam'); "
)
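
To put the three reported means side by side, the ratios can be computed directly. The figures are copied from the benchmark output in this comment; the variable names are only for illustration.

```python
# Mean times per call, in seconds, as reported in the output above.
pendulum_s = 58.0e-6  # 58.0 us
stdlib_s = 233e-9     # 233 ns
arrow_s = 4.67e-6     # 4.67 us

vs_stdlib = pendulum_s / stdlib_s
vs_arrow = pendulum_s / arrow_s

print(f"pendulum vs stdlib: {vs_stdlib:.0f}x slower")
print(f"pendulum vs arrow:  {vs_arrow:.1f}x slower")
```

On these numbers, pendulum's `in_tz` is roughly 250 times slower than the stdlib conversion and about 12 times slower than arrow's.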