taichino / croniter

croniter is a python module to provide iteration for datetime object.
http://github.com/taichino/croniter
387 stars 105 forks source link

Smear dst transitions #99

Closed Chronial closed 5 years ago

Chronial commented 6 years ago

I think the best way to handle DST transition is to "smear" them. That means:

I think the most important aspect of DST shift handling is that tasks run as often as expected. So a task that is scheduled for a fixed time of day should run every day – independent of whether a DST shift happened at that day. A task that is scheduled to run once per hour should always run exactly 24 times a day.

An alternative to my suggestion is what cron is doing:

Local time changes of less than three hours, such as those caused by the Daylight Saving Time changes, are handled in a special way. This only applies to jobs that run at a specific time and jobs that run with a granularity greater than one hour. Jobs that run more frequently are scheduled normally.

If time was adjusted one hour forward, those jobs that would have run in the interval that has been skipped will be run immediately. Conversely, if time was adjusted backward, running the same job twice is avoided.

The smearing method has two advantages:

If the maintainers agree with this approach, I will prepare a PR. This should fix #90 and #91.

kiorky commented 6 years ago

You can try to make a PR yes, but DST is an evil beast that we (re)implement things even now and then.

Please add tests, and try to make existing still pass without mods.

Thx !

Chronial commented 6 years ago

Thanks for the quick response. Looking at this some more, I think this would only be possible by introducing a dependency on a timezone library, since the default tzinfo api does not provide enough information.

Such a dependency is probably not a good idea for this project. I think the solution of cron should be possible with the default tzinfo api.

kiorky commented 6 years ago

we already depend on pytz, is that not enought ?

kiorky commented 6 years ago

whoops, no, only python_dateutil. I would prefer to add pytz.

Chronial commented 6 years ago

With pytz, I should be able to make this work.

kiorky commented 6 years ago

@Chronial i think python_dateutil might be sufficient, see https://dateutil.readthedocs.io/en/stable/tz.html

Chronial commented 6 years ago

I need the actual list of dst transition points, but that should be available from python_datetutil, too. But I'm not sure if that makes sense with the current interface – I think it would be cleaner to pass the timezone name to croniter.

I will have to construct my own tzinfo anyways, since I need a timezones for the smear period.

Chronial commented 6 years ago

The implementation is done, but I'm not sure about how to do the API. The concept is this: Given a timezone name, I construct a converter (utilizing the dateutil db) that converts between local wall-time and utc time. This conversion smears DST transitions as described above.

Wrapped in these conversions, croniter can just operate naively without understanding of timezones, so it should support these two interfaces:

But that would not be backwards-compatible with the current interface. Alternatively, I could try to extract the timezone data from the tzinfo of the passed-in start time. But that would be a bit hacky, and might not always work, because the passed-in timezone might be anything and not come from dateutil.

What do you think?

kiorky commented 6 years ago

That we must find a non breaking way to do it :(, croniter begins to be heavily used.

Chronial commented 6 years ago

I could just add a new timezone parameter that activates timezone-aware mode, with the current dst handling staying active if its not passed. But that would leave duplicate dst code and be confusing to users. Also, the current DST handling is broken.

I implemented this for now in my project by wrapping croniter. I would gladly contribute to this project, but you will have to make an API decision.

kiorky commented 6 years ago

Well, the API does not have to be broken. Timezone can be optional and this only for naive datetim.

If the datetime we received is naive, and we dont give a timezone,. we consider it local. We then get the local tz and convert this input datetime to non-naive local timezoned datetime, and continue with that.

obsolete DST code that is better handled now by using non naive datetimes can be rewritten assuring that current tests pass (or are changed to fixed tests ;)).

Chronial commented 6 years ago

How should non-naive datetimes without a given timezone be handled?

Chronial commented 6 years ago

Just to clarify, I mean this call: croniter(crontab, datetime.datetime(2000, 1, 1, tzinfo=some_timezone)

kiorky commented 6 years ago

We should extract the timezone from the datetime instance.

Chronial commented 6 years ago

Ah, that's the problem. This is not always possible. In order to smear DST, I need the DST transition points, but there is no standardized API for that. I can special-case pytz and dateutil timezones, but this won't work for any other timezones.

kiorky commented 5 years ago

no activity for a long time.

lowell80 commented 4 years ago

@Chronial, is your work captured anywhere that's accessible to the public? I'm also quite curious about resolving some timezone/DST transition issues and I'd like to see how far you were able to go with this approach. You mentioned that you had a project that wrapped croniter, is that available?

Chronial commented 4 years ago

Yes, that is on github. See https://github.com/djangsters/redis-tasks/blob/master/redis_tasks/smear_dst.py and the way it is used: https://github.com/djangsters/redis-tasks/blob/e5767ad5c391daba4593404f6ddae788ef9d0ba5/redis_tasks/scheduler.py#L29-L32

It's been a while since I wrote that, but I think that should implement exactly what I explained in the first post.