stub42 / pytz

pytz Python historical timezone library and database
MIT License

API to preload all data files #42

Closed besfahbod closed 3 years ago

besfahbod commented 4 years ago

In a dynamic deployment environment, sometimes it's needed to "warm" the library by preloading all the data files before the application starts. Would be great if the library had a single entry point for that, to have a shared, stable way of doing so.

pganssle commented 3 years ago

@besfahbod How about this?

import pytz

def warm_pytz_cache() -> None:
    for zone in pytz.all_timezones:
        pytz.timezone(zone)

I'll note that if you migrate to zoneinfo, the guarantees about the caching would require you to hold an actual reference to the objects you want to cache if you don't want to have a cache miss:

import zoneinfo

# The mutable default is deliberate: it holds strong references across calls.
def warm_cache(*, _cache: dict[str, zoneinfo.ZoneInfo] = {}) -> None:
    for zone in zoneinfo.available_timezones():
        _cache[zone] = zoneinfo.ZoneInfo(zone)

That is guaranteed to work as part of the public interface by the PEP. I am not sure that pytz makes any guarantees about its caching behavior, but my understanding is that development on pytz is largely frozen, so I doubt they'll be re-working the cache system any time soon.
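To illustrate the reference-holding point, here is a minimal sketch, assuming Python 3.9+ and an available time zone database (system tzdata or the `tzdata` package). The helper name `cache_hit_while_referenced` is hypothetical and only serves the illustration; `ZoneInfo.no_cache` is the documented constructor that bypasses the cache.

```python
import zoneinfo


def cache_hit_while_referenced(key: str) -> bool:
    """Return True if a second lookup yields the very same object.

    While `first` is alive, the primary constructor is expected to
    return the cached instance rather than building a new one.
    """
    first = zoneinfo.ZoneInfo(key)
    second = zoneinfo.ZoneInfo(key)
    return first is second


def bypasses_cache(key: str) -> bool:
    """Return True if ZoneInfo.no_cache builds a distinct object."""
    cached = zoneinfo.ZoneInfo(key)
    fresh = zoneinfo.ZoneInfo.no_cache(key)
    return fresh is not cached
```

If nothing keeps a reference to the object, the cache is allowed to drop it, which is why the warmer above stores every zone in a dict.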

besfahbod commented 3 years ago

Thanks, @pganssle. The pytz solution makes sense, and is in fact similar to what I have put in place in the application code right now. And, from observation, looks like the caching behavior is that it doesn't get cleared automatically, fwiw.

And thanks for the extra notes about the zoneinfo caching behavior. Good to know for when we migrate to the zoneinfo module.

Considering these, @pganssle, do you think it makes sense to add a specific API for these to pytz (and probably suggest the same for zoneinfo)?

pganssle commented 3 years ago

> Considering these, @pganssle, do you think it makes sense to add a specific API for these to pytz (and probably suggest the same for zoneinfo)?

Can't speak for pytz, but definitely not for zoneinfo. The caching behavior is well-defined and I don't see a particularly strong justification for this, when it's pretty simple to write your own cache warmer if you really need it.

stub42 commented 3 years ago

pytz won't throw away the timezone info once it is loaded, and @pganssle's warm_pytz_cache() function will work fine for most purposes. The only complication is when the database on disk has drifted, which could happen when pytz has been set up to use the system zoneinfo database. That is unlikely, since it requires a timezone to be completely removed from the database. To make it extremely unlikely, just warm common_timezones rather than all_timezones, so you don't try warming all the deprecated zones.

import pytz

def warm_pytz_cache() -> None:
    for zone in pytz.common_timezones:
        try:
            pytz.timezone(zone)
        except Exception:
            # Skip zones missing from a drifted on-disk database.
            pass

Consider the above approach blessed, and I'll try to keep it working when transitioning to Python 3.9's internals. I don't want to add a call to pytz to do this. It is an infrequent enough need, and use cases will be subtly different (warming all_timezones vs common_timezones, for instance).
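Since the choice of zone list differs between use cases, an application could keep its own small warmer that takes both the zone names and the loader as arguments. The following is only a sketch under that assumption: `warm_cache`, `zones`, and `loader` are hypothetical names for application code, not part of pytz or zoneinfo.

```python
from typing import Callable, Iterable


def warm_cache(
    zones: Iterable[str],
    loader: Callable[[str], object],
) -> tuple[dict[str, object], list[str]]:
    """Load every zone once and report which ones could not be loaded.

    The returned dict keeps a strong reference to each loaded object,
    which matters for zoneinfo; the list names the zones that failed.
    """
    loaded: dict[str, object] = {}
    failed: list[str] = []
    for zone in zones:
        try:
            loaded[zone] = loader(zone)
        except Exception:
            # For example, a zone removed from a drifted system database.
            failed.append(zone)
    return loaded, failed
```

With pytz this could be called as `warm_cache(pytz.common_timezones, pytz.timezone)`, and with zoneinfo as `warm_cache(zoneinfo.available_timezones(), zoneinfo.ZoneInfo)`, keeping the returned dict alive for as long as the cache should stay warm. Returning the failures instead of silently passing makes database drift visible at startup.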