lemon24 / reader

A Python feed reader library.
https://reader.readthedocs.io
BSD 3-Clause "New" or "Revised" License
456 stars 38 forks source link

Use timezone-aware datetimes everywhere in the public API #233

Closed lemon24 closed 3 years ago

lemon24 commented 3 years ago

It's the right thing: https://blog.ganssle.io/articles/2019/11/utcnow.html

Can only be done in 2.0.

lemon24 commented 3 years ago

This has more subtleties than you'd think.

sqlite3 support

sqlite3 (the Python module) doesn't support non-naive datetimes – they get written fine, but we get a ValueError: invalid literal for int() with base 10: b'00+00' when trying to get them out. This is a known bug; a fix was rejected because there's another issue to deprecate adapters; this being stdlib, it won't likely won't be.

We can either install the converter in the patch globally with sqlite3.register_converter (there's no per-db-connection registry), or do our own conversion.

SQLite support

There are two ways to store the timezone; either have a duplicate ("computed") column for the UTC equivalent and sort by that, or always sort by datetime(<some datetime column>) (which converts to UTC on the fly). The tradeoff is what you'd expect.

Storage

If we want to store the timezone, we have to implement the above, and we have to force all future storage implementation to do it too (supporting both versions makes things needlessly complicated).

Alternatively, we can leave all the internals use naive (assumed UTC) datetimes, and only convert them to datetimes with explicit (UTC-only) timezones in the high level API (so we're still doing the right thing for the high level / external users). This seems OK for now.

Frankly, I don't have a strong use case to store the original timezone (yeah, it would be nice, but YAGNI).

And only some feeds use a non-UTC timezone anyway; here are some counts for my database for updated or published:

Testing

The fake parser returns naive datetimes; we then compare them with those returned by Reader. This means a lot (read: all) of tests in test_reader_*.py will be broken by Reader returning aware datetimes.

Assuming we're going with the skin-deep timezone awareness, we'll have to find a solution that doesn't require fixing all of the tests.