sdispater / pendulum

Python datetimes made easy
https://pendulum.eustace.io
MIT License
6.12k stars 372 forks source link

Regression - Inconsistent DBAPI serialization of pendulum datetimes #798

Open rishi-kulkarni opened 5 months ago

rishi-kulkarni commented 5 months ago

Issue

My team uses Airflow's SQLiteHook to store some data on disk between tasks. We timestamp the column by creating some pandas DataFrame and inserting into the SQLite file with the hook.

    data_interval_end = pendulum.now()
    df = df.rename(columns=str.lower).astype({"user_id": int}).assign(date_created=data_interval_end)
    sqlite_hook.insert_rows(
        failure_event_sqlite_table,
        list(df[["user_id", "shift_id", "successes", "failures", "date_created"]].itertuples(index=False)),
    )

This has (and still does) make rows that look like this:

   user_id  shift_id  successes  failures               date_created
0        1        11        1.0       0.1  2023-09-25T00:00:00+00:00

In later tasks, we may pull out some data with a query that looks like this:

sqlite_hook.get_pandas_df(
        f"select * from {failure_event_sqlite_table} where date_created = ?", parameters=(data_interval_end,)
    )

This used to generate a query with a timestamp in the same format, but now it generates a query with a timestamp that looks like this (note the missing T)

where date_created = '2023-07-04 18:00:00+00:00'

This is inconsistent behavior breaks all of these queries - the pendulum timestamps should serialize and deserialize in the same format.

What you think should happen instead?

The previous behavior was to make statements like this:

where date_created = '2023-07-04T18:00:00+00:00'

But it's fine either way, as long as it's consistent.