PostgreSQL's timestamptz behaves very differently than python's datetime with time zone. PostgreSQL is unambiguous and does not combine data types with and without time zone. Python conflates a time zone offset (-4:00 or -5:00) with time zone 'America/New_York'. Offsets may indicate a single point in time accurately, but do not provide correct date arithmetic, because they do not store the location. They do not have knowledge of the rules for the dst changes, leap seconds, etc in effect at the location as provided by the Olson/IANA time zone database. Other "time zones" like 'EDT' and 'EST' are inappropriate because they are not unique across all locations.
Actions:
Use dateutil's parser and assert that parsed datetime inputs are in UTC.
Use python non-naive UTC datetime stored as a string in iso/PostgreSQL format as the AS_OF environment variable.
Use a valid local time zone 'America/New_York' for the TIME_ZONE environment variable.
Computation of the date or datetime range for cohort selection localizes AS_OF to non-naive TIME_ZONE and then may drop the time zone to accept data from naive data sources.
Consequences:
Repeatable data reads, and repeatable cohort test data sets, unless the upstream data has changed.
PostgreSQL datetime is easy to read unlike epoch_ms. PostgreSQL's timestamptz's storage implementation is epoch_ms.
b. Replace PostgreSQL 10 sequences with PostgreSQl 11+ identity columns.
Mitigate problems due to python datetime and timezone. Add functions for non-naive utc local datetime now, localization for local time. Add functions for epoch_ms.
Mixins were not decoupled. Use open_batch, Batch and Run classes to composite information for mongo insert_doc and PostgreSQL sql_insert instead of breaking mixin encapsulation. Use this composition to store PostgreSQL key information in mongo without breaking encapsulation. This will make verification and testing much easier, and avoid error-prone joins on datetime approximations. Single clock source.
Get the default AS_OF datetime and default TIME_ZONE from PostgreSQL if not provided. Use the database and the database system clock as the single source of truth. Ideally, we would push date and time calculations for cohort selection intervals into the database as well to avoid variance in client language implementations.
Add a PostgreSQL domain type for time zone that validates time zone strings against the embedded Olson/IANA time zone database. This addition illustrates the difference between a data type (varchar) and a strong domain type (constraints, validation and rules).
Add UTC now times:
a. Revise use of datetime, remove epoch_ms:
Discussion:
PostgreSQL's timestamptz behaves very differently than python's datetime with time zone. PostgreSQL is unambiguous and does not combine data types with and without time zone. Python conflates a time zone offset (-4:00 or -5:00) with time zone 'America/New_York'. Offsets may indicate a single point in time accurately, but do not provide correct date arithmetic, because they do not store the location. They do not have knowledge of the rules for the dst changes, leap seconds, etc in effect at the location as provided by the Olson/IANA time zone database. Other "time zones" like 'EDT' and 'EST' are inappropriate because they are not unique across all locations.
Actions:
Consequences:
b. Replace PostgreSQL 10 sequences with PostgreSQl 11+ identity columns.
Mitigate problems due to python datetime and timezone. Add functions for non-naive utc local datetime now, localization for local time. Add functions for epoch_ms.
Mixins were not decoupled. Use open_batch, Batch and Run classes to composite information for mongo insert_doc and PostgreSQL sql_insert instead of breaking mixin encapsulation. Use this composition to store PostgreSQL key information in mongo without breaking encapsulation. This will make verification and testing much easier, and avoid error-prone joins on datetime approximations. Single clock source.
Get the default AS_OF datetime and default TIME_ZONE from PostgreSQL if not provided. Use the database and the database system clock as the single source of truth. Ideally, we would push date and time calculations for cohort selection intervals into the database as well to avoid variance in client language implementations.
Add a PostgreSQL domain type for time zone that validates time zone strings against the embedded Olson/IANA time zone database. This addition illustrates the difference between a data type (varchar) and a strong domain type (constraints, validation and rules).