pennsignals / dsdk

Data Science Deploy Kit
MIT License
8 stars, 7 forks

Add utc now times #50

Closed jlubken closed 4 years ago

jlubken commented 4 years ago

Add UTC now times:

  1. Update the schema to follow the recommendations in https://wiki.postgresql.org/wiki/Don't_Do_This

a. Revise use of datetime, remove epoch_ms:

Discussion:

PostgreSQL's timestamptz behaves very differently from Python's datetime with time zone. PostgreSQL is unambiguous and does not combine data types with and without time zone. Python conflates a time zone offset (-4:00 or -5:00) with a time zone such as 'America/New_York'. An offset may identify a single point in time accurately, but it does not support correct date arithmetic, because it does not store the location. It has no knowledge of the rules for DST changes, leap seconds, etc. in effect at that location, as provided by the Olson/IANA time zone database. Other "time zones" like 'EDT' and 'EST' are inappropriate because they are not unique across all locations.
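A minimal sketch of the offset versus time zone problem described above. It assumes Python 3.9+ with the standard-library `zoneinfo` module and an available IANA database; dsdk itself may use a different time zone library. The date is chosen only because DST began in 'America/New_York' on 2021-03-14.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # assumption: Python 3.9+ with tz data installed

# Noon on 2021-03-13 in New York, the day before DST begins (EST, UTC-5).
fixed = datetime(2021, 3, 13, 12, 0, tzinfo=timezone(timedelta(hours=-5)))
zoned = datetime(2021, 3, 13, 12, 0, tzinfo=ZoneInfo("America/New_York"))

# Both values identify the same instant.
assert fixed == zoned

# Add one calendar day to each.
fixed_next = fixed + timedelta(days=1)
zoned_next = zoned + timedelta(days=1)

# The fixed offset knows nothing about DST and stays at -05:00.
print(fixed_next.isoformat())  # 2021-03-14T12:00:00-05:00
# The IANA zone applies the DST rule and moves to -04:00.
print(zoned_next.isoformat())  # 2021-03-14T12:00:00-04:00

# The two "next day at noon" values are now one hour apart.
print(fixed_next - zoned_next)  # 1:00:00
```

Only the value carrying the location gives noon local time on the following day; the offset-only value lands at 13:00 local time.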

Actions:

Consequences:

b. Replace PostgreSQL 10 sequences with PostgreSQL 11+ identity columns.

  2. Mitigate problems caused by Python's datetime and timezone handling. Add functions for a non-naive (time zone aware) UTC and local datetime now, and for localization to local time. Add functions for epoch_ms.

  3. Mixins were not decoupled. Use open_batch, Batch and Run classes to compose the information for mongo insert_doc and PostgreSQL sql_insert instead of breaking mixin encapsulation. Use this composition to store PostgreSQL key information in mongo without breaking encapsulation. This will make verification and testing much easier, avoid error-prone joins on datetime approximations, and provide a single clock source.

  4. Get the default AS_OF datetime and default TIME_ZONE from PostgreSQL when they are not provided. Use the database and the database system clock as the single source of truth. Ideally, we would push date and time calculations for cohort selection intervals into the database as well, to avoid variance in client language implementations.

  5. Add a PostgreSQL domain type for time zone that validates time zone strings against the embedded Olson/IANA time zone database. This addition illustrates the difference between a data type (varchar) and a strong domain type (constraints, validation and rules).
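The helper functions proposed in the item on Python datetime and timezone problems might look roughly like the sketch below. The names `now_utc`, `localize`, `epoch_ms_from_datetime` and `datetime_from_epoch_ms` are hypothetical and are not taken from dsdk; the sketch also assumes Python 3.9+ with the standard-library `zoneinfo` module.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # assumption: Python 3.9+ with tz data installed

# Reference instant for epoch_ms conversions.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _require_aware(moment: datetime) -> None:
    """Reject naive datetimes instead of guessing their time zone."""
    if moment.tzinfo is None or moment.utcoffset() is None:
        raise ValueError("naive datetime: attach a time zone first")


def now_utc() -> datetime:
    """Return the current time as a time zone aware UTC datetime."""
    return datetime.now(timezone.utc)


def localize(moment: datetime, time_zone: str) -> datetime:
    """Convert an aware datetime to the named IANA time zone."""
    _require_aware(moment)
    return moment.astimezone(ZoneInfo(time_zone))


def epoch_ms_from_datetime(moment: datetime) -> int:
    """Return whole milliseconds since the Unix epoch for an aware datetime."""
    _require_aware(moment)
    return (moment - EPOCH) // timedelta(milliseconds=1)


def datetime_from_epoch_ms(epoch_ms: int) -> datetime:
    """Return the aware UTC datetime for milliseconds since the Unix epoch."""
    return EPOCH + timedelta(milliseconds=epoch_ms)


# Usage: one instant, viewed in UTC, in local time, and as epoch_ms.
moment = datetime(2020, 7, 1, 16, 0, tzinfo=timezone.utc)
local = localize(moment, "America/New_York")
print(local.isoformat())  # 2020-07-01T12:00:00-04:00
assert local == moment  # same instant, different presentation
assert datetime_from_epoch_ms(epoch_ms_from_datetime(moment)) == moment
```

Raising on naive input is a design choice in this sketch: it keeps every value tied to an explicit time zone, so no conversion depends on the host's local settings.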