ncats / CTSA-Metrics

17 stars 16 forks source link

InformaticsMetricACTSQLServer.sql - Datetime compared to Date strings - decreases results erroneously #33

Closed DarrenWHenderson closed 3 years ago

DarrenWHenderson commented 3 years ago

START_DATE in OBSERVATION_FACT is datetime (timestamp) and is it compared using the between operator in several places throughout the script to DATE literals. This will not implicitly cast START_DATE to date prior to the comparison. The code as written is missing all encounters beyond midnight on Dec 31 (12/31/2020 00:00:00). You can run the following two statements and see the counts increase by a small amount. It is minimal in our warehouse, but inaccurate nonetheless.

Demo: SELECT COUNT(DISTINCT OBS.ENCOUNTER_NUM) CNT FROM OBSERVATION_FACT OBS WHERE OBS.START_DATE BETWEEN '01-01-2016' AND '12-31-2020'

Counts increase when explicitly converting datetime to date prior to between: SELECT COUNT(DISTINCT OBS.ENCOUNTER_NUM) CNT FROM OBSERVATION_FACT OBS WHERE CONVERT(DATE,OBS.START_DATE) BETWEEN '01-01-2016' AND '12-31-2020'

Below are a few references:

https://github.com/ncats/CTSA-Metrics/blob/09773218e886c5fee05cefc2371c298d9e78f4be/InformaticsMetricACTSQLServer.sql#L78

https://github.com/ncats/CTSA-Metrics/blob/09773218e886c5fee05cefc2371c298d9e78f4be/InformaticsMetricACTSQLServer.sql#L82

https://github.com/ncats/CTSA-Metrics/blob/09773218e886c5fee05cefc2371c298d9e78f4be/InformaticsMetricACTSQLServer.sql#L94

Simple fix, find-replace OBS.START_DATE BETWEEN WITH CONVERT(DATE,OBS.START_DATE) BETWEEN