opensafely-core / ehrql

ehrQL: the electronic health record query language for OpenSAFELY
https://docs.opensafely.org/ehrql/
Other
6 stars 3 forks source link

Materialise calculated dates used repeatedly in a dataset definition #2024

Open evansd opened 3 months ago

evansd commented 3 months ago

A common pattern in dataset definitions is to determine an index date, which varies for each patient , and then use it to restrict the time period for other queries in the dataset e.g.

index_date = maximum_of(study_end_date, date_of_death, intervention_date)
...
where(events.date.is_on_or_between(index_date - years(1), index_date)

Because of the fiddliness involved in date arithmetic, and in calculating row-wise maxima, the SQL expression corresponding to is_on_or_between(index_date - years(1), index_date) can often be quite complex. And they are repeated over and over again for each query that uses them.

It would be good to spot this pattern, calculate the pair of dates once for each patient, write it to a temporary table and then just re-use it.