I feel like right now there are some conventions/patterns used in our SQL and Python that new folks wouldn't know unless they go and read some source code.
I'm starting this issue as a dumping ground for stuff I encounter as I write new studies. Topics that might be nice to cover:
### Date handling
- Use `from_iso8601_timestamp()` to parse a string into a `TIMESTAMP`.
- Use `date_trunc('year', date(timestamp))` to pull out a part of that, like the year or month.
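The two together might look like this (a sketch assuming a Presto/Trino-style engine; the `condition` table and `recordeddate` column are hypothetical examples of a FHIR resource with an ISO 8601 string date):

```sql
-- Parse an ISO 8601 string into a TIMESTAMP, then truncate to a coarser bucket.
SELECT
    from_iso8601_timestamp(recordeddate) AS recorded,  -- string -> TIMESTAMP
    date_trunc('year', date(from_iso8601_timestamp(recordeddate))) AS recorded_year
FROM condition
```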
### Array handling
Maybe folks won't need this if they build off the core tables?
- Mention `unnest`, `array_sort`, and `array_join` as common tools for wrangling FHIR lists.
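A quick sketch of all three (hypothetical `condition` table with an array column `category_codes`):

```sql
-- array_sort + array_join collapse a list into one stable display string;
-- unnest explodes one row-with-array into one row per array element.
SELECT
    c.id,
    array_join(array_sort(c.category_codes), '; ') AS category_display,
    t.category_code
FROM condition AS c
CROSS JOIN UNNEST(c.category_codes) AS t (category_code)
```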
### Cubing
- Never pass null values to a cube. It uses null internally for the "ignoring this column" counts, so instead coalesce values into a string like `'None'`.
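For example (a sketch with a hypothetical `patient` table; the point is that the `coalesce` happens before the `CUBE`, so real missing data can't be confused with the cube's internal rollup rows):

```sql
-- Coalesce nulls into a sentinel string first, then cube over the cleaned values.
SELECT gender, race, count(*) AS cnt
FROM (
    SELECT
        coalesce(gender, 'None') AS gender,
        coalesce(race_display, 'None') AS race
    FROM patient
)
GROUP BY CUBE (gender, race)
```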
### Jinja
- Jinja basics, or at least a pointer to a good external guide.
### Performance
- Be cautious of over-use of `array_agg`, as it can be slow.
- When doing multiple joins, order them from largest table to smallest to improve speed.
- Avoid `GROUP BY` when possible, and especially avoid grouping by multiple columns. Use `arbitrary()` for fields that are constant within each group, rather than adding them to the `GROUP BY`.
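The `arbitrary()` trick might look like this (a sketch with hypothetical `patient` and `encounter` tables; since `gender` and `birth_year` are constant per patient, any one value per group is correct and the engine groups on a single key):

```sql
-- Instead of GROUP BY p.id, p.gender, p.birth_year:
SELECT
    p.id,
    arbitrary(p.gender) AS gender,          -- constant per patient, so any value works
    arbitrary(p.birth_year) AS birth_year,
    count(e.id) AS encounter_count
FROM patient AS p
JOIN encounter AS e ON e.patient_id = p.id
GROUP BY p.id
```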