I feel like right now there are some conventions/patterns used in our SQL and Python that new folks wouldn't know unless they go and read some source code.
I'm starting this issue as a dumping ground for stuff I encounter as I write new studies. Topics that might be nice to cover:
### Date handling
- Use `from_iso8601_timestamp()` to parse a string into a `TIMESTAMP`.
- Use `date_trunc('year', date(timestamp))` to pull out a part of that, like the year or month.
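The two together might look like this (a sketch assuming a Presto/Trino-style engine; the `condition` table and `recordeddate` column are hypothetical examples of a FHIR resource with an ISO 8601 string date):

```sql
-- Parse an ISO 8601 string into a TIMESTAMP, then truncate to a coarser bucket.
SELECT
    from_iso8601_timestamp(recordeddate) AS recorded,  -- string -> TIMESTAMP
    date_trunc('year', date(from_iso8601_timestamp(recordeddate))) AS recorded_year
FROM condition
```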
### Array handling
Maybe folks won't need this if they build off the core tables?
- Mention `unnest`, `array_sort`, and `array_join` as common tools for wrangling FHIR lists.
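A quick sketch of all three (hypothetical `condition` table with an array column `category_codes`):

```sql
-- array_sort + array_join collapse a list into one stable display string;
-- unnest explodes one row-with-array into one row per array element.
SELECT
    c.id,
    array_join(array_sort(c.category_codes), '; ') AS category_display,
    t.category_code
FROM condition AS c
CROSS JOIN UNNEST(c.category_codes) AS t (category_code)
```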
### Cubing
- Never pass null values to a cube. It uses null internally for the "ignoring this column" counts, so instead coalesce values into a string like `'None'`.
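For example (a sketch with a hypothetical `patient` table; the point is that the `coalesce` happens before the `CUBE`, so real missing data can't be confused with the cube's internal rollup rows):

```sql
-- Coalesce nulls into a sentinel string first, then cube over the cleaned values.
SELECT gender, race, count(*) AS cnt
FROM (
    SELECT
        coalesce(gender, 'None') AS gender,
        coalesce(race_display, 'None') AS race
    FROM patient
)
GROUP BY CUBE (gender, race)
```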
### Jinja
- Jinja basics, or at least a pointer to a good external guide.
### Performance
- Be cautious of over-use of `array_agg`, as it can be slow.
- When doing multiple joins, order them from largest table to smallest to improve speed.
- Avoid `GROUP BY` when possible, and especially avoid grouping by multiple columns. Use `arbitrary()` for fields that are constant within each group, rather than adding them to the `GROUP BY`.
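The `arbitrary()` trick might look like this (a sketch with hypothetical `patient` and `encounter` tables; since `gender` and `birth_year` are constant per patient, any one value per group is correct and the engine groups on a single key):

```sql
-- Instead of GROUP BY p.id, p.gender, p.birth_year:
SELECT
    p.id,
    arbitrary(p.gender) AS gender,          -- constant per patient, so any value works
    arbitrary(p.birth_year) AS birth_year,
    count(e.id) AS encounter_count
FROM patient AS p
JOIN encounter AS e ON e.patient_id = p.id
GROUP BY p.id
```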