This is a tracking issue covering the tasks required before user testing of ehrQL.
@CarolineMorton is going to test ehrQL with users. We want the ehrQL tutorial and supporting material ready enough for someone new to:
learn how to run ehrQL dataset definitions on their own computer, ideally with the OpenSAFELY CLI
learn sufficient ehrQL to write a dataset definition
write a dataset definition to meet some moderately complicated criteria, reflective of real-world use
Required
User testing dataset definition
[ ] Devise a complete dataset definition, plus prompts that user testing participants can follow, so that they can ideally write a comparable definition themselves. (Caroline)
[ ] Does the user testing dataset definition require creation of some sample data or would we use the new data generation feature?
Codelists
[x] Add a placeholder for codelists for now.
[ ] Explain codelists in more detail once the dataset definition is finalised. (Caroline)
[x] Make the examples work with the released Data Builder Docker image, based on its main branch.
[x] Restructure to include project.yaml per dataset definition.
[x] Consider building or testing the outputs of dataset definitions with the OpenSAFELY CLI, if that's how they are used? (Testing is probably better: although developers will typically have opensafely and Docker installed, they are still extra dependencies.)
[ ] Sense-check the data for consistency:
[x] for external consistency with Data Builder's tables; we should use comparable tables to real-world tables, if possible
[ ] for internal consistency, for example, a date for patient address shouldn't be before a patient's date of birth; rounded IMD should be rounded.
[ ] Consider breaking apart the example data CSVs and avoiding any reuse. (Ideally we should have entirely separate examples, so there's no confusion from slight deviations between example data CSVs that look roughly the same.)
[ ] Consider having a single input data source
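The internal consistency checks described above (address dates not preceding date of birth, rounded IMD values actually being rounded) could be automated along these lines. This is only a sketch: the column names (`date_of_birth`, `start_date`, `imd_rounded`), the rounding granularity of 100, and the function names are assumptions for illustration and may not match the tutorial's actual CSVs.

```python
import csv
import io
from datetime import date

# Hypothetical sample data; the column names are assumptions for illustration.
PATIENTS_CSV = """patient_id,date_of_birth
1,1980-05-01
2,2001-11-15
"""

ADDRESSES_CSV = """patient_id,start_date,imd_rounded
1,1995-03-10,1200
2,1999-01-01,2350
"""

IMD_ROUNDING = 100  # assumed rounding granularity


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def find_inconsistencies(patients_csv, addresses_csv, rounding=IMD_ROUNDING):
    """Return (patient_id, message) pairs for each internal inconsistency."""
    births = {
        row["patient_id"]: date.fromisoformat(row["date_of_birth"])
        for row in read_rows(patients_csv)
    }
    problems = []
    for row in read_rows(addresses_csv):
        pid = row["patient_id"]
        start = date.fromisoformat(row["start_date"])
        # An address should not start before the patient was born.
        if pid in births and start < births[pid]:
            problems.append((pid, "address start date is before date of birth"))
        # A "rounded" IMD value should be a multiple of the rounding step.
        if int(row["imd_rounded"]) % rounding != 0:
            problems.append((pid, "IMD value is not rounded"))
    return problems


if __name__ == "__main__":
    for pid, message in find_inconsistencies(PATIENTS_CSV, ADDRESSES_CSV):
        print(f"patient {pid}: {message}")
```

A check like this could run as a test over the example data CSVs, so that inconsistencies are caught when the sample data changes.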
General tutorial tasks
[x] Rename variable to last_day_of_month_before_first_hospitalization in 4a (opensafely-core/databuilder#1051)
[ ] Switch to date_start in 3a2? (opensafely-core/databuilder#1051)
[ ] Consider better date arithmetic example in 4a (opensafely-core/databuilder#1051)
[ ] Use drop more sensibly in 4a (opensafely-core/databuilder#1051)
Review and polish
[x] Add a suggestion about having the code/data open in another tab/window.
[x] Fix a couple of small code block formatting issues in 4a
[x] Use Dataset() at the start consistently
[ ] Use opensafely pull to update Data Builder image
[ ] Explain that the project.yaml has a specified vX.Y.Z version of the Data Builder image: what should we advise users to do? Stick with specifying vX.Y.Z or just vX?
[ ] Move explanation for multiple chained take()/drop() out of tutorial question into text for 6a.
[ ] Don't show tables that aren't used in dataset definitions; if we separate the data to be per example, then this is just removing the unused tables.
[ ] Use snippets to include parts of dataset definitions in the appropriate text location. Or maybe just re-include with highlighting?
[ ] Fix up internal links after page renames.
[ ] Improve tutorial naming, and filenames.
[ ] Improve explanation/consistency of frames and series.
[ ] Read through end-to-end: to clean up remnants, update outdated information, provide better linking…
[ ] Review rest of tutorial content.
Optional
[ ] Consider templating in the Data Builder version number; see:
[ ] Move away from opensafely run earlier in the tutorial.
[ ] Divide dataset definition into well-defined sections, something like (borrowed from Milan): prepare the data :arrow_right: extract demographic data :arrow_right: extract events of interest :arrow_right: restrict dataset population.
[x] Use collapsible blocks for example data and dataset definitions to help reduce the vertical space of a tutorial.
[ ] Automate the update of the latest major version of Data Builder in the project.yaml and the requirements.
Data Builder use
… from databuilder.ehrql import Dataset …; https://github.com/opensafely/documentation/pull/988
… generate-dataset dataset_definition.py --dummy-tables path/to/csv
… set_population taking effect?
… egg= install syntax (opensafely-core/databuilder#1051)
… population instead of cohort (mentioned by Caroline and in opensafely-core/databuilder#1051).
… pip, but that's not ideal especially given the install problems in Windows) :arrow_left: :hammer:
… import statements in the sample data; should we update to match what's currently in ehrQL for a real backend?