Closed StevenMaude closed 1 year ago
This is a tracking issue covering the tasks required before user testing of the ehrQL tutorial.
@CarolineMorton is going to test ehrQL with users. We want the ehrQL tutorial and supporting material ready enough for someone new to:
- learn how to run ehrQL dataset definitions on their own computer, ideally with OpenSAFELY CLI
- learn sufficient ehrQL to write a dataset definition
- write a dataset definition to meet some moderately complicated criteria, reflective of real-world use
Required
User testing dataset definition
- [ ] Devise a complete dataset definition, plus prompts that user testing participants can follow so that, ideally, they can write a comparable definition themselves. (Caroline)
- [ ] Does the user testing dataset definition require creation of some sample data or would we use the new data generation feature?
Codelists
- [x] Add a placeholder for codelists for now.
- [ ] Explain codelists in more detail once the dataset definition is finalised. (Caroline)
Data Builder use
- [x] Correct tutorial examples to use `from databuilder.ehrql import Dataset`; https://github.com/opensafely/documentation/pull/988
- [x] Use `generate-dataset dataset_definition.py --dummy-tables path/to/csv`
- [x] Warn about the last `set_population` taking effect?
- [x] Correct `egg=` install syntax (opensafely-core/databuilder#1051)
- [x] Refer to `population` instead of `cohort` (mentioned by Caroline and in opensafely-core/databuilder#1051).
- [x] https://github.com/opensafely-core/databuilder/issues/756 (not absolutely essential, we can install with `pip`, but that's not ideal especially given the install problems in Windows) :arrow_left::hammer:
- [x] https://github.com/opensafely-core/databuilder/pull/703 — get tutorial tables into a release version of Data Builder, or bundle them into the tutorial examples here.
- [x] Make the examples work with the released Data Builder Docker image, based on its `main` branch.
- [x] Restructure to include `project.yaml` per dataset definition.
- [x] Consider building or testing the outputs of dataset definitions with the OpenSAFELY CLI, if that's how they are used? (Testing is probably better as while `opensafely` and Docker are dependencies that developers will typically have installed, they are extra dependencies.)
- [ ] Sense check the data for it being consistent:
  - [x] for external consistency with Data Builder's tables; we should use comparable tables to real-world tables, if possible
  - [ ] for internal consistency, for example, a date for patient address shouldn't be before a patient's date of birth; rounded IMD should be rounded.
- [ ] Consider breaking apart example data CSVs and avoid any reuse. (Ideally we should have entirely separate examples, so there's no confusion of slight deviation between what look like roughly the same example data CSVs.)
- [ ] Consider having a single input data source
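
The internal consistency check mentioned above could be sketched roughly as follows. This is only an illustration: the column names (`date_of_birth`, `date_start`, `imd_rounded`), the inline sample rows and the rounding step of 100 are all assumptions made for the example, not the tutorial's actual CSV schema.

```python
import csv
import io
from datetime import date

# Hypothetical sample data; column names and values are assumptions
# for illustration only, not the tutorial's real CSVs.
PATIENTS_CSV = """patient_id,date_of_birth
1,1980-05-01
2,2001-11-15
"""

ADDRESSES_CSV = """patient_id,date_start,imd_rounded
1,1979-01-01,1200
1,1990-06-01,1234
2,2005-03-01,3400
"""

IMD_ROUNDING = 100  # assumed rounding step


def find_inconsistencies(patients_csv, addresses_csv, rounding=IMD_ROUNDING):
    """Return a list of (patient_id, problem) tuples for inconsistent rows."""
    # Map each patient to their date of birth.
    births = {
        row["patient_id"]: date.fromisoformat(row["date_of_birth"])
        for row in csv.DictReader(io.StringIO(patients_csv))
    }
    problems = []
    for row in csv.DictReader(io.StringIO(addresses_csv)):
        pid = row["patient_id"]
        # An address should not start before the patient was born.
        if date.fromisoformat(row["date_start"]) < births[pid]:
            problems.append((pid, "address starts before date of birth"))
        # A rounded IMD value should be a multiple of the rounding step.
        if int(row["imd_rounded"]) % rounding != 0:
            problems.append((pid, "IMD is not rounded"))
    return problems


if __name__ == "__main__":
    for patient_id, problem in find_inconsistencies(PATIENTS_CSV, ADDRESSES_CSV):
        print(patient_id, problem)
```

A check like this could run as a test over each example data directory, so a well-meaning edit to one CSV doesn't quietly introduce impossible records.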
General tutorial tasks
- [x] Rename variable to `last_day_of_month_before_first_hospitalization` in `4a` (opensafely-core/databuilder#1051)
- [ ] Switch to `date_start` in `3a2`? (opensafely-core/databuilder#1051)
- [ ] Consider better date arithmetic example in `4a` (opensafely-core/databuilder#1051)
- [ ] Use `drop` more sensibly in `4a` (opensafely-core/databuilder#1051)
Review and polish
- [x] Add a suggestion about having the code/data open in another tab/window.
- [x] Fix a couple of small code block formatting issues in `4a`
- [x] Use `Dataset()` at the start consistently
- [ ] Use `opensafely pull` to update Data Builder image
- [ ] Explain that the `project.yaml` has a specified `vX.Y.Z` version of the Data Builder image: what should we advise users to do? Stick with specifying `vX.Y.Z` or just `vX`?
- [ ] Move explanation for multiple chained `take()`/`drop()` out of tutorial question into text for `6a`.
- [ ] https://github.com/opensafely-core/databuilder/issues/1050
- [ ] Don't show tables that aren't used in dataset definitions; if we separate the data to be per example, then this is just removing the unused tables.
- [ ] Use snippets to include parts of dataset definitions in the appropriate text location. Or maybe just re-include with highlighting?
- [ ] Fix up internal links after page renames.
- [ ] Improve tutorial naming, and filenames.
- [ ] Improve explanation/consistency of frames and series.
- [ ] Read through end-to-end: to clean up remnants, update outdated information, provide better linking…
- [ ] Review rest of tutorial content.
Optional
- [ ] Consider templating in the Data Builder version number; see:
- [ ] Move away from `opensafely run` earlier in the tutorial.
- [ ] Divide dataset definition into well-defined sections, something like (borrowed from Milan): prepare the data :arrow_right: extract demographic data :arrow_right: extract events of interest :arrow_right: restrict dataset population.
- [x] Use collapsible blocks for example data and dataset definitions to help reduce the vertical space of a tutorial.
- [ ] Automate the update of the latest major version of Data Builder in the `project.yaml` and the requirements.
- [ ] https://github.com/opensafely-core/databuilder/issues/1047
- [ ] Move Python and other more detailed explanations out to their own pages.
- [ ] Review use of admonitions.
- [ ] Sense check the dataset definitions for being meaningful and relevant to what a researcher might want to do.
- [ ] Finalise the tables, columns and `import` statements in the sample data; should we update to match what's currently in ehrQL for a real backend?
- [x] https://github.com/opensafely/documentation/issues/777
- [ ] https://github.com/opensafely-core/databuilder/issues/1030
- [ ] https://github.com/opensafely/documentation/issues/743
- [ ] Consider having one example data directory per example; that way no-one changes some data and changes the intent of an unrelated tutorial.
Closing this following the rewrite in #1317 as the associated content has been removed.
Moved from opensafely/documentation#987.