opensafely-core / ehrql

ehrQL: the electronic health record query language for OpenSAFELY
https://docs.opensafely.org/ehrql/

Get the ehrQL tutorial ready for user testing #1052

Closed StevenMaude closed 1 year ago

StevenMaude commented 1 year ago

Moved from opensafely/documentation#987.

StevenMaude commented 1 year ago

This is a tracking issue for the tasks required to get the ehrQL tutorial ready for user testing.

@CarolineMorton is going to test ehrQL with users. We want the ehrQL tutorial and supporting material ready enough for someone new to:

  • learn how to run ehrQL dataset definitions on their own computer, ideally with OpenSAFELY CLI
  • learn sufficient ehrQL to write a dataset definition
  • write a dataset definition to meet some moderately complicated criteria, reflective of real-world use

Required

User testing dataset definition

  • [ ] Devise a complete dataset definition, with prompts that user testing participants can follow, ideally so that they can write a comparable definition themselves. (Caroline)
  • [ ] Does the user testing dataset definition require creation of some sample data or would we use the new data generation feature?

Codelists

  • [x] Add a placeholder for codelists for now.
  • [ ] Explain codelists in more detail once the dataset definition is finalised. (Caroline)

Data Builder use

  • [x] Correct tutorial examples to use from databuilder.ehrql import Dataset; https://github.com/opensafely/documentation/pull/988
  • [x] Use generate-dataset dataset_definition.py --dummy-tables path/to/csv
  • [x] Warn about the last set_population taking effect?
  • [x] Correct egg= install syntax (opensafely-core/databuilder#1051)
  • [x] Refer to population instead of cohort (mentioned by Caroline and in opensafely-core/databuilder#1051).
  • [x] https://github.com/opensafely-core/databuilder/issues/756 (not absolutely essential, we can install with pip, but that's not ideal especially given the install problems in Windows):arrow_left::hammer:
    • [x] https://github.com/opensafely-core/databuilder/pull/703 — get tutorial tables into a release version of Data Builder, or bundle them into the tutorial examples here.
    • [x] Make the examples work with the released Data Builder Docker image, based on its main branch.
    • [x] Restructure to include project.yaml per dataset definition.
    • [x] Consider building or testing the outputs of dataset definitions with the OpenSAFELY CLI, if that's how they are used? (Testing is probably better: although developers will typically have opensafely and Docker installed, they are still extra dependencies.)
  • [ ] Sense check the data for it being consistent:
    • [x] for external consistency with Data Builder's tables; we should use comparable tables to real-world tables, if possible
    • [ ] for internal consistency: for example, a patient address date shouldn't be before the patient's date of birth, and rounded IMD values should actually be rounded.
  • [ ] Consider breaking apart example data CSVs and avoiding any reuse. (Ideally we should have entirely separate examples, so there's no confusion from slight deviations between example data CSVs that look roughly the same.)
  • [ ] Consider having a single input data source
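
The internal consistency checks listed above could be automated with a short script. This is only a minimal sketch and not part of Data Builder: the column names (date_of_birth, start_date, imd_rounded), the inline sample rows, and the rounding step of 100 are all assumptions for illustration and would need adjusting to the real tutorial CSVs.

```python
import csv
import io
from datetime import date

# Hypothetical sample data. The column names are illustrative assumptions,
# not the tutorial's actual schema.
PATIENTS_CSV = """patient_id,date_of_birth
1,1980-05-01
2,2001-11-15
"""

ADDRESSES_CSV = """patient_id,start_date,imd_rounded
1,1990-01-01,1200
2,1999-06-30,1250
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def find_inconsistencies(patients_text, addresses_text, imd_step=100):
    """Return (patient_id, message) pairs for rows that fail a check.

    imd_step is the assumed rounding granularity for IMD values.
    """
    births = {
        row["patient_id"]: date.fromisoformat(row["date_of_birth"])
        for row in read_rows(patients_text)
    }
    problems = []
    for row in read_rows(addresses_text):
        pid = row["patient_id"]
        if pid not in births:
            problems.append((pid, "address has no matching patient"))
        elif date.fromisoformat(row["start_date"]) < births[pid]:
            problems.append((pid, "address starts before date of birth"))
        if int(row["imd_rounded"]) % imd_step != 0:
            problems.append((pid, "IMD is not rounded"))
    return problems


problems = find_inconsistencies(PATIENTS_CSV, ADDRESSES_CSV)
for pid, message in problems:
    print(f"patient {pid}: {message}")
```

With the sample rows here, patient 2 is flagged twice: the address start date precedes the date of birth, and the IMD value is not a multiple of the assumed step. A check like this could run in CI against each example's CSVs.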

General tutorial tasks

  • [x] Rename variable to last_day_of_month_before_first_hospitalization in 4a (opensafely-core/databuilder#1051)
  • [ ] Switch to date_start in 3a2? (opensafely-core/databuilder#1051)
  • [ ] Consider better date arithmetic example in 4a (opensafely-core/databuilder#1051)
  • [ ] Use drop more sensibly in 4a (opensafely-core/databuilder#1051)

Review and polish

  • [x] Add a suggestion about having the code/data open in another tab/window.
  • [x] Fix a couple of small code block formatting issues in 4a
  • [x] Use Dataset() at the start consistently
  • [ ] Use opensafely pull to update Data Builder image
  • [ ] Explain that the project.yaml has a specified vX.Y.Z version of the Data Builder image: what should we advise users to do? Stick with specifying vX.Y.Z or just vX?
  • [ ] Move explanation for multiple chained take()/drop() out of tutorial question into text for 6a.
  • [ ] https://github.com/opensafely-core/databuilder/issues/1050
  • [ ] Don't show tables that aren't used in dataset definitions; if we separate the data to be per example, then this is just removing the unused tables.
  • [ ] Use snippets to include parts of dataset definitions in the appropriate text location. Or maybe just re-include with highlighting?
  • [ ] Fix up internal links after page renames.
  • [ ] Improve tutorial naming, and filenames.
  • [ ] Improve explanation/consistency of frames and series.
  • [ ] Read through end-to-end: to clean up remnants, update outdated information, provide better linking…
  • [ ] Review rest of tutorial content.

Optional

StevenMaude commented 1 year ago

Closing this following the rewrite in #1317 as the associated content has been removed.