opensafely-core / ehrql

ehrQL: the electronic health record query language for OpenSAFELY
https://docs.opensafely.org/ehrql/

Draw the owl #1633

Closed iaindillingham closed 12 months ago

iaindillingham commented 1 year ago

As @inglesp has pointed out, the ehrQL tutorial is similar to How to draw an owl.

Upon conclusion of the ehrQL tutorial, the reader has created a repo, created (and deleted) a codespace, interacted with the sandbox, created a minimal dataset definition, and generated a dummy dataset that is displayed in the terminal (i.e. it is not written to a file).

To become a competent user of ehrQL, however, the reader should also:

Expand the dataset definition

I'd like to check with a couple of researchers about what "expand" most usefully means,^2 but based on this dataset definition, which @alschaffer said was written by her pilots without her help,^1 I think "expand" probably means:

Write a dummy dataset to a file

The reader should add an associated action to project.yaml, which they will run with opensafely run [action]. They should compare and contrast run with exec, noticing that exec is good for eyeballing the data but run is good for developing downstream actions, especially when the dummy dataset isn't written to a CSV file.
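For illustration, an action of this kind might look like the sketch below. The action name (generate_dataset), the file paths, the pipeline version, and the ehrql:v1 image tag are assumptions chosen for the example; they are not taken from the tutorial.

```yaml
# project.yaml (illustrative sketch; names, paths, and versions are assumptions)
version: "4.0"

actions:
  generate_dataset:
    # Generate a dummy dataset and write it to a file rather than the terminal
    run: ehrql:v1 generate-dataset analysis/dataset_definition.py --output output/dataset.csv.gz
    outputs:
      highly_sensitive:
        dataset: output/dataset.csv.gz
```

With an action like this in place, opensafely run generate_dataset would write the dummy dataset to output/dataset.csv.gz for downstream actions to consume, whereas opensafely exec ehrql:v1 generate-dataset analysis/dataset_definition.py would print it to the terminal.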

Commit the dataset definition to main

Upon conclusion of the ehrQL tutorial, the reader will be at "Initial commit" and be ready to run the associated action on OpenSAFELY Jobs. (Creating a project and workspace, and using OpenSAFELY Jobs, are out of scope.) Also, they will have created an artefact inside the codespace that persists outside the codespace.

The reader shouldn't commit the dataset definition to a feature branch and open a pull request, because different projects and different organizations have different guidelines about feature branches and pull requests.

sebbacon commented 1 year ago

Regarding "Expand the dataset definition": this reminds me of background research I've been doing in preparation for some Great Variables Library Thinking.

I've asked around a few times (example) what the most common variables are, cross-referenced the answers with a bit of grep-foo, and come up with this tentative list:

Fundamentally, a peer-reviewed and agreed common set of things like this, in the research template, is the core of a variables library. So I'm excited to see this happening!

iaindillingham commented 1 year ago

I'm putting together an extended dataset definition in this gist, with feedback in Slack.^1

iaindillingham commented 1 year ago

Thanks, @sebbacon. At the moment, the expanded dataset definition hits several of those. I don't think it can hit them all, but hitting several suggests that it will be useful.

sebbacon commented 1 year ago

I don't think it can hit them all

Devil's advocate: why not? If nearly every study includes all of them anyway:

iaindillingham commented 1 year ago

Because it's a tutorial and not a how-to. Hitting all of them will make the tutorial longer, which means it will take more time to complete and more time to maintain. I think a more effective use of time would be to incorporate several into the tutorial and the remainder into how-tos, or, indeed, reusable variables.

sebbacon commented 1 year ago

Fair, I think I'm conflating our tutorial with our research template.

It leads me to ask if this part of the tutorial content might also live in the research template?

The familiarity when moving on from the tutorial could be helpful.

iaindillingham commented 1 year ago

It could, but I think that's a separate issue, so I've created opensafely/research-template#108.