opensafely / qof

QOF in OpenSAFELY
MIT License

Add dataset definition for Dementia (DEM) case register (DEM_REG) #5

Closed atamborska closed 1 month ago

atamborska commented 2 months ago

Dementia Register: Patients with a dementia diagnosis up to the end of the reporting period.

The rules for all QOF registers and indicators are available here. The specific rules for each indicator are in section 3.2.2.1 of the selected document; in this case: Dementia_v49.0.docx.

Rule description or comments

Select patients who, on the achievement date, were registered for GMS, i.e. registered for GMS on or before the achievement date and either:

  1. did not subsequently deregister from GMS, or
  2. deregistered from GMS after the achievement date.

Reject the remaining patients.

Select patients from the specified population who have a dementia diagnosis recorded up to and including the achievement date. Reject the remaining patients.

Implementation details

Implement the QOF Dementia indicator (v49; all patients with dementia should have an annual personalised care plan review) for the NHS financial year 1st April 2023 to 31st March 2024.

Implementation steps

SETTING UP

  1. Create your workspace:

  2. Import the necessary codelist into the codelists.txt file in your newly created codespace:

    • Go to OpenCodelists and search for DEM_COD
    • Copy the part of the URL that specifies the codelist, nhsd-primary-care-domain-refsets/dem_cod/20210127, into the codelists.txt file in the codelists folder
    • Next, run this command in the codespace terminal: opensafely codelists update
    • This will import a .csv file containing the DEM_COD codelist into the codelists folder. You can display the .csv file by clicking on it. You will notice there are two columns: code and a reference column, term. You will use the code column to define the dataset later on.
  3. The analysis folder in the codespace contains the files that process the data (i.e. define the study cohort, which we will refer to from now on as the dataset, and then analyse it). In this task, we focus solely on defining the dataset. First, in the analysis folder, create a new .py file where you will write your dataset definition, e.g. analysis/dem_reg_dataset_<your-name>.py.
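
As a quick sanity check, the structure of the downloaded codelist file can be inspected with plain Python. The CSV content below is an illustrative stand-in, not the real codelist file; only the column names (code, term) are taken from the description above.

```python
import csv
import io

# Illustrative stand-in for the downloaded codelist CSV -- the real file in
# the codelists folder has many rows; column names follow the issue text.
csv_text = "code,term\n52448006,Dementia (disorder)\n"

reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

print(reader.fieldnames)  # ['code', 'term']
codes = [row["code"] for row in rows]
print(codes)              # ['52448006']
```

Reading the code column like this is essentially what ehrQL does when it loads the codelist for the dataset definition.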

DEFINING THE DATASET

All of the following steps should be coded in the analysis/dem_reg_dataset_<your-name>.py file and allow you to define the dataset:

Let's look at these steps in detail.

  1. Import the necessary tables from the TPP and/or EMIS databases. Tables contain patient variables, based on which we will define the dataset. EHR tables are in long format (few columns, i.e. few variables, but multiple rows per patient). As a minimum, most tables contain a column of unique patient identifiers, a column with a clinical code (signifying a prescription, a clinical event, a patient characteristic, etc.), and a column with the date when that clinical code was assigned. Because a code can be assigned multiple times (think of repeat prescriptions or recurrent hospital admissions), there will frequently be multiple rows per patient. When defining the dataset, we will need to compress these to just one row per patient, for example by choosing the first date a medication was prescribed.

For this exercise, we will need to import tables containing information on practice registration, clinical events and patient identifiers.

We do this with the following code:

from ehrql.tables.tpp import patients, practice_registrations, clinical_events
  1. Next, we create a variable called dem_cod containing the codelist. For this, we use the code column from our .csv file in the codelists folder. In a moment, we will use this variable to select patients with our clinical events of interest.
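
A sketch of this step, naming the variable dem_cod to match the code further down. The file name is an assumption: opensafely codelists update names the downloaded file after the entry in codelists.txt, so check the actual name in your codelists folder.

```python
from ehrql import codelist_from_csv

# File name is an assumption -- confirm it against the codelists folder
dem_cod = codelist_from_csv(
    "codelists/nhsd-primary-care-domain-refsets-dem_cod.csv",
    column="code",
)
```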
  1. We now have to create an empty dataset. We will populate it by adding relevant information from the selected columns in the TPP/EMIS tables.
from ehrql import create_dataset
dataset = create_dataset()
  1. Next, we will populate this empty dataset by adding columns generated from the TPP/EMIS tables.

To generate a column of patients with a diagnostic code for dementia, we first need to identify the rows which contain a code for dementia and, second, since there may be multiple rows per patient, compress the table by selecting the most recent one.

This: clinical_events.where(clinical_events.snomedct_code.is_in(dem_cod)) identifies the rows with a dementia diagnosis.

This: [...].sort_by(clinical_events.date).last_for_patient() arranges the rows by date and selects the most recent row.

This: [...].date allows us to extract the date for the most recent dementia diagnosis.

And this: dataset.dem_dat = allows us to assign the column of the most recent dementia diagnosis for each patient, to our dataset.

We put this all together by:

dataset.dem_dat = (
    clinical_events
    .where(clinical_events.snomedct_code.is_in(dem_cod))
    .sort_by(clinical_events.date)
    .last_for_patient()
    .date
)
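
To build intuition for what this chain does, here is a plain-Python analogue (toy data, not ehrQL) of filtering a long-format table and keeping the most recent row per patient:

```python
from datetime import date

# Toy long-format events table: multiple rows per patient, mirroring the
# shape of clinical_events (illustrative values, not real records).
events = [
    {"patient_id": 1, "code": "DEM", "date": date(2020, 1, 5)},
    {"patient_id": 1, "code": "DEM", "date": date(2022, 6, 1)},
    {"patient_id": 2, "code": "OTH", "date": date(2021, 3, 3)},
]
dem_cod = {"DEM"}  # stand-in for the real codelist

# .where(...): keep only rows whose code is in the codelist
dem_rows = [row for row in events if row["code"] in dem_cod]

# .sort_by(date).last_for_patient(): most recent matching row per patient
latest = {}
for row in sorted(dem_rows, key=lambda r: r["date"]):
    latest[row["patient_id"]] = row  # later rows overwrite earlier ones

# .date: extract just the date column
dem_dat = {pid: row["date"] for pid, row in latest.items()}
print(dem_dat)  # {1: datetime.date(2022, 6, 1)} -- patient 2 has no dementia code
```

Patient 2 simply drops out at the filtering step, which is why, in ehrQL, dem_dat is null for patients with no matching code.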

Similar steps should be followed to determine the practice registration (reg_dat) and deregistration (dereg_dat) dates, using the practice_registrations table.
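
One possible sketch, assuming we take each patient's most recent registration (the column names start_date and end_date come from the practice_registrations table; the QOF rules may instead require selecting the registration covering the achievement date):

```python
from ehrql import create_dataset
from ehrql.tables.tpp import practice_registrations

dataset = create_dataset()

# Most recent registration per patient (sketch only)
last_reg = practice_registrations.sort_by(
    practice_registrations.start_date
).last_for_patient()

dataset.reg_dat = last_reg.start_date
dataset.dereg_dat = last_reg.end_date  # null if the patient is still registered
```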

  1. Next, write rules that allow us to filter the dataset.

This requires us to define the index date:

index_dat = "yyyy-mm-dd"

For dementia patients this would be:

rule_dem = dataset.dem_dat.is_not_null() & dataset.dem_dat.is_on_or_before(index_dat)

Remember to also create a rule for registration: rule_reg =...

  1. Finally, we use these rules to filter the dataset, using the following code.

The logic can be combined using & (AND), | (OR), ~ (NOT) operators.

dataset.define_population(rule_dem & rule_reg)
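
Putting the steps above together, a complete dataset definition might look like the following sketch. The codelist file name, the index date, and the choice of the latest registration are all assumptions to adapt to your workspace; they are not prescribed by the QOF rules text above.

```python
from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.tpp import clinical_events, practice_registrations

# Achievement date for the NHS financial year 2023/24 (assumption)
index_dat = "2024-03-31"

# File name is an assumption -- it is created by `opensafely codelists update`
dem_cod = codelist_from_csv(
    "codelists/nhsd-primary-care-domain-refsets-dem_cod.csv",
    column="code",
)

dataset = create_dataset()

# Most recent dementia diagnosis per patient
dataset.dem_dat = (
    clinical_events
    .where(clinical_events.snomedct_code.is_in(dem_cod))
    .sort_by(clinical_events.date)
    .last_for_patient()
    .date
)

# Most recent registration per patient (sketch)
last_reg = practice_registrations.sort_by(
    practice_registrations.start_date
).last_for_patient()
dataset.reg_dat = last_reg.start_date
dataset.dereg_dat = last_reg.end_date

# Rules: dementia diagnosis on or before the achievement date; registered on
# or before it and either never deregistered or deregistered after it
rule_dem = dataset.dem_dat.is_on_or_before(index_dat)
rule_reg = dataset.reg_dat.is_on_or_before(index_dat) & (
    dataset.dereg_dat.is_null() | dataset.dereg_dat.is_after(index_dat)
)

dataset.define_population(rule_dem & rule_reg)
```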

CREATE A WORKFLOW THAT WILL GENERATE YOUR DATASET USING ITS DEFINITION

  1. Now that we have created the dataset definition, we have to update our project pipeline to execute it and generate the dataset into the output folder. Note: to save space, all outputs should be generated as zipped .csv.gz files.

We do this by adding an action to the project.yaml file (the action name generate_dataset below is our choice):

actions:
  generate_dataset:
    run: ehrql:v1 generate-dataset analysis/dem_reg_dataset_<your-name>.py --output output/<dataset-name>.csv.gz
    outputs:
      highly_sensitive:
        dataset: output/<dataset-name>.csv.gz

Notes on running ehrql and checking your code

The codespace used by OpenSAFELY does not allow you to run the script line by line. The whole script is always run top to bottom, all lines included. That is why, after writing each line of code in the dataset definition file, it is good to check that it runs, so that bugs are fixed immediately.

We do it by typing the following in the terminal: opensafely exec ehrql:v1 generate-dataset analysis/<dataset-script-name>.py

If we want to test and try by running individual lines, we can open a "sandbox" in the terminal (ideally by splitting the terminal into two).

We may need to load example data first (typing in the terminal): opensafely exec ehrql:v1 dump-example-data

And then: opensafely exec ehrql:v1 sandbox example-data

This allows us to run individual lines by typing them in the sandbox terminal.

To exit the sandbox, we type exit() at the sandbox prompt.

Acceptance criteria

milanwiedemann commented 2 months ago

@atamborska this looks great, you could try implementing this if you have some time this week. in case you get stuck we can look at your questions next week when I'm back

atamborska commented 1 month ago

Completed