Dementia Register: Patients with a dementia diagnosis up to the end of the reporting period.
The rules for all QOF registers and indicators are available here. The specific rules for each indicator are in section 3.2.2.1 of the selected document; in this case, Dementia_v49.0.docx.
Rule description or comments
Rule 1: GMS registration status
Select patients who, on the achievement date, were registered for GMS, i.e. registered for GMS on or before the achievement date and who either:
did not subsequently deregister from GMS, or
deregistered from GMS after the achievement date.
Reject the remaining patients.
Rule 2: Dementia diagnosis
Select patients from the specified population who have a dementia diagnosis recorded up to and including the achievement date. Reject the remaining patients.
Implementation details
Implement the QOF Dementia indicator (v49: all patients with dementia should have an annual personalised care plan review) for the NHS financial year 1st April 2023 to 31st March 2024.
SETTING UP
Create your workspace: <github-username>/add-dem-reg
Import the necessary codelist (DEM_COD) into your newly created codespace. To do this, copy the part of the URL that specifies the codelist, nhsd-primary-care-domain-refsets/dem_cod/20210127, into the codelists.txt file in the codelists folder.
Next, run this command in the codespace terminal: opensafely codelists update
This will import the .csv file containing the DEM_COD codelist into the codelists folder. You can display the .csv file by clicking on it. You will notice there are two columns: code and a reference column, term. You will use the code column to define the dataset later on.
The analysis folder in the codespace contains the files that process the data (i.e. define the study cohort, to which we will refer from now on as dataset, and then analyse this dataset). In this task, we focus solely on defining the dataset. First, in the analysis folder, create a new .py file where you will write your dataset definition, e.g. analysis/dem_reg_dataset_<your-name>.py.
DEFINING THE DATASET
All of the following steps should be coded in the analysis/dem_reg_dataset_<your-name>.py file and allow you to define the dataset:
Import TPP/EMIS tables
Create objects (vectors of strings) with the clinical codes
Create an empty dataset
Add new columns to this empty dataset. The new columns are generated from the columns available in the tables
Write filtering rules which will be applied to the new columns in the dataset
Filter the dataset using the rules.
Let's look at these steps in detail.
Import the necessary tables from the TPP and/or EMIS databases. Tables contain patient variables, based on which we will define the dataset. EHR tables are in long format (few columns, i.e. few variables, but multiple rows per patient). As a minimum, most tables contain a column of unique patient identifiers, a column with a clinical code (signifying a prescription, a clinical event, a patient characteristic, etc.), and a column with the date when that clinical code was assigned. Because the same code can be assigned multiple times (think of recurrent prescriptions or recurrent hospital admissions), there will frequently be multiple rows per patient. When defining the dataset, we will need to compress these to just one row per patient, for example by choosing the first date the medication was prescribed.
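To illustrate the idea of compressing a long-format table to one row per patient, here is a plain-Python sketch with made-up toy data (not ehrQL; the identifiers and dates are invented for illustration only):

```python
from collections import defaultdict

# Toy long-format events table: one row per clinical event,
# so the same patient can appear on multiple rows.
events = [
    (1, "dem_cod", "2022-05-01"),
    (1, "dem_cod", "2023-11-12"),  # same patient, later event
    (2, "dem_cod", "2021-03-30"),
]

# Compress to one row per patient, keeping the most recent event date.
latest = defaultdict(str)
for patient_id, code, date in events:
    if date > latest[patient_id]:  # ISO dates compare correctly as strings
        latest[patient_id] = date

print(dict(latest))  # {1: '2023-11-12', 2: '2021-03-30'}
```

In ehrQL the same compression is expressed declaratively (e.g. with sort_by and last_for_patient), as we will see below.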
For this exercise, we will need to import tables containing information on practice registration, clinical events and patient identifiers.
We do this with the following code:
from ehrql.tables.tpp import patients, practice_registrations, clinical_events
Next, we create a vector called dem_cod containing the codelist. For this, we use the column code from our .csv file in the codelists folder. In a bit, we will use this vector to select patients with our clinical events of interest.
First, we have to import the codelist_from_csv function into our environment, using the following code:
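A minimal sketch of that import plus the codelist creation. The CSV filename below is an assumption based on the naming convention used by opensafely codelists update, so check the actual filename in your codelists folder:

```python
from ehrql import codelist_from_csv

# Build the dem_cod codelist from the "code" column of the imported CSV.
# NOTE: the filename is an assumption — check your codelists folder.
dem_cod = codelist_from_csv(
    "codelists/nhsd-primary-care-domain-refsets-dem_cod.csv",
    column="code",
)
```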
We now have to create an empty dataset. We will populate it by adding relevant information from the selected columns in the TPP/EMIS tables.
from ehrql import create_dataset
dataset = create_dataset()
Next we will populate this empty dataset by adding columns generated from the TPP/EMIS tables.
To generate a column of patients with the diagnostic code for dementia, we first need to identify the rows which contain the code for dementia, and second, since there may be multiple rows per patient, we need to compress the table by selecting the most recent one.
This:
clinical_events.where(clinical_events.snomedct_code.is_in(dem_cod))
identifies the rows with a dementia diagnosis.
This:
[...].sort_by(clinical_events.date).last_for_patient()
arranges the rows by date and selects the most recent row.
This:
[...].date
allows us to extract the date for the most recent dementia diagnosis.
And this:
dataset.dem_dat =
allows us to assign the column of the most recent dementia diagnosis for each patient, to our dataset.
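Putting the pieces above together in one assignment:

```python
# Most recent dementia diagnosis date per patient.
dataset.dem_dat = (
    clinical_events.where(clinical_events.snomedct_code.is_in(dem_cod))
    .sort_by(clinical_events.date)
    .last_for_patient()
    .date
)
```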
Similar steps should be followed to determine the practice registration (reg_dat) and deregistration (dereg_dat) dates, using the practice_registrations table. This also requires defining the index date; for dementia patients this is the achievement date at the end of the reporting period. Remember to also create a rule for the dementia diagnosis (rule_dem = ...) and a rule for registration (rule_reg = ...).
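One possible sketch of the index date and rules, assuming the achievement date is 31st March 2024 (the end of the reporting year); check the exact method names against the ehrQL documentation:

```python
index_dat = "2024-03-31"  # achievement date: end of the 2023/24 financial year

# Rule 2: dementia diagnosis recorded on or before the achievement date
rule_dem = dataset.dem_dat.is_on_or_before(index_dat)

# Rule 1: registered for GMS on the achievement date
rule_reg = practice_registrations.for_patient_on(index_dat).exists_for_patient()
```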
Finally, we use these rules to filter the dataset, using the following code.
The logic can be combined using & (AND), | (OR), ~ (NOT) operators.
dataset.define_population(rule_dem & rule_reg)
CREATE A WORKFLOW THAT WILL GENERATE YOUR DATASET USING ITS DEFINITION
Now that we have created the dataset definition, we have to update our project pipeline to execute this definition and generate the dataset into the output folder. Note that, to save space, all outputs should be generated as zipped .gz files.
We do this by writing the following in the project.yaml script:
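The YAML itself was missing here; a sketch of one possible action, following the standard OpenSAFELY project.yaml structure (the action name and output path are illustrative):

```yaml
actions:
  generate_dem_reg_dataset:
    run: ehrql:v1 generate-dataset analysis/dem_reg_dataset_<your-name>.py --output output/dem_reg_dataset.csv.gz
    outputs:
      highly_sensitive:
        dataset: output/dem_reg_dataset.csv.gz
```

The .csv.gz extension produces the zipped output mentioned above.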
NOTES ON RUNNING EHRQL AND CHECKING YOUR CODE
The codespace used by OpenSAFELY does not allow you to run the script line by line.
The whole script is always run top to bottom, all lines included.
That's why, after writing every line of code in the dataset definition file, it is good to check that it runs, so that bugs are fixed immediately.
We do it by typing the following in the terminal:
opensafely exec ehrql:v1 generate-dataset analysis/<dataset-script-name>.py
If we want to test and experiment by running individual lines, we can open a "sandbox" in the terminal (ideally by splitting the terminal into two).
We may need to load example data first (typing in the terminal):
opensafely exec ehrql:v1 dump-example-data
And then:
opensafely exec ehrql:v1 sandbox example-data
This allows us to run individual lines by typing them in the sandbox terminal.
To exit the sandbox, we type quit() in the sandbox terminal (or press Ctrl-D).
Acceptance criteria
Import the required codelists
[ ] DEM_COD
Identify and/or create the variables needed for the dataset definitions
[ ] index_dat
[ ] dem_dat
[ ] reg_dat
[ ] dereg_dat
Create the dataset rules using the above definitions
[ ] Rule 1: rule_dem
[ ] Rule 2: rule_reg
Add an action for the dataset definition to the project.yaml file
@atamborska this looks great, you could try implementing this if you have some time this week. in case you get stuck we can look at your questions next week when I'm back