Description
Ethnicity can be categorised into 16 or 6 groups. The code below is for 16 groups and queries the primary care data only. It does not include data from secondary care datasets, which sometimes also record ethnicity.
The 5 and 16 groups were first used in the 2001 census.
Dummy Data
return_expectations is responsible for creating the dummy data. Please note that the dummy data currently gives every ethnic group equal numbers, which does not reflect the real data. Please bear this in mind, particularly for model convergence and when tabulating variables, as low numbers in some groups can be an issue in the real data.
The incidence of an ethnicity code is set in return_expectations as 75%. Depending on your subgroup analysis, this may or may not be accurate in the real data.
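As a rough illustration of the settings described above, the sketch below builds a dictionary with 16 equally weighted categories and an incidence of 0.75. The category labels "1" to "16" and the variable names are assumptions made for this example; check them against the codelist and your own study definition before relying on them.

```python
# Sketch of a return_expectations dictionary for a 16-group ethnicity variable.
# Assumption: the 16 categories are labelled "1" to "16" (not confirmed here).
N_GROUPS = 16
category_labels = [str(i) for i in range(1, N_GROUPS + 1)]

# Every group gets the same share, mirroring the current dummy data setup.
equal_ratios = {label: 1 / N_GROUPS for label in category_labels}

return_expectations = {
    "category": {"ratios": equal_ratios},
    "incidence": 0.75,  # 75% of dummy patients receive an ethnicity code
}

# The ratios should sum to 1 so that every coded patient falls in one group.
assert abs(sum(equal_ratios.values()) - 1.0) < 1e-9
```

To make the dummy data closer to a real population, the equal shares could be replaced with unequal ones, provided they still sum to 1.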
Codelist
The relevant codelist is here. That link includes a discussion of how the codelist was created, with further details provided on GitHub.
The tab called Fulllist contains all the codes and includes a column for the 16 categories (column called Grouping_16) and for the 6 categories (Grouping_6).
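To show how the two grouping columns relate to each code, here is a minimal sketch that reads a CSV export with Grouping_16 and Grouping_6 columns into lookup dictionaries. The column name "Code", the function name, and the sample rows are assumptions for illustration only; the codes shown are placeholders, not real clinical codes.

```python
import csv
import io

# Placeholder rows for illustration only; these are NOT real codes or mappings.
SAMPLE_CSV = """Code,Grouping_16,Grouping_6
CODE_A,1,1
CODE_B,2,1
CODE_C,8,3
"""


def load_groupings(handle, code_column="Code"):
    """Return two dicts mapping each code to its 16-group and 6-group category.

    Assumption: the code column is called "Code"; adjust to match your file.
    """
    to_16, to_6 = {}, {}
    for row in csv.DictReader(handle):
        to_16[row[code_column]] = row["Grouping_16"]
        to_6[row[code_column]] = row["Grouping_6"]
    return to_16, to_6


code_to_16, code_to_6 = load_groupings(io.StringIO(SAMPLE_CSV))
```

With a real export, an open file handle would be passed in place of the in-memory sample.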
Pull this codelist into your study
This codelist needs to be pulled into your study. In your codelists/ folder, edit codelists.txt to include this line:
opensafely/ethnicity/2020-04-27
If you are running this locally, you will need to run the following before you push your code to GitHub:
opensafely codelists update
See the documentation for more details.
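The codelist line above has three slash-separated parts. The sketch below splits such a line and derives the CSV filename that the update command is expected to produce. The "owner-slug.csv" naming and the function name are assumptions for illustration; confirm the actual filename in your codelists folder after running the update.

```python
def parse_codelist_entry(entry):
    """Split a line such as 'opensafely/ethnicity/2020-04-27' into its parts.

    Assumption: the downloaded file is named '<owner>-<slug>.csv'.
    """
    parts = entry.strip().strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"Expected owner/slug/version, got: {entry!r}")
    owner, slug, version = parts
    return {
        "owner": owner,
        "slug": slug,
        "version": version,
        "csv_name": f"{owner}-{slug}.csv",  # assumed naming convention
    }


entry = parse_codelist_entry("opensafely/ethnicity/2020-04-27")
```

Here entry["version"] holds the dated version of the codelist, which is what pins the study to a fixed set of codes.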
Make the codelist variable
Add this Code Snippet to your codelist.py file within the analysis/ folder.
This will need to be imported into your study_definition.py with the following line to import all codelists defined there:
from codelist import *
or to just pull in the ethnicity codelist variable:
from codelist import ethnicity_codes_16
Study Definition Code Snippet