opensafely / variable-library

Repo for code snippets for the components of the variable library

Ethnicity in 16 Categories #2

Open CarolineMorton opened 3 years ago

CarolineMorton commented 3 years ago

Description

Ethnicity can be grouped into either 16 or 6 categories. The code below uses the 16-category grouping and queries primary care data only. It does not include data from secondary care datasets, which also sometimes record ethnicity.

The 5 and 16 groups were first used in the 2001 census.

Dummy Data

return_expectations is responsible for creating the dummy data. Please note that the dummy data currently assigns equal numbers to every ethnic group, which does not reflect the real data. Please bear this in mind, particularly for model convergence and when tabulating variables, as low numbers can be an issue.

The incidence of an ethnicity code is set to 75% in return_expectations ("incidence": 0.75). Depending on your subgroup analysis, this may or may not reflect the real data.
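
To get a feel for what these settings imply, here is a minimal sketch using only the Python standard library. It is an illustration of the stated ratios and incidence, not the way the dummy data is actually generated internally; the function name simulate_ethnicity_16, the seed and the cohort size of 1,000 are all assumptions made for this example.

```python
import random
from collections import Counter

# Assumed to mirror the return_expectations in this issue:
# 16 equally likely categories and a 75% chance that any code is recorded.
RATIOS = {str(i): 1 / 16 for i in range(1, 17)}
INCIDENCE = 0.75


def simulate_ethnicity_16(n_patients, seed=0):
    """Return one dummy value per patient; None means no ethnicity code recorded."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same draw
    categories = list(RATIOS)
    weights = list(RATIOS.values())
    values = []
    for _ in range(n_patients):
        if rng.random() < INCIDENCE:
            values.append(rng.choices(categories, weights=weights)[0])
        else:
            values.append(None)
    return values


dummy = simulate_ethnicity_16(1000)
counts = Counter(dummy)
print("missing:", counts[None])
for category in RATIOS:
    print(category, counts[category])
```

With 1,000 dummy patients the expected count per category is 1000 × 0.75 / 16, roughly 47, so any further stratification quickly produces small cells.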

Codelist

The relevant codelist is here. A discussion of how it was created is available at this link, with further details provided on GitHub.

The tab called Fulllist contains all the codes and includes a column for the 16 categories (column called Grouping_16) and for the 6 categories (Grouping_6).
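
As a rough sketch of how the two grouping columns sit side by side, the snippet below reads a small CSV with the Code, Grouping_16 and Grouping_6 columns described above. The rows are made-up placeholders, not real codes from the codelist, and load_groupings is a hypothetical helper written for this example.

```python
import csv
import io

# Made-up rows for illustration only: the codes and groupings are placeholders,
# not real CTV3 codes from the OpenSAFELY ethnicity codelist.
SAMPLE_CSV = """Code,Grouping_16,Grouping_6
CODE_A,1,1
CODE_B,2,1
CODE_C,8,3
"""


def load_groupings(csv_text):
    """Map each code to its (Grouping_16, Grouping_6) pair, both kept as strings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Code"]: (row["Grouping_16"], row["Grouping_6"]) for row in reader}


groupings = load_groupings(SAMPLE_CSV)
for code, (group_16, group_6) in groupings.items():
    print(code, "16-category:", group_16, "6-category:", group_6)
```

The same file can therefore serve either grouping: only the category column that is read changes.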

Pull this codelist into your study

This codelist needs to be pulled into your study. In your codelists/ folder, edit codelists.txt to include this line:

opensafely/ethnicity/2020-04-27

If you are working locally, run the following before you push your code to GitHub: opensafely codelists update

See the documentation for more details.

Make the codelist variable

Add this code snippet to your codelist.py file within the analysis/ folder.

ethnicity_codes_16 = codelist_from_csv(
    "codelists/opensafely-ethnicity.csv",
    system="ctv3",
    column="Code",
    category_column="Grouping_16",
)

This variable then needs to be imported into your study_definition.py. To import all the codelists defined in codelist.py, use: from codelist import *

Or, to import only the ethnicity codelist variable: from codelist import ethnicity_codes_16

Study Definition Code Snippet

    # ETHNICITY IN 16 CATEGORIES
    ethnicity_16=patients.with_these_clinical_events(
        ethnicity_codes_16,
        returning="category",
        find_last_match_in_period=True,
        include_date_of_match=False,
        return_expectations={
            "category": {
                "ratios": {
                    "1": 0.0625,
                    "2": 0.0625,
                    "3": 0.0625,
                    "4": 0.0625,
                    "5": 0.0625,
                    "6": 0.0625,
                    "7": 0.0625,
                    "8": 0.0625,
                    "9": 0.0625,
                    "10": 0.0625,
                    "11": 0.0625,
                    "12": 0.0625,
                    "13": 0.0625,
                    "14": 0.0625,
                    "15": 0.0625,
                    "16": 0.0625,
                }
            },
            "incidence": 0.75,
        },
    ),
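
If you edit the ratios to be closer to the real data, a quick sanity check can catch typos before you run the study. The following is a minimal sketch: check_expectations is a hypothetical helper written for this example, not part of any library, and it only assumes the dictionary shape used in the snippet above.

```python
import math


def check_expectations(expectations):
    """Raise ValueError if the ratios do not sum to 1 or the incidence is not in [0, 1]."""
    ratios = expectations["category"]["ratios"]
    total = sum(ratios.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"category ratios sum to {total}, expected 1.0")
    incidence = expectations["incidence"]
    if not 0 <= incidence <= 1:
        raise ValueError(f"incidence {incidence} is outside [0, 1]")
    return True


# Same values as the snippet above: 16 categories at 0.0625 each, 75% incidence.
expectations = {
    "category": {"ratios": {str(i): 0.0625 for i in range(1, 17)}},
    "incidence": 0.75,
}
check_expectations(expectations)
```

The check passes for the equal ratios used here because 16 × 0.0625 is exactly 1.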

sebbacon commented 3 years ago

Ethnicity can be specified into 16 or 6 groups.

Why? Could be worth referencing 2001 census etc?

CarolineMorton commented 3 years ago

Thanks, added @sebbacon