opensafely / ics-research

This is the code and configuration for our paper, Inhaled corticosteroid use and risk COVID-19 related death among 966,461 patients with COPD or asthma
https://opensafely.org

Medication: Pneumococcal Vaccination #35

Closed brianmackenna closed 4 years ago

brianmackenna commented 4 years ago

See #23 for background.

*need to investigate product differences depending on indication a bit more.

richiecroker commented 4 years ago

A question for @brianmackenna - should this list only include the 23-valent vaccine, and not the 13-valent (Prevenar 13), as this is only used in primary child immunisation schedules?

brianmackenna commented 4 years ago

This is what the * was for at 10pm last night! @hmcd any thoughts?

I'm tempted to include them all because

hmcd commented 4 years ago

I'd agree with you Brian. I suppose it is possible that the 13-valent was given rather than 23-valent, but even if so it'd still be a pneumococcal vaccine - and likely just to be the wrong code.

hmcd commented 4 years ago

In case it's helpful for cross-checking, here's what we are using for a study with population aged >=65 years, to define pneumococcal and flu vaccine. We only use codes for PPV23 in that (as we are specifically looking at invasive pneumococcal disease so want to be clear that they had the relevant vaccine). It includes some notes about handling conflict between the structured data, Read codes and prescription records, and records for which the date of delivery is uncertain (delivered by other healthcare provider) with a table for handling same-day conflicts in that study. Vaccination_codes_Jul19.xlsx

hmcd commented 4 years ago

There's also a 10-valent vaccine (not currently part of the immunisation programme) and there'll be codes for the old 7-valent vaccine. Would you include those in the list if including 13-valent, for consistency of accepting any pneumococcal vaccine record?

brianmackenna commented 4 years ago

Ok so having reviewed that we could identify PPV vaccination

1st Preference Rule: Patient appears in TPP vaccination table & dm+d medication issue (include all strains of pneumococcal for this study)

or 

2nd Preference Rule: Presence of clinical code indicative of vaccination (where more than one code appears in the record we use the algorithm on the sheet "comments" from the Excel file)

Does that work?

1st preference can be done relatively easily for first runs of data. 2nd preference may take some time to build?

brianmackenna commented 4 years ago

I have added two csvs of READ codes for PPV vaccination. Each csv includes a classification of the READ codes as per @hmcd's work above. Logic to be implemented.

https://github.com/ebmdatalab/vaccinations-covid-codelist-notebook/blob/master/notebooks/Pneumococcal.ipynb

richiecroker commented 4 years ago

BNF/DM+D code list checked.

brianmackenna commented 4 years ago

dm+d list https://codelists.opensafely.org/codelist/opensafely/pneumococcal-vaccination/

brianmackenna commented 4 years ago

CTV3 build from clinical codes above, to be separated for rule: Copy of CTV3_PPV_Raw.xlsx

brianmackenna commented 4 years ago

Added the CTV3 codes to the notebook (pic below for ease)

@hmcd looking at this again: should we only include the people where the code looks like it has been "given", and where more than one code appears assume they haven't had it? This avoids unnecessarily complex rules

image

hmcd commented 4 years ago

Hi @brianmackenna

I'd be interested to see the frequencies of the codes in TPP. If the majority of code use is the neutral type of code e.g. 6572 Pneumococcal vaccination then if we only used the ones where it's clearly 'given' then we'd miss a lot of vaccinations.

But I think we can't use the 'neutral' codes without checking for 'not given' codes. In CPRD we find that there are a reasonable number of patients with multiple codes relating to a vaccination on one day. It might be different in TPP, the data entry is clearly quite different. (But we also might want to future proof the definition so it'd also work for EMIS?). In CPRD, the neutral codes can be combined on the same day with e.g. a 'did not attend' code, and we'd want to take that as not given, rather than only counting the neutral code as assumed given.

We might also want to build in flexibility for studies: (1) To deal with conflicts. If a 'given' code is recorded on the same day as a 'not given' code, the vaccination status is unknown. This might be more or less of an issue for different studies, but for a study in which vaccination was a key exposure of interest, patients with a same-day conflict might need to be excluded from the study. (2) Historical records (given by other healthcare provider / history of vaccination) are useful if a study just wants to know whether a person has ever been vaccinated. And as PPV is only given once in a lifetime for most adults, we wouldn't want to ignore them. But if the study was interested in timing of PPV (e.g. VE by time since vaccination, or a SCCS) then we'd want to drop these patients as the timing of PPV would be unclear.

One option might be to extract 4 codelists ('given', 'neutral', 'not given', 'historical'), with the variable for each being the latest date on which one of the codes was recorded. Studies would then have flexibility to identify if the latest record had more than one type of code, identify if those conflicted, and whether the vaccination could be assigned to that date.
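
A minimal sketch of how a study might combine four such dates, assuming each codelist yields either the latest date on which one of its codes was recorded or nothing. The function names and category labels are hypothetical illustrations, not part of OpenSAFELY or of the codelists discussed here.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def latest_record_categories(
    latest_dates: Dict[str, Optional[date]],
) -> Tuple[Optional[date], List[str]]:
    """Return the most recent record date and every category recorded on it.

    `latest_dates` maps a category label (hypothetical: 'given', 'neutral',
    'not given', 'historical') to the latest date for that codelist, or None.
    """
    recorded = {cat: d for cat, d in latest_dates.items() if d is not None}
    if not recorded:
        return None, []
    latest = max(recorded.values())
    return latest, sorted(cat for cat, d in recorded.items() if d == latest)


def has_same_day_conflict(categories: List[str]) -> bool:
    """A 'given' and a 'not given' code on the same day leave status unknown."""
    return "given" in categories and "not given" in categories


def date_is_usable(categories: List[str]) -> bool:
    """Assumption: a 'historical' code means the vaccination date is unclear."""
    return "historical" not in categories


example = {
    "given": date(2019, 10, 1),
    "neutral": date(2018, 1, 15),
    "not given": date(2019, 10, 1),
    "historical": None,
}
latest, cats = latest_record_categories(example)
print(latest, cats, has_same_day_conflict(cats), date_is_usable(cats))
```

A study where vaccination is the key exposure could exclude patients for whom `has_same_day_conflict` is true, while a study that only needs ever-vaccinated status could ignore `date_is_usable`.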

Otherwise, we might want reasonably complex rules - and then they might not fit every study.

Could that be a sensible approach? (What I'd really like is to extract all vaccination records, but appreciate we can't do that!)

PS. In CPRD, the structured data can also include a 'vaccination status' of given/not given which can conflict with the Read code, but it looks like this isn't possible in TPP, @chris-tpp ?

hmcd commented 4 years ago

@brianmackenna

Tried to come up with something simple - bit tricky as we usually have to do a lot of data cleaning for vaccination status, and the rules depend on the study. Think the pragmatic option 1 is a sensible approach for speed for the ICS study.

For making the most of the data by also using the Read codes, would the following approach work for PPV?

Have attached a csv classifying the CTV3 codes relating to pneumococcal vaccine into 5 categories: Neutral, given, not given (these three to be used to assign vaccination status); and date unclear, child product (these codes to be flagged in case needed for study-specific data cleaning).* CTV3_pneumococcal_vaccine_codes.xlsx

Possible approach

  1. Search using full codelist to find latest date of any pneumococcal vaccination record.
  2. If the codes on that date do not include any listed as ‘not given’, then record vaccination status as ‘given’ with date as vaccination status date, and output all Read codes on that date under the relevant output variables.
  3. If the codes on that date include both ‘not given’ and ‘given’, then record status as ‘conflict’ with date, and record all Read codes on that date.
  4. If the codes on that date include ‘not given’ but no ‘given’, then look for latest previous vaccination record and start again. If no previous date of pneumococcal vaccination record, record vaccination status as ‘none recorded’.

Output variables

  1. Vaccination status (given/conflict/none recorded)
  2. Vaccination status date (date on which vaccination status is based, blank for none recorded)
  3. Vaccination codes (all CTV3 codes from neutral or given list recorded on vaccination status date)
  4. Date unclear (all CTV3 codes from ‘date unclear’ list recorded on vaccination status date)
  5. Child (all CTV3 codes from ‘child’ list recorded on vaccination status date)

brianmackenna commented 4 years ago

Great - I have added the categories to the notebook so they can be easily imported into https://codelists.opensafely.org/

The only one I had a question over is 'Pneumococcal vaccination given by other healthcare provider': should this not be categorised as given? The date may be unclear, but it is very different to the others in that category.

hmcd commented 4 years ago

Hmm, good point, my attempt at simple was too simple!

The approach above uses the 'given' flags to highlight conflicting records on the same day, in which vaccination status is unclear as it is both 'given' and 'not given'.

But if on the same day someone has 'pneumococcal vaccine declined' and 'pneumococcal vaccine given by other healthcare provider' it's not a same-day conflict with unclear vaccination status - it seems reasonably clear that someone has not been given the vaccination on that date, but was given it previously. I would usually take that as assumed given, but with uncertain date, rather than a conflict or looking for a previous record... so could we add one more step?

  1. Search using full codelist to find latest date of any pneumococcal vaccination record.
  2. If the codes on that date do not include any listed as ‘not given’, then record vaccination status as ‘given’ with date as vaccination status date, and output all Read codes on that date under the relevant output variables.
  3. Else if the codes on that date include 'given' (ie codes on that date include both ‘not given’ and ‘given’), then record status as ‘conflict’ with date, and record all Read codes on that date.
  4. Else if the codes on that date include 'date_unclear' (ie codes on that date include both 'not given' and 'date_unclear' but not 'given'), then record status as 'given' and record all Read codes on that date. The content in the 'date unclear' variable will then indicate that the date can't be trusted, but the person has a history of vaccination.
  5. Else (I think this should now cover the case where the codes on that date include a ‘not given’ but neither ‘given’ nor 'date unclear') look for latest previous vaccination record and start again. If no previous date of pneumococcal vaccination record, record vaccination status as ‘none recorded’.

Output variables

  1. Vaccination status (given/conflict/none recorded)
  2. Vaccination status date (date on which vaccination status is based, blank for none recorded)
  3. Vaccination codes (all CTV3 codes from neutral or given list recorded on vaccination status date)
  4. Date unclear (all CTV3 codes from ‘date unclear’ list recorded on vaccination status date)
  5. Child (all CTV3 codes from ‘child’ list recorded on vaccination status date)
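
The revised approach could be sketched roughly as below, assuming each record for a patient arrives as an (event date, CTV3 code, category) tuple. The function name, the category labels (`neutral`, `given`, `not_given`, `date_unclear`, `child`) and every code string other than 6572 are hypothetical placeholders, and this is plain Python rather than anything built into OpenSAFELY.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[date, str, str]  # (event date, CTV3 code, category label)


def _codes_in(entries: List[Tuple[str, str]], wanted: Set[str]) -> List[str]:
    """Codes recorded on one day whose category is in `wanted`."""
    return sorted(code for code, category in entries if category in wanted)


def vaccination_status(records: Iterable[Record]) -> Dict[str, object]:
    by_date: Dict[date, List[Tuple[str, str]]] = defaultdict(list)
    for event_date, code, category in records:
        by_date[event_date].append((code, category))

    # Work backwards from the latest date with any pneumococcal record.
    for event_date in sorted(by_date, reverse=True):
        entries = by_date[event_date]
        categories = {category for _, category in entries}
        if "not_given" not in categories:
            status = "given"  # nothing on this day says 'not given'
        elif "given" in categories:
            status = "conflict"  # 'given' and 'not given' on the same day
        elif "date_unclear" in categories:
            status = "given"  # history of vaccination, date not trusted
        else:
            continue  # only 'not given' evidence: try the previous date
        return {
            "status": status,
            "status_date": event_date,
            "vaccination_codes": _codes_in(entries, {"neutral", "given"}),
            "date_unclear": _codes_in(entries, {"date_unclear"}),
            "child": _codes_in(entries, {"child"}),
        }
    return {
        "status": "none recorded",
        "status_date": None,
        "vaccination_codes": [],
        "date_unclear": [],
        "child": [],
    }


example = [
    (date(2019, 5, 1), "HYPOTHETICAL_DECLINED", "not_given"),
    (date(2017, 3, 1), "6572", "neutral"),
]
print(vaccination_status(example))
```

In this example the 2019 record is skipped because it only carries a 'not given' code, so the status falls back to the neutral 2017 record.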

Note on misclassification: Combining declined/did not attend/ and contraindicated/not indicated together as ‘not given’ will result in some misclassification as it is possible a vaccine may have been given despite being not indicated or contraindicated, but it should be minimal and does make the approach a lot simpler.

I'm ignoring combinations of 'not given' with 'child' flags here, treating the 'child' codes as neutral as to whether an adult vaccine has been given (ie if it's not recorded on the same day as a 'not given' code then we'll take it as a vaccine record, but if it's on the same day as a 'not given' record then we'll ignore it as it doesn't necessarily conflict with that nor definitely imply a history of vaccination as an adult). Does that seem OK to you @Brian?

alexwalkerepi commented 4 years ago

For the ICS study, I propose a slightly simpler version for implementing this. The steps, and the reasons for them, are:

  1. Look for all codes where the vaccine was clearly given.
  2. Ignore any events where a not_given code was also recorded on the same day.
    • this means that previous days where there is just a given code will still be included.
  3. Don't use codes that relate to historic vaccination, as for this study we want vaccinations that are in a specific time period.
  4. Don't use codes that are unclear about whether the vaccination was given, as these might relate to e.g. invitations for vaccination, rather than administration.
    pneumococcal_vaccine_clinical=patients.with_these_clinical_events(
        pneumococcal_clinical_given_codes,
        ignore_days_where_these_codes_occur=pneumococcal_clinical_not_given_codes,
        between=["2015-03-01", "2020-02-29"],  # past five years
        return_first_date_in_period=True,
        include_month=True,
        return_expectations={
            "date": {"earliest": "2015-03-01", "latest": "2020-02-29"}
        },
    ),
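
For readers unfamiliar with the study definition syntax, the selection rule described in the steps above can be illustrated in plain Python. This is only a sketch of the intended logic, not the cohortextractor implementation: it assumes the `between` bounds are inclusive and that any day carrying a not_given code is dropped entirely, and the function name and code strings are hypothetical.

```python
from datetime import date
from typing import Iterable, Optional, Set, Tuple


def first_given_date(
    events: Iterable[Tuple[date, str]],
    given_codes: Set[str],
    not_given_codes: Set[str],
    start: date,
    end: date,
) -> Optional[date]:
    """Earliest in-period day with a 'given' code and no 'not given' code."""
    events = list(events)
    # Days on which any 'not given' code was recorded are ignored altogether.
    excluded_days = {day for day, code in events if code in not_given_codes}
    candidates = [
        day
        for day, code in events
        if code in given_codes
        and day not in excluded_days
        and start <= day <= end  # assumption: both bounds inclusive
    ]
    return min(candidates, default=None)


events = [
    (date(2016, 1, 10), "HYPOTHETICAL_GIVEN"),
    (date(2016, 1, 10), "HYPOTHETICAL_DECLINED"),  # same-day conflict
    (date(2018, 6, 2), "HYPOTHETICAL_GIVEN"),
]
print(
    first_given_date(
        events,
        given_codes={"HYPOTHETICAL_GIVEN"},
        not_given_codes={"HYPOTHETICAL_DECLINED"},
        start=date(2015, 3, 1),
        end=date(2020, 2, 29),
    )
)
```

The 2016 day is ignored because of the same-day not_given code, while the 2018 day, which has only a given code, is still picked up.
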
CarolineMorton commented 4 years ago

This sounds good. Thanks for putting this together. What was the conclusion about extending out to ~7 years to capture everyone in the first year it was rolled out?

alexwalkerepi commented 4 years ago

I think we decided to leave it at 5 years, mostly because the vaccine effectiveness wanes over time.

hmcd commented 4 years ago

The PPV algorithm for the ICS study looks good to me - looks like the vaccination table captures it well.

For flu (#34), I think we could consider including the 'recent_past' codes together with the 'given' codes since: (1) it's reasonable to assume a flu vaccine being recorded relates to that flu season unless it's obviously historical, (2) the vaccination table might capture the vaccinations less well than PPV since it's more commonly given in other settings than PPV (and national surveillance also doesn't capture this, so we are less reassured by having similar uptake to surveillance), and (3) the ICS study doesn't need an exact flu vaccination date, just whether they have been vaccinated prior to the index date, which a 'recent_past' code before the index would indicate. But hopefully this wouldn't add to the coding, as the 'recent_past' codes could just be included in the same list?