opensafely / codelist-development

Repository for discussion of OpenSAFELY codelists

*DISEASE*: Chronic Cardiac Disease #2

Open sebbacon opened 4 years ago

sebbacon commented 4 years ago

Code for creating binary variable "presence of cardiovascular disease"

QOF register codelist from TPP

@alexwalkercebm

alexwalkerepi commented 4 years ago

The initial notebook is here, but we need to discuss the codelist and definition more widely: https://github.com/ebmdatalab/tpp-sql-notebook/blob/ab02ae4da2164879520db09e2f3cca513e05f48f/notebooks/cvd_covariate.ipynb

brianmackenna commented 4 years ago

I have generated a medicines codelist covering all Dictionary of Medicines and Devices (dm+d) AMPs and VMPs for medicines in the cardiovascular chapter of the BNF. We could use this list (or, more likely, a subset) as a flag for "presence of cardiovascular disease". Alex has already used it in his notebook.

CarolineMorton commented 4 years ago

We also have the Read v2 codelists from LSHTM:

chronic_cardiac_PPVrisk_July18.xlsx

How should we combine these @alexwalkercebm?

Taken from previous issue ebmdatalab/tpp-sql-notebook#29 - now closed

sebbacon commented 4 years ago

I suppose the question to resolve is how Brian's dynamic codelist differs from LSHTM's static codelist.

We'd expect the dynamic one to have more medicines, as it will be more up to date. The interesting point is whether there is anything in LSHTM's list that is not in Brian's. If we make a list of those and post it in this issue, the differences can be reviewed with a clinical eye.

The inverse difference would also be interesting to post (although less so as we think we know what to expect).

I'm hoping the conclusion would be that the dynamic list is everything we need.
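The comparison described above boils down to two set differences. A minimal sketch, assuming both codelists can be loaded as sets of code strings; the function name and the codes in the example are hypothetical placeholders, not real dm+d or Read codes:

```python
"""Sketch: compare a dynamic codelist with a static one.

Assumption: both lists are already loaded as sets of code strings.
The codes used below are hypothetical placeholders.
"""


def compare_codelists(dynamic: set, static: set) -> dict:
    """Return codes unique to each list and the codes they share."""
    return {
        # In the static (LSHTM) list but missing from the dynamic one:
        # these are the codes worth a clinical review.
        "static_only": static - dynamic,
        # The inverse difference: expected to be mostly newer products.
        "dynamic_only": dynamic - static,
        "shared": static & dynamic,
    }


if __name__ == "__main__":
    dynamic_list = {"CODE_A", "CODE_B", "CODE_C", "CODE_D"}  # hypothetical
    static_list = {"CODE_A", "CODE_B", "CODE_X"}  # hypothetical
    for name, codes in compare_codelists(dynamic_list, static_list).items():
        print(name, sorted(codes))
```

Only the `static_only` set would need posting for review; `dynamic_only` is the "less interesting" inverse difference.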

sebbacon commented 4 years ago

Oh, I've misunderstood: those are the clinical codes from LSHTM, not the medicine ones?

alexwalkerepi commented 4 years ago

Yes, clinical codes.

alexwalkerepi commented 4 years ago

Ideally, I think we'd convert the LSHTM codes to Read 3, through whatever process we establish for that, and then have someone clinical look at both lists and determine which codes we need.

hmcd commented 4 years ago

Hello, PRIMIS have already mapped the LSHTM codes to Read 3 for cardiovascular disease, and had someone clinical look at both lists to select the codes; those are the v3 codes in the "PPV full legacy spec". I'll post a summary of which ones have been done, and the advice they gave us on expanding the spec, under issue ebmdatalab/tpp-sql-notebook#44 on mapping more generally.

CarolineMorton commented 4 years ago

This has gone to @chris-tpp for mapping to Read v3.

chris-tpp commented 4 years ago

Great. I'm on this; I will finish in the morning and return a codelist and a methodology.

CarolineMorton commented 4 years ago

Thank you

CarolineMorton commented 4 years ago

Draft sign off

DEFINITION: Patients who have any cardiovascular disease Read 3 code ever on their medical records held by TPP. Absence of a code on the record is taken as absence of cardiovascular disease.

Example:

patient_id  cvd_bin  condition                     date
123         1        H/O: angina pectoris          1/2/2009
332         1        ECG: Anteroseptal infarction  2/4/2016
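The "ever" definition above can be sketched as: keep the earliest matching code date per patient, then set the binary flag from whether any date exists. This assumes, hypothetically, that events arrive as `(patient_id, ctv3_code, event_date)` tuples and that the codelist is a set of code strings; neither the record layout nor the codes come from TPP, and the function names are illustrative.

```python
"""Sketch: derive an 'ever' chronic cardiac disease variable.

Assumptions (hypothetical): events are (patient_id, ctv3_code, event_date)
tuples and the codelist is a set of CTV3 code strings.
"""
from datetime import date


def first_cvd_date(events, codelist):
    """Map each patient with a matching code to the date of their first match."""
    first = {}
    for patient_id, code, event_date in events:
        if code in codelist:
            # Keep only the earliest date seen for this patient.
            if patient_id not in first or event_date < first[patient_id]:
                first[patient_id] = event_date
    return first


def cvd_bin(patient_ids, first_dates):
    """1 if the patient has any matching code, else 0.

    Absence of a code on the record is treated as absence of disease.
    """
    return {pid: int(pid in first_dates) for pid in patient_ids}


if __name__ == "__main__":
    codelist = {"CODE_1", "CODE_2"}  # hypothetical CTV3 codes
    events = [  # hypothetical records
        (123, "CODE_2", date(2012, 5, 3)),
        (123, "CODE_1", date(2009, 2, 1)),
        (332, "CODE_2", date(2016, 4, 2)),
        (500, "OTHER", date(2018, 1, 1)),
    ]
    dates = first_cvd_date(events, codelist)
    print(dates)
    print(cvd_bin([123, 332, 500], dates))
```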

POTENTIAL BIASES:

CLINICAL SIGN OFF & DATE:

EPIDEMIOLOGY SIGN OFF & DATE:

SHARED WITH WIDER TEAM: Yes/No

FINAL SIGN OFF DATE (and apply label)

CarolineMorton commented 4 years ago

Read 3 codes mapped from Read 2 codes (mapping done by TPP):

Chronic_Cardiac_CTV3_Raw_Code_List.xlsx

CarolineMorton commented 4 years ago

See ebmdatalab/tpp-sql-notebook#59 for more general questions about process.

@chris-tpp Would you be able to provide: 1) the list of QOF clusters used in point 5 of ebmdatalab/tpp-sql-notebook#59, and 2) the list of key terms used in point 6 for SNOMED in the same issue?

We plan to add these to the GitHub issue (and ultimately to the commit messages in the codelist repositories) for audit. Happy to chat over the phone if easier.

@amirmehrkar and I have now been through the list clinically, but we are not sure whether we have all possible codes.

CarolineMorton commented 4 years ago

After discussion with @chris-tpp and @alexwalkercebm, we will need to re-run this by providing clinical input for Points 2 & 3 mentioned in https://github.com/ebmdatalab/tpp-sql-notebook/issues/59#issuecomment-608384996

This should be clearly documented in the definition.

CarolineMorton commented 4 years ago

Draft2 sign off

DEFINITION: Patients who have any cardiovascular disease Read 3 code ever on their medical records held by TPP. Absence of a code on the record is taken as absence of cardiovascular disease.

Example output:

patient_id  cvd_bin  condition                     date
123         1        H/O: angina pectoris          1/2/2009
332         1        ECG: Anteroseptal infarction  2/4/2016

CODE LISTS: Read 3 code list (when available). Created using this method by TPP:

  1. Read 2 LSHTM validated code list (https://github.com/ebmdatalab/tpp-sql-notebook/files/4414349/chronic_cardiac_PPVrisk_July18.xlsx)

  2. Adding in key clusters from QOF and mapping to CTV3 (Read 3), added by Caroline Morton (@CarolineMorton) (qof-cvd.xlsx). Inclusion and exclusion criteria:

    • Included all cardiovascular, coronary heart disease and heart failure codes, excluding stroke / TIA
    • Excluded all atrial fibrillation codes (AF is likely to have a different disease mechanism: while some AF is caused by ischaemia, it often occurs outside that context). AF is also not included in the Read v2 codes from LSHTM
    • Excluded all codes for QRISK (any version), as these relate to primary prevention only
    • Excluded all codes related to referral to a diagnostic clinic for CVD, such as the rapid access chest pain clinic, as the diagnosis is unclear. This differs from the Read v2 codelist, which includes these.
  3. Adding in high-level SNOMED codes and mapping to CTV3. Key terms searched for in the SNOMED CT browser (snowmed-cvd.xlsx), added by Caroline Morton (@CarolineMorton):

    • coronary heart disease
    • heart failure
    • cardiomyopathy
    • coronary artery bypass
    • angina

    NOTE: We could have gone further up to the parent concept "Heart disease", but this would include valvular disease, so it was excluded.

  4. Final list sense checked by clinician

POTENTIAL BIASES:

CLINICAL SIGN OFF & DATE:

EPIDEMIOLOGY SIGN OFF & DATE:

SHARED WITH WIDER TEAM: Yes/No

FINAL SIGN OFF DATE (and apply label)

CarolineMorton commented 4 years ago

I have had a think about atrial fibrillation and whether or not we should include it. Can we have a team discussion about this at some point? @hmcd @krishnanbhaskaran @alexwalkercebm. At the moment, the codes do not include AF, as per the Read v2 codelists provided from LSHTM by Helen. My understanding is that AF was not included there because there was a separate covariate for AF in the original study for which the codelist was developed. We don't currently have an AF codelist, do we? I am a bit concerned and wondered what we think about including or not including it.

Thanks

Caroline

hmcd commented 4 years ago

Hi @CarolineMorton @krishnanbhaskaran @alexwalkercebm

Original list follows the green book definition of heart disease as a risk factor for flu:

"Congenital heart disease, hypertension with cardiac complications, chronic heart failure, individuals requiring regular medication and/or follow-up for ischaemic heart disease."

So it doesn't include atrial fibrillation, and only selected valve disease (I have the precise definition we used on file but can't get to it from my phone).

I would be reluctant to expand the definition in case we attenuate the association for this risk group (known to be at risk of flu), especially as AF is so common. If we think AF is a plausible risk factor for COVID, I would suggest we look at it as a separate exposure.

CarolineMorton commented 4 years ago

OK, yes, that makes a lot of sense and I agree with the thinking above. For the same reason, I don't think valvular disease should be included unless the patient is under regular review or has heart failure, for example.

Thank you for clarifying. We could think about AF as another risk factor in the future, but perhaps at a much later analysis date.

hmcd commented 4 years ago

Great, glad that seems sensible. Agree on valvular disease (that was the principle used in the original list). We also used the same principle for congenital heart disease: only conditions that implied long-term follow-up were included (e.g. tetralogy of Fallot included, atrial septal defect not).

krishnanbhaskaran commented 4 years ago

Sounds like we've decided to exclude, but if needed later I think there may be codelists for AF and valvular heart disease from our cancer survivorship work.

CarolineMorton commented 4 years ago

FINAL SIGN OFF

DEFINITION: Patients who have any cardiovascular disease Read 3 code ever on their medical records held by TPP. Absence of a code on the record is taken as absence of cardiovascular disease.

Example output:

patient_id  cvd_bin  condition                     date
123         1        H/O: angina pectoris          1/2/2009
332         1        ECG: Anteroseptal infarction  2/4/2016

CODE LISTS: Read 3 code list - FINAL code list: CVD_CTV3_final.xlsx

Created using this method by TPP:

  1. Read 2 LSHTM validated code list (https://github.com/ebmdatalab/tpp-sql-notebook/files/4414349/chronic_cardiac_PPVrisk_July18.xlsx)

  2. Adding in key clusters from QOF and mapping to CTV3, added by Caroline Morton (@CarolineMorton) (qof-cvd.xlsx). Inclusion and exclusion criteria:

    • Included all cardiovascular, coronary heart disease and heart failure codes, excluding stroke / TIA
    • Excluded all atrial fibrillation codes (AF is likely to have a different disease mechanism: while some AF is caused by ischaemia, it often occurs outside that context). AF is also not included in the Read v2 codes from LSHTM
    • Excluded all codes for QRISK (any version), as these relate to primary prevention only
    • Excluded all codes related to referral to a diagnostic clinic for CVD, such as the rapid access chest pain clinic, as the diagnosis is unclear. This differs from the Read v2 codelist, which includes these.
  3. Adding in high-level SNOMED codes and mapping to CTV3. Key terms searched for in the SNOMED CT browser (snowmed-cvd.xlsx), added by Caroline Morton (@CarolineMorton):

    • coronary heart disease
    • heart failure
    • cardiomyopathy
    • coronary artery bypass
    • angina

    NOTE: We could have gone further up to the parent concept "Heart disease", but this would include valvular disease, so it was excluded.

  4. Final list sense-checked by a clinician (checked by Caroline Morton @CarolineMorton). See the document CVD_CTV3_REVIEWED.xlsx, in which I have categorised every code (column D).

    • After discussion 8.4.20 in meeting with Liam and others, we agreed to include transplant codes within the code list
    • We excluded AF and AV heart block, as these may not require treatment or represent chronic heart disease
    • Congenital heart problems have been kept in the list unless it was a simple ASD (which affects a large proportion of the population and may not be associated with long-term problems). This means VSD and AVSD have been kept in, as these conditions are less clear-cut and are often included in codes where there is a clear major congenital heart problem.
    • Some anatomical heart features such as fibrous bands have been excluded as they are likely not to require long term follow up.
    • Transient heart murmurs or valve conditions in the newborn have been excluded.

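The build process above (union of the LSHTM, QOF and SNOMED-derived sources, followed by a per-code clinical review) can be sketched with sets. This is a minimal illustration, not TPP's method: it assumes each source is a set of CTV3 code strings and that the review assigns each code a category label, similar in spirit to column D of the reviewed spreadsheet. The category names, the function name and the codes are all hypothetical.

```python
"""Sketch: assemble a codelist from several sources, then apply review exclusions.

Assumptions (hypothetical): each source is a set of CTV3 code strings, and
the clinical review maps each code to a category label. The labels below are
illustrative, loosely following the exclusions listed in the sign-off.
"""

EXCLUDED_CATEGORIES = {
    "atrial fibrillation",
    "av heart block",
    "simple asd",
    "anatomical feature",
    "transient newborn murmur",
}


def build_codelist(sources, review, excluded=EXCLUDED_CATEGORIES):
    """Return (kept codes, codes with no review category).

    Unreviewed codes are returned separately rather than silently kept or
    dropped, so that every code ends up with a clinical decision.
    """
    combined = set().union(*sources)
    unreviewed = {code for code in combined if code not in review}
    kept = {code for code in combined - unreviewed if review[code] not in excluded}
    return kept, unreviewed


if __name__ == "__main__":
    lshtm = {"C1", "C2"}  # hypothetical codes
    qof = {"C2", "C3"}
    snomed = {"C4"}
    review = {
        "C1": "ischaemic heart disease",
        "C2": "atrial fibrillation",
        "C3": "heart failure",
    }
    print(build_codelist([lshtm, qof, snomed], review))
```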
POTENTIAL BIASES: We have added in congenital heart conditions. Some of these will be surgically corrected soon after birth and may not have long term problems. Likely to be small in numbers.

CLINICAL SIGN OFF & DATE: Caroline Morton (@CarolineMorton) 7/4/2020 11:27

EPIDEMIOLOGY SIGN OFF & DATE: Krishnan Bhaskaran 8/4/2020

SHARED WITH WIDER TEAM: Yes

FINAL SIGN OFF DATE (and apply label) 8/4/2020 20:08

SJWEvans commented 4 years ago

I must be misunderstanding. Are you interested in CHD or CVD? If CVD that would usually include stroke & AF surely?

krishnanbhaskaran commented 4 years ago

Similar comment to Stephen - cardiovascular disease (CVD) would usually include stroke. (Is stroke a separate code list?)

On the other hand, coronary heart disease (CHD) would, I think, usually EXCLUDE heart failure and cardiomyopathy, wouldn't it?

So this definition seems to fall between the two? However I note that the list of included definitions comes from the "heart disease as a risk factor flu" which @hmcd mentioned above, so maybe that is the rationale.

I would say, though, that it might be of interest to separate coronary heart disease from heart failure/cardiomyopathy, which could plausibly have different impacts on risk. I'm just wondering if these should be separate variables?

Finally - is this implemented as a binary flag or as a "date of first"? I'm not totally clear if all of these comorbidity history variables are to be implemented as a date, or only select ones such as cancer?

Sorry for all the Qs!

CarolineMorton commented 4 years ago

Hi

thank you for your comments @krishnanbhaskaran @SJWEvans. We are aiming for Chronic Cardiac Disease as the covariate after discussion initially about this with @hmcd on the group call. Are we now saying that we wish to change this?

AF patients do not receive a flu jab, which was the original reason AF is not included. Stroke / TIA is within the chronic neurological condition codelist, again as per the original discussion.

It would be great to get consensus on this point before actioning any more code lists or asking TPP for further code list pulls. Perhaps we should have a chat about it first thing tomorrow?

hmcd commented 4 years ago

Hello,

I think we'd agreed as a general principle that we were following the Covid-19 social distancing risk groups. This is the codelist for the group "chronic heart disease, such as heart failure".

The Covid risk groups are based on the flu clinical risk groups, (which seems sensible to me - at least we know they are at increased risk of viral respiratory infection). The flu clinical risk group definition offers more detail, which we've used to operationalise this; "Congenital heart disease, hypertension with cardiac complications, chronic heart failure, individuals requiring regular medication and/or follow-up for ischaemic heart disease."

This seems to me a reasonably coherent group likely to be at increased risk of respiratory disease (or more severe disease) by virtue of their heart condition - I'd be keen to keep it together. (Also I'm not sure how we'd sort IHD from other causes of heart failure?)

A call to discuss sounds sensible to me- my suggestion would be to rename this variable "chronic heart disease, such as heart failure" to make it clearer what it covers.

krishnanbhaskaran commented 4 years ago

Hi, I think the suggested rename would help, as CVD and CHD are very widely used and people (like Stephen and me!) think they know what they should include, so it is confusing to stray from that. Helen's suggestion seems good. I think you have a good clinical rationale, which I am definitely not qualified to overrule! I was just raising it because the standard CVD/CHD definitions are what I'm more used to seeing in epi.

Liam is very much the oracle on anything CVD-related - worth getting him to check if poss? (I don't know if he has a github account?)

SJWEvans commented 4 years ago

I'm happy with the rationale (though as a statistician I don't know all the clinical aspects), but don't call it CHD or CVD; perhaps something like HD4P - heart disease for pulmonary complications? (HDLP - heart disease for lung problems - is less good because of HDL!)

CarolineMorton commented 4 years ago

I have changed to chronic cardiac disease - is that ok?

SJWEvans commented 4 years ago

OK, but it isn't really descriptive, since AF is a chronic cardiac disease; I guess it doesn't matter as long as we make it clear what we mean.

krishnanbhaskaran commented 4 years ago

Yes I think that name is ok

CarolineMorton commented 4 years ago

Are we now happy with the definition and the explanation why? Please comment if still not happy. It's fine if not, but we just need to decide now so we can rejig and reassign code lists. If everyone happy, maybe by thumbs upping this post, then can someone counter sign the definition above. If people want to redo the definition, please add in a post below.

SJWEvans commented 4 years ago

ok

krishnanbhaskaran commented 4 years ago

Just the query about binary vs dates that wasn't resolved? Will we get out the "date of first" here or just a binary indicator?

Date more flexible in the long run as we can derive both binary yes/no and also duration of disease for further analysis. On the other hand it may be that for this particular variable we don't need that. Thoughts?

SJWEvans commented 4 years ago

Date definitely better if possible

CarolineMorton commented 4 years ago

Hi @krishnanbhaskaran

We will get both out. See example output table in this comment (https://github.com/ebmdatalab/tpp-sql-notebook/issues/7#issuecomment-610307777)

You end up with both a date and a binary output. It will be an 'ever' type of scenario.

krishnanbhaskaran commented 4 years ago

oh right got you, that was staring me in the face!

sebbacon commented 4 years ago

Minor detail but when we output date there's no need for a binary variable as the absence/presence of date can be substituted. Saves space! OK?

alexwalkerepi commented 4 years ago

@sebbacon this is true, but for a lot of the analysis Stata will need a binary variable, so it just depends where it's most efficient to generate that.
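Storing only the date of first occurrence is enough to recover both a binary flag and a disease duration, wherever it is most efficient to generate them. A minimal sketch, assuming (hypothetically) one first-occurrence date per patient, or `None` when no code was found; the index date and function names are illustrative only.

```python
"""Sketch: derive a binary flag and disease duration from 'date of first'.

Assumption (hypothetical): each patient has a first-occurrence date, or None
when no matching code was found on the record.
"""
from datetime import date
from typing import Optional


def to_binary(first_date: Optional[date]) -> int:
    """1 if a first-occurrence date exists, else 0 (no code = no disease)."""
    return 0 if first_date is None else 1


def years_since(first_date: Optional[date], index_date: date) -> Optional[float]:
    """Approximate disease duration in years at the index date, or None.

    Uses 365.25 days per year, which is an approximation.
    """
    if first_date is None:
        return None
    return (index_date - first_date).days / 365.25


if __name__ == "__main__":
    index = date(2020, 2, 1)  # hypothetical study index date
    for first in (date(2009, 2, 1), None):
        print(to_binary(first), years_since(first, index))
```

The same two lines of logic could equally live in the Stata do-file; the point is only that the binary variable carries no information the date does not.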

CarolineMorton commented 4 years ago

Sure that sounds fine.

Can I ask are we satisfied with the definition and code list now? Can we get this signed off?

krishnanbhaskaran commented 4 years ago

I think so but given the discussions on this one suggest I just quickly summarise on Slack (for those not looking at github) in case of any final objections. Then would be happy to do the epi sign off.

SJWEvans commented 4 years ago

I haven't been through the complete code list in detail but I think it's OK from my overview

CarolineMorton commented 4 years ago

@krishnanbhaskaran, can you now sign off by editing the definition above (https://github.com/ebmdatalab/tpp-sql-notebook/issues/7#issuecomment-610307777) if all happy

thanks

krishnanbhaskaran commented 4 years ago

Yep done!

CarolineMorton commented 4 years ago

Transplant codes have been added back in and final sign-off is done. Thanks, everyone.

evansd commented 4 years ago

Just a note to say that, as agreed in discussion with @alexwalkercebm and @CarolineMorton, the first version of this covariate will omit the "condition" text field and just record the date of first occurrence.

sebbacon commented 4 years ago

Also to query if/when we do return "condition" what should we return if they match more than one condition?

alexwalkerepi commented 4 years ago

I'd say for simplicity, just the first occurrence to start with.

krishnanbhaskaran commented 4 years ago

Agree - first date of any of the conditions.

SJWEvans commented 4 years ago

Agreed first date vital; if easy, a flag to say >1 condition met to help search for those cases later in case of questions.
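The agreed rule (return the first occurrence, optionally with a flag for more than one condition) can be sketched as follows. This assumes, hypothetically, that matches arrive as `(patient_id, condition_term, event_date)` tuples already filtered to the codelist, and it reads "more than one condition" as "more than one distinct term", which is only one possible interpretation. Names are illustrative.

```python
"""Sketch: first matching condition per patient plus a '>1 condition' flag.

Assumptions (hypothetical): matches are (patient_id, condition_term,
event_date) tuples already filtered to the codelist; 'more than one
condition' means more than one distinct condition term.
"""
from collections import defaultdict
from datetime import date


def summarise(matches):
    by_patient = defaultdict(list)
    for patient_id, term, event_date in matches:
        by_patient[patient_id].append((event_date, term))
    summary = {}
    for patient_id, rows in by_patient.items():
        # Earliest date wins; same-day ties are broken alphabetically by term.
        first_date, first_term = min(rows)
        distinct_terms = {term for _, term in rows}
        summary[patient_id] = {
            "date": first_date,
            "condition": first_term,
            "multiple_conditions": int(len(distinct_terms) > 1),
        }
    return summary


if __name__ == "__main__":
    matches = [  # hypothetical records
        (123, "Condition B", date(2012, 5, 3)),
        (123, "Condition A", date(2009, 2, 1)),
        (332, "Condition C", date(2016, 4, 2)),
    ]
    for pid, row in summarise(matches).items():
        print(pid, row)
```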