opensafely-core / cohort-extractor

Cohort extractor tool which can generate dummy data, or real data against OpenSAFELY-compliant research databases

Add ECDS data to backend #182

Closed inglesp closed 3 years ago

inglesp commented 4 years ago

Split out from https://github.com/opensafely/cohort-extractor/issues/163.

We want:

  • Date of attendance
  • Number of attendances in a given period of time
  • COVID attendance (from codelist)
  • Discharge destination

inglesp commented 4 years ago

Proposed API:

# date of attendance
patients.admitted_to_emergency_care(
    on_or_after="2020-02-01",
    returning="date_admitted"  # or "binary_flag"
)

# number of attendances in a period of time
patients.admitted_to_emergency_care(
    between=["2020-02-01", "2021-02-01"],
    returning="number_of_matches_in_period"
)

# covid attendance
patients.admitted_to_emergency_care(
    on_or_after="2020-02-01",
    with_these_codes=[...],
    returning="date_admitted"  # or "binary_flag"
)

# discharge destination
patients.admitted_to_emergency_care(
    on_or_after="2020-02-01",
    returning="discharge_destination"
)
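
To make the four `returning` options concrete, here is a minimal sketch of what each one might compute for a single patient. It is not the backend implementation: the `summarise` helper, the record layout, and the "admitted"/"home" destination labels are all hypothetical, and picking the earliest matching attendance is an assumption.

```python
from datetime import date

# Hypothetical attendance records for one patient (illustrative data only).
# "admitted" and "home" are placeholder labels standing in for real codes.
attendances = [
    {"arrival_date": date(2020, 6, 1), "diagnosis_codes": ["82271004"],
     "discharge_destination": "home"},
    {"arrival_date": date(2020, 3, 5), "diagnosis_codes": ["233604007"],
     "discharge_destination": "admitted"},
]

RETURNING_OPTIONS = {
    "binary_flag",
    "number_of_matches_in_period",
    "date_admitted",
    "discharge_destination",
}


def summarise(attendances, on_or_after, returning, with_these_codes=None):
    """Reduce a patient's attendances to one value, per the proposed options."""
    if returning not in RETURNING_OPTIONS:
        raise ValueError(f"unknown returning option: {returning}")
    wanted = None if with_these_codes is None else set(with_these_codes)
    # Keep attendances in the period that match the codelist (if one is given),
    # ordered by arrival date.
    matches = sorted(
        (
            a for a in attendances
            if a["arrival_date"] >= on_or_after
            and (wanted is None or wanted & set(a["diagnosis_codes"]))
        ),
        key=lambda a: a["arrival_date"],
    )
    if returning == "binary_flag":
        return bool(matches)
    if returning == "number_of_matches_in_period":
        return len(matches)
    if not matches:
        return None
    first = matches[0]  # assumption: report the earliest matching attendance
    if returning == "date_admitted":
        return first["arrival_date"]
    return first["discharge_destination"]
```

For example, `summarise(attendances, date(2020, 2, 1), "number_of_matches_in_period")` counts both records, while passing `with_these_codes=["82271004"]` restricts the match to the June attendance.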

@CarolineMorton, @alexwalkercebm, does this look like it captures everything you might want to do with ECDS data?

@evansd, @sebbacon, does this fit in with existing patterns? Does anything look unimplementable?

CarolineMorton commented 4 years ago

Yes, this looks like exactly what we need. Thanks @inglesp

HelenCEBM commented 4 years ago

Some notes on this here: https://github.com/opensafely/outcomes-notebook/issues/3

Some points and questions:

inglesp commented 4 years ago

I wouldn't use the phrase admitted_to_emergency_care because that usually means being admitted to a ward/bed from A&E and so may cause confusion with inpatient records - would suggest attended_accident_and_emergency or similar is more appropriate.

This is helpful.

Just to check, can the above items be combined? E.g. Patient had a covid attendance and was admitted from that particular attendance?

Yes.

do we want to have a separate "likely covid" category based upon "shortness of breath" and similar diagnoses? Or will that be built into the "covid attendance" item in the study def as needed?

Will these categories come from a codelist (like patients.with_these_clinical_events) or will they be hardcoded (like patients.most_recent_bmi)? I would expect the former, in which case the API doesn't need to know about "likely covid" etc.

a data quality flag might be helpful, e.g. if a trust does not complete diagnosis codes then the flag for covid attendance cannot be completed.

What would the API for this look like?

I doubt we will need all the possible outputs from discharge_destination - there are a dozen or so but this could be simplified to a flag for "admitted/transferred" (vs "not admitted")

OK. Which field, and which values in that field, should we use to compute that flag? (see https://github.com/ebmdatalab/tpp-sql-notebook/blob/master/notebooks/tpp-schema.ipynb)

For aftershocks we will need to identify many possible diagnosis codes other than Covid.

That's what with_these_codes=[...] supports.

HelenCEBM commented 4 years ago

a data quality flag might be helpful, e.g. if a trust does not complete diagnosis codes then the flag for covid attendance cannot be completed.

What would the API for this look like?

Perhaps we need another category when checking against a list of diagnosis codes: yes (patient had one of the relevant diagnoses), no, or "unable to tell". However we should probably consult the ECDS data quality report to figure out what to do here. It's possible that the discharge destination is also poorly completed by some trusts.
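
One way to picture the three-category idea is the sketch below. It is only an illustration: the `diagnosis_flag` function is hypothetical, and treating an attendance with no recorded diagnosis codes as "unknown" is an assumption that the data quality report might contradict (for instance, completeness may be better judged per trust than per attendance).

```python
def diagnosis_flag(recorded_codes, codelist):
    """Classify an attendance as "yes", "no" or "unknown" against a codelist.

    Assumption: if no diagnosis codes were recorded at all, we cannot tell
    whether a relevant diagnosis applied, so the result is "unknown".
    """
    # Drop missing or empty entries before comparing.
    recorded = {code for code in (recorded_codes or []) if code}
    if not recorded:
        return "unknown"
    return "yes" if recorded & set(codelist) else "no"
```

Here `diagnosis_flag(["233604007"], {"1240751000000100"})` gives "no", while `diagnosis_flag([], {"1240751000000100"})` gives "unknown" rather than silently counting as "no".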

I doubt we will need all the possible outputs from discharge_destination - there are a dozen or so but this could be simplified to a flag for "admitted/transferred" (vs "not admitted")

OK. Which field, and which values in that field, should we use to compute that flag? (see https://github.com/ebmdatalab/tpp-sql-notebook/blob/master/notebooks/tpp-schema.ipynb)

inglesp commented 4 years ago

@HenryDrysdale

HenryDrysdale commented 4 years ago

@HelenCEBM @inglesp Thanks. Helen's suggestion of "admitted" vs "not admitted" sounds sensible.

I think "not admitted" should mean the patient attended A&E, but didn't require further secondary care given their clinical status at that time. There are a few codes for discussion:

"Emergency department discharge to ambulatory emergency care service"

"Discharge to hospital at home service"

"Patient transfer, to another health care facility"

"Emergency department discharge to emergency department short stay ward"

"Not admitted" would then include:

All other codes indicate severe disease and would come under "admitted". As Helen mentioned, "Admission to mortuary" could be considered separately.

sebbacon commented 4 years ago

Just to note we should capture this documentation-type info in docstrings or elsewhere!

HelenCEBM commented 4 years ago

Thanks @HenryDrysdale

"Patient transfer, to another health care facility"

This is likely to mean the patient was really unwell and needed to go to a hospital with an ICU or a cardiac cath lab, i.e. they needed lots more secondary care because they had severe disease. It's unlikely to mean they were transferred somewhere for a routine scan and then went home. So I think this comes under "admitted".

I would agree provided we limit attendances to those in major A&E units (type 1 & 2). I'm aware of people being transferred from MIUs to major hospitals because they have a suspected condition that needs a particular scanner not available in the MIU - but they may remain in A&E in the second hospital before being sent home. However, in such cases we should (in theory!) have a record of their attendance at the second hospital, which we should be able to select instead to find the patient's ultimate destination (admitted/not).

HenryDrysdale commented 4 years ago

@HelenCEBM Yes good point - I hadn't thought of MIUs.

A common example of this is probably CTPA scans to rule out PE: I think all A&E departments will have CT scanners, but MIUs probably don't. So patients who go to an MIU with ?PE would need to be transferred for a CTPA, but it's very often negative and the patient then goes home.

The trouble with limiting to type 1 and 2 is patients are often referred straight from MIUs (and similar) to the hospital medical teams for further assessment or admission, bypassing A&E. So we might miss these.

HelenCEBM commented 4 years ago

I suppose we should consider what it means if someone with covid (or suspected covid) is in an MIU and gets transferred. Are they likely to be just having more tests and then sent home? Or are they more likely to be in need of ICU?

HenryDrysdale commented 4 years ago

Yes I agree. I'm really not sure what's more likely, and it's not clear to me from anecdotal experience. I think ICU is unlikely from an MIU, but an assessment at hospital and short admission for monitoring / fluids is very common. (More serious admissions are also common.)

So in summary, we're unsure what the code "Patient transfer, to another health care facility" says about the severity of disease for COVID patients in the ECDS dataset, because:

  1. Lots of these patients may be very unwell, and transferred to hospitals for escalation of care, such as HDU or ICU.

  2. On the other hand, lots of these patients may be relatively well, and transferred from an MIU to hospital for a precautionary scan before being sent home.

On balance, I think of all uses of this code in ECDS, more are likely to represent unwell patients than well patients.

inglesp commented 4 years ago

This is all useful, thanks @HelenCEBM and @HenryDrysdale.

In the first instance, I don't think we should hardcode the codes that we're interested in, either for diagnoses or for discharge destinations. These might be different for different studies, and may change over time. We can always add them later if the same patterns keep coming up. Instead, we can have eg:

was_discharged_to_care_home_or_nursing_home=patients.attended_emergency_care(
    on_or_after="2020-02-01",
    returning="binary_flag",
    discharged_to=[
        "306706006",  # care home
        "306689006",  # nursing home
    ],
),

And for larger lists of codes for diagnoses, we can use existing codelist functionality.

HelenCEBM commented 4 years ago

Just to note, I wouldn't rely on those particular discharge codes to be accurate - largely staff will use the code for "discharged home" which may or may not be a care home. No one would newly enter a care home directly from A&E, and we have better ways of determining care home residency. So I think admitted vs not admitted is the best use of these codes.

HelenCEBM commented 4 years ago

Diagnosis codes for people attending A&E with a positive SGSS test result within a few weeks of the attendance, up to 18 April, with approximate counts. cc @HenryDrysdale

Notes

| DiagnosisCode | diagnosis_code_count | SNOMED_Description | Possible Covid? |
|---|---|---|---|
| 50417007 | 3000 | Lower respiratory tract infection (disorder) | yes |
| 1240751000000100 | 1500 | Coronavirus disease 19 caused by severe acute respiratory syndrome coronavirus 2 (disorder) | yes |
| 91302008 | 700 | Sepsis (disorder) | yes |
| 233604007 | 650 | Pneumonia (disorder) | yes |
| 278516003 | 350 | Lobar pneumonia (disorder) | yes (less likely) |
| 306206005 | 300 | Referral to service (procedure) | no |
| 54150009 | 300 | Upper respiratory infection (disorder) | yes |
| 68566005 | 300 | Urinary tract infectious disease (disorder) | no |
| 13645005 | 200 | Chronic obstructive lung disease (disorder) | yes (with exacerbation) |
| 281900007 | 200 | No abnormality detected (finding) | no |
| 398447004 | 200 | Severe acute respiratory syndrome (disorder) | yes |
| 14669001 | 150 | Acute renal failure syndrome (disorder) | no |
| 2776000 | 150 | Delirium (disorder) | no |
| 6142004 | 150 | Influenza (disorder) | no |
| 195967001 | 100 | Asthma (disorder) | yes (with exacerbation) |
| 42343007 | 100 | Congestive heart failure (disorder) | no |
| 49436004 | 100 | Atrial fibrillation (disorder) | no |
| 394659003 | | Acute coronary syndrome (disorder) | no |
| 422588002 | | Aspiration pneumonia (disorder) | no (but possible confusion with covid) |
| 12463005 | | Infectious gastroenteritis (disorder) | no |

HelenCEBM commented 4 years ago

@HenryDrysdale @rozeggo @wjchulme @amirmehrkar cc @CarolineMorton @inglesp

We discussed how to select "Covid A&E attendances" (ie. people with severe enough disease to warrant urgent medical care), using diagnosis codes, starting from the list above.

Other useful information to consider:

amirmehrkar commented 4 years ago

The SUS ECDS dataset processed by the NHS England data store does not include the diagnosis qualifier code (table 16, row 5, and on the SNOMED browser refset).

It is unlikely to be accurate anyway, as there are no validated point-of-care tests, unless the patient was a readmission with a known previous C19 result. However, it is possible that, clinically, the "confirmed" qualifier may also have been used.

HelenCEBM commented 4 years ago

Top 20 chief complaints for everyone with a positive test near the time of their attendance are now within here and pasted below.

I can't find a comprehensive dictionary of all possible codes - @HenryDrysdale could you have a look?? The first is shortness of breath.

| EC_Chief_Complaint_SNOMED_CT | Count | SNOMED_Description |
|---|---|---|
| 267036007 | 5000 | NaN |
| 386661006 | 1000 | NaN |
| 230145002 | 850 | NaN |
| 13791008 | 550 | NaN |
| 394616008 | 300 | NaN |
| 29857009 | 300 | NaN |
| 21522001 | 250 | NaN |
| 49727002 | 200 | NaN |
| 40917007 | 200 | NaN |
| 427461000 | 150 | NaN |
| 3006004 | 100 | NaN |
| 127279002 | 100 | NaN |
| 422400008 | 100 | NaN |
| 62315008 | 100 | NaN |
| 82271004 | 100 | Injury of head (disorder) |
| 271594007 | 50 | NaN |
| 91175000 | 50 | NaN |
| 25064002 | 50 | NaN |
| 80313002 | 50 | NaN |
| 80394007 | 50 | Hyperglycemia (disorder) |

HenryDrysdale commented 4 years ago

@HelenCEBM great work.

Yes I tried before and couldn't find one - you can get the codes from the data quality dashboard but this isn't up to date. Can we ask NHS D for a list of currently used codes? Who could we speak to?

HelenCEBM commented 4 years ago

@inglesp any tips?

HelenCEBM commented 4 years ago

| EC_Chief_Complaint_SNOMED_CT | SNOMED_Description |
|---|---|
| 267036007 | Dyspnea (finding) |
| 386661006 | Fever (finding) |
| 230145002 | Difficulty breathing (finding) |
| 13791008 | Asthenia (finding) |
| 394616008 | Unsteady gait (finding) |
| 29857009 | Chest pain (finding) |
| 21522001 | Abdominal pain (finding) |
| 49727002 | Cough (finding) |
| 40917007 | Clouded consciousness (finding) |
| 427461000 | Near syncope (disorder) |
| 3006004 | Disturbance of consciousness (finding) |
| 127279002 | Injury of lower extremity (disorder) |
| 422400008 | Vomiting (disorder) |
| 62315008 | Diarrhea (finding) |
| 82271004 | Injury of head (disorder) |
| 271594007 | Syncope (disorder) |
| 91175000 | Seizure (finding) |
| 25064002 | Headache (finding) |
| 80313002 | Palpitations (finding) |
| 80394007 | Hyperglycemia (disorder) |

HelenCEBM commented 4 years ago

Summary of chat today with @wjchulme @HenryDrysdale and an A&E registrar:

Summary of conservative approach to identifying possible covid attendances:

Questions remaining:

wjchulme commented 4 years ago

Looks great. Just to be explicit that this conservative approach is identifying attendances where the patient's condition is severe enough for admission, but will still miss those without a positive test (due to no test or poor test sensitivity).

Tom rang back just now and is increasingly sceptical about the utility of ECDS for identifying covid related attendance / admission due to: inconsistent/unreliable coding practices, especially for diagnoses (chief complaint slightly better as done by nurses, but there's only one code per patient); limited clinical info on investigations, bloods, etc; limited discharge destination info; general lack of specificity for all the available info even if it was complete and reliable.

So it's useful to see what we get from this, but if we want an intermediate clinical state that's worse than suspected covid at home but not as severe as an ICU admission we should think about other ways too.

@HenryDrysdale are you happy for me to make a start on the codelists and then you can update as necessary?

HenryDrysdale commented 4 years ago

@wjchulme Yes sounds great, thanks.

I agree with Tom's concerns, particularly around using investigations, bloods, discharge destinations etc. But I think our sensitivity for detecting COVID is better than it might be given that patients in our cohort: were unwell and admitted without an injury; had a positive covid swab close to admission; were admitted during the COVID-19 pandemic. It'll be interesting to see what we get, and we'll be open about limitations.

wjchulme commented 4 years ago

Formalising our approach in pseudocode:

IF diagnosis == "covid" THEN covid_admission = 1
ELSE IF (
    (test_pos_date BETWEEN(attendance_date-14, attendance_date+7) OR suspected_covid BETWEEN(attendance_date-14, attendance_date)) AND 
    injury_date == NULL AND
    discharge_destination IN [admitted_for_further_care_codes] AND
    (diagnosis IN [covid_related_diagnosis_codes] OR complaint IN [covid_related_complaint_codes]) AND
    NOT (diagnosis IN [covid_ruleout_diagnosis_codes] OR complaint IN [covid_ruleout_complaint_codes])
) THEN covid_admission = 1
ELSE covid_admission = 0

I assume we need the ruleout codes as for instance complaint="fever+cough" and diagnosis="aspiration pneumonia" would be picked up as COVID. Or even diagnosis1="aspiration pneumonia" and diagnosis2="upper resp. infection". If so, we need a list for those too -- I wonder if for the ruleout list we should search amongst more than just top diagnoses as done before?

Alternatively, we're more strict with our choice of covid diagnosis codes and we ignore complaint:

IF diagnosis == "covid" THEN covid_admission = 1
ELSE IF (
    (test_pos_date BETWEEN(attendance_date-14, attendance_date+7) OR suspected_covid BETWEEN(attendance_date-14, attendance_date)) AND 
    injury_date == NULL AND
    discharge_destination IN [admitted_for_further_care_codes] AND
    diagnosis IN [strictly_covid_related_diagnosis_codes]
) THEN covid_admission = 1
ELSE covid_admission = 0

But we risk missing a lot of cases then.

Either way, we need codelists for all the items in square brackets.
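
As a rough illustration, the first (broader) version of the pseudocode could be written in Python along these lines. This is a sketch rather than an agreed implementation: the `covid_admission` and `within` helpers and the record layout are hypothetical, and the codelist contents are placeholders picked from the tables earlier in this thread (with "ADMITTED" standing in for real discharge codes), not the final codelists.

```python
from datetime import date, timedelta

# Placeholder codelists for illustration only; the real lists are still to be agreed.
CODELISTS = {
    "covid": {"1240751000000100"},
    "covid_related_diagnosis": {"50417007", "233604007"},
    "covid_related_complaint": {"267036007", "49727002"},
    "covid_ruleout_diagnosis": {"422588002"},
    "covid_ruleout_complaint": set(),
    "admitted_for_further_care": {"ADMITTED"},  # stand-in for discharge codes
}


def within(event_date, start, end):
    """True if event_date is present and falls in the inclusive range."""
    return event_date is not None and start <= event_date <= end


def covid_admission(att, test_pos_date=None, suspected_covid_date=None,
                    codelists=CODELISTS):
    """Return 1 if the attendance counts as a covid admission, else 0."""
    diagnoses = set(att["diagnosis_codes"])
    complaint = att["complaint_code"]
    # An explicit covid diagnosis is sufficient on its own.
    if diagnoses & codelists["covid"]:
        return 1
    arrived = att["attendance_date"]
    recent_covid_signal = (
        within(test_pos_date, arrived - timedelta(days=14), arrived + timedelta(days=7))
        or within(suspected_covid_date, arrived - timedelta(days=14), arrived)
    )
    related = (
        bool(diagnoses & codelists["covid_related_diagnosis"])
        or complaint in codelists["covid_related_complaint"]
    )
    ruled_out = (
        bool(diagnoses & codelists["covid_ruleout_diagnosis"])
        or complaint in codelists["covid_ruleout_complaint"]
    )
    return int(
        recent_covid_signal
        and att["injury_date"] is None
        and att["discharge_destination"] in codelists["admitted_for_further_care"]
        and related
        and not ruled_out
    )
```

The stricter alternative would drop the complaint and rule-out checks and swap in a narrower diagnosis codelist; keeping the codelists as a parameter makes that a change of inputs rather than of logic.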

Bengoldacre commented 4 years ago

epic and magnificent.

re implementing the logic, just to reiterate a general principle: where there are no architectural blockages, it's always best if we can implement logic like this in the study definition, rather than the backend, as that leaves a brighter legacy of shared code for others.

HelenCEBM commented 4 years ago

Note this now applies to the new EC table which should replace ECDS

HelenCEBM commented 4 years ago

@inglesp @evansd this needs switching over to use EC instead of ECDS. I'm not sure where to find the fields we're currently using, but if you point me in the right direction I'm happy to help identify which fields need a change of name.

inglesp commented 4 years ago

@HelenCEBM the SQL gets built in this function.

Specifically, we're using:

HelenCEBM commented 4 years ago

Thanks @inglesp. Using tpp schema for reference:

  1. ECDS can simply change to EC. The fields have the same names.

  2. There are a couple of options for ECDS_EC_Diagnoses

    • Use table EC_Diagnosis - but this is one-line-per-attendance with fields EC_Diagnosis_01, EC_Diagnosis_02 etc., unlike the previous table we were using (which had one row per diagnosis, i.e. potentially many rows per attendance).
    • Use the field Der_EC_Diagnosis_All in the main table (EC) which has a comma-separated list of diagnosis codes.
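
For the second option, the comma-separated field could be turned back into one row per diagnosis, as in the sketch below. This assumes the list is a plain comma-separated string of codes and that each EC row carries an attendance identifier (called `EC_Ident` here, which is an assumption); `explode_diagnoses` is a hypothetical helper, and in practice this reshaping would more likely happen in SQL in the backend.

```python
def explode_diagnoses(rows):
    """Yield (attendance_id, diagnosis_code) pairs, one pair per diagnosis.

    Assumes Der_EC_Diagnosis_All holds a plain comma-separated list of codes;
    rows with a missing or empty value yield nothing.
    """
    for row in rows:
        raw = row.get("Der_EC_Diagnosis_All") or ""
        for code in raw.split(","):
            code = code.strip()
            if code:
                yield row["EC_Ident"], code


# Illustrative rows only; the identifiers and code combinations are made up.
ec_rows = [
    {"EC_Ident": 1, "Der_EC_Diagnosis_All": "233604007,91302008"},
    {"EC_Ident": 2, "Der_EC_Diagnosis_All": None},
    {"EC_Ident": 3, "Der_EC_Diagnosis_All": "50417007"},
]
diagnosis_rows = list(explode_diagnoses(ec_rows))
```

This recovers the one-row-per-diagnosis shape of the previous table, so existing codelist matching could stay largely unchanged.
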
rohinimathur commented 4 years ago

Hi @inglesp I am using ECDS data currently for the ethnicity study. Could you please advise on how to update study definition to point to EC tables instead?

inglesp commented 4 years ago

Could you please advise on how to update study definition to point to EC tables instead?

You shouldn't have to make changes to your code. I'll let you know once the backend has been updated.

inglesp commented 4 years ago

There are a couple of options for ECDS_EC_Diagnoses

Grr, they're both less good than what we currently have. Will fix up now.

HelenCEBM commented 4 years ago

Grr, they're both less good than what we currently have. Will fix up now.

That's because TPP made the previous table! We would have had to work around for new vendors at some point anyway.

inglesp commented 4 years ago

I'll let you know once the backend has been updated.

@rohinimathur the backend has been updated, but you shouldn't need to make any changes to your code at all.

rohinimathur commented 4 years ago

@inglesp wow thanks. will just update the database end date in my analysis code. Much appreciated!