opensafely-core / cohort-extractor

Cohort extractor tool which can generate dummy data, or real data against OpenSAFELY-compliant research databases
Other
38 stars 13 forks source link

Set up minimal functionality for Databricks PoC #570

Open sebbacon opened 3 years ago

sebbacon commented 3 years ago

We will reproduce one of BSA's sentinel measures that links HES and individual prescribing data. Specifically we propose measure GIB01 ("Patients aged 65+ admitted to hospital with a gastro-intestinal bleed and currently prescribed a NSAID and not concurrently prescribed a gastro-protective medicine"). The outputs in ePACT currently look something like this:

image(5)

The attached is the proposed HES schema. Notes from the supplier:

OPEN_SAFELY_SCHEMA_HES.xlsx

For the PoC we will only want or need to query a subset of all this data.

brianmackenna commented 3 years ago

Meds Safety Indicator

Numerator Patients from denominator with A primary diagnosis (ICD10 code) has been recorded as a Gastro for

Intestinal bleed and Admission episode recorded as complete and Emergency admission recorded

Denominator Patients aged 65 or over in the prescription month prescribed a nonsteroidal anti-Inflammatory drug (NSAID) without a gastro protective medicine

Lets take each constituent part one by one

Denominator Underlying data: Reading the medicines reference data all fields are available.

Elements of denominator Age > 65 This is available in the medication table but tbc about where we take this from.

prescribed a medication The medication table contain medicines that have been "prescribed" and "dispensed". When a medicine is not dispensed I think Prescribed in this context is slightly misleading. More accurately it is only prescriptions that have been prescribed and presented for dispensing and then not dispensed but forwarded to the BSA.

NSAID or gastroprotective agent The medication table contains both dm+d and pseudoBNF code identifiers. We have codelists for these already.

What exists in OpenSAFELY framework already

with_these_medications essentially queries this dataset already at point of creation (i.e. medication issue). This deployment would be slightly different data source as it is medicines dispensed. Although both medication issue and dispensed have the same core attributes they do start to diverge on fields when getting into detail.

Numerator Underlying data: See table above in xlsx. This does not appear to match directly with the Hospital Episode Statistics Data Dictionary - NHS Digital.

For the BSA medication safety indicator we require the Hospital Episodes Statistics Admitted Patient Care (HESAPC). The NHSD schema has other HES datasets that may additionally be useful for creating this indicator however for purposes of POC discussion I'll restrict to HES APC.

primary diagnosis (ICD10 code) has been recorded When building I think we should also make functionality available to see secondary/other diagnoses to construct this indicator. Reading the HES dictionary primary diagnosis field is DIAG_NN. However this does not appear to be in the data schema directly. There are other relevant fields

Provides a concatenated string of all diagnoses from the record at a 4 character level, separated by a comma with no spaces. This enables the user to search across the full list of diagnoses to look at mentions and pairs of diagnosis codes

Can we assume that n=1 is primary? Is n=2 secondary or after primary are they all equal?

Additionally in the scheme there is mention of a dataset APC DIAG which contain DIAG_CODE and DIAG_SEQ. Is it possible that DIAG_NN has been put seperately here and APC and APC DIAG should be used together? This fits with our understanding that there is no single version of HES and no single detailed description of HES v SUS difference.

(ICD10 code) has been recorded as a Gastro for Intestinal bleed Codelist started here and note they are 4 digit codes.

Admission episode recorded as complete There are various fields which could be used here all starting with Finished_xxx_episode. I am not familiar with these terms, we can check with NHS Digital for definition or can we tell from this info from dashboard specification

The indicators count unique admissions, reported quarterly based on the admission date, where the patients are prescribed medicines specific to the indicator. Unique patients with multiple admissions on the same day are counted once and unique patients with multiple admissions within the quarter are counted as multiple admissions. The figure presents admissions due to the cause specified in each indicator

Emergency admission recorded Potentially this is defined in Admission episode recorded as complete as there is a finished admission episode for emergencies FAE_EMERGENCY. There is another field in the HES APC related to admission method ADMIMETH where emergencies can be defined. We probably need to confirm with people who make the dashboard but could be interesting exercise to check both methods or use combination.

What exists in OpenSAFELY framework already admitted_to_hospital is available for querying SUS data which can match primary diagnosis, any other diagnosis, admission method.

In our TPP deployment this queries Hospital admission, from SUS Admitted Patient Care Data (APCS) which is obtained from NHS England. Although probably derived from the same source data as the NHS Digital HES it has different columns and fields.

Thoughts/summary

  1. We need to confirm with NHSD if medicines schema I used is same as their internal database
  2. We need some clarity on primary/secondary diagnosis table from NHS Digital though others in team with HES expertise can probably shed more light on it
  3. We need to clarify some very specific definitions with team who manage the dashboard specification
  4. A thought on OpenSAFELY design - do we rename libraries in line with underlying data e.g. should we make something called with_these_medications_dispensed or do we leave it as with_these_medications and it's up to folk to understand deeply the undelying data? Similar with patients.admitted_to_hospital
inglesp commented 3 years ago

We want patients with spells with:

So it looks like we're interested in the following fields:

(There is also the fae_emergency field, which we could use to combine fae and admimeth, but I think using that loses us some flexibility.)

inglesp commented 3 years ago

Outstanding questions:

brianmackenna commented 3 years ago

(updated)

No The data are all in a single table, not split by year.

The data are all in a single table, not split by year. No

sebbacon commented 3 years ago

To summarise - please correct mistakes @brianmackenna @inglesp; and cc @StevenMaude and @madwort who will be picking this up between them soon.

The goal is to have code that can (in theory) run without modification in their production environment, and which will produce the charts we need

iaindillingham commented 3 years ago

@LFISHER7 has generalised the SRO code into sro-template. If you're looking for dashboards, then start here.