opensafely / emis-qa

MIT License

Overview of how data is currently organised #2

Open sebbacon opened 3 years ago

sebbacon commented 3 years ago

Current context: we have three studies we are running regularly against EMIS:

  1. PRIMIS codelist prevalence
  2. Vaccination uptake
  3. Long covid codes

We are only interested in currently registered patients (the underlying data includes deregistered patients). The above studies define the cohort as follows (with additional constraints on, for example, age):

    population=patients.satisfying("registered"),

    registered=patients.registered_as_of(
        "2020-03-31", 
        return_expectations={
            "incidence": 0.95,
        },
    ),

Which, in EMIS, executes SQL as follows:

    SELECT
        patient_no_duplicates.registration_id, -- patient_no_duplicates is a simple view to de-dupe the small number of duplicate registration ids
        hashed_organisation,
        1 AS value
    FROM patient_no_duplicates
    WHERE
        registered_date <= DATE('2021-03-31')
        AND (
            registration_end_date > DATE('2021-03-31') -- date of deregistration; check this is consistently valued for all current registrations
            OR registration_end_date IS NULL -- they have not deregistered
        )
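To sanity-check the registration predicate above, here is a minimal sketch that runs the same WHERE clause against a toy table. It uses an in-memory SQLite database as a stand-in, so date handling may differ from the real EMIS backend. The rows, the `registered_as_of` helper name and the parameterised index date are all hypothetical and only there for illustration.

```python
import sqlite3

# Hypothetical toy rows, not real data:
# (registration_id, hashed_organisation, registered_date, registration_end_date)
ROWS = [
    (1, "org_a", "2019-01-01", None),          # never deregistered
    (2, "org_a", "2018-05-01", "2020-12-31"),  # deregistered before 2021-03-31
    (3, "org_b", "2020-06-01", "2021-06-30"),  # deregistered after 2021-03-31
    (4, "org_b", "2021-04-15", None),          # registered after 2021-03-31
]

# Same predicate as the EMIS query, with the index date as a parameter.
QUERY = """
SELECT registration_id, hashed_organisation, 1 AS value
FROM patient_no_duplicates
WHERE registered_date <= DATE(:index_date)
  AND (registration_end_date > DATE(:index_date)
       OR registration_end_date IS NULL)
ORDER BY registration_id
"""


def registered_as_of(index_date):
    """Return the toy rows that count as registered on index_date."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE patient_no_duplicates ("
            "registration_id INTEGER, hashed_organisation TEXT, "
            "registered_date TEXT, registration_end_date TEXT)"
        )
        conn.executemany(
            "INSERT INTO patient_no_duplicates VALUES (?, ?, ?, ?)", ROWS
        )
        return conn.execute(QUERY, {"index_date": index_date}).fetchall()
    finally:
        conn.close()


print(registered_as_of("2021-03-31"))
```

With these toy rows, only registrations 1 and 3 are selected for 2021-03-31: one has a NULL end date and the other has an end date after the index date. The brackets around the two `registration_end_date` conditions matter, because without them the `OR ... IS NULL` branch would bypass the `registered_date` check and registration 4 would be included as well.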

Type 1 opt-outs

EMIS have been incorrectly excluding opt-out patients from the data. We expect the fix to land in our data in the next day or so, and expect it to affect ~900k patients.

Static organisations list

EMIS have told us the organisation list is currently out of date: it is a static list of approx 3920 practices, created in mid-2020, which was short of a few practices even then. I understand that's line 7 of the following screenshot (from EMIS' internal repo, showing the SQL that generates our table(s)):

[image: screenshot of SQL from EMIS' internal repo]

They are currently doing work to add around 50 missing practices and to make this list dynamic rather than static.

Other inclusion criteria

Various other criteria are applied to the data, and these may be revisited. Nas will supply us with descriptions of the current and proposed criteria.