opensafely-core / cohort-extractor

Cohort extractor tool which can generate dummy data, or real data against OpenSAFELY-compliant research databases
Other
38 stars 13 forks source link

EMIS: expose vaccine- and allergy-related information from separate tables #531

Open HelenCEBM opened 3 years ago

HelenCEBM commented 3 years ago

Codes related to vaccines (e.g. refusal) are stored in the immunisations table, not the observations table. Similarly allergies are stored in the allergy table (even when related to vaccines). Therefore they are not currently captured by the study definition in the way we extract this information in TPP (with_these_clnical_events)

inglesp commented 3 years ago

A solution would be to make with_these_clinical_events query the union of observations, allergies, and immunisations. I had assumed that this might perform terribly, but both the following take a similar amount of time, between 5s and 15s. (I have no idea whether there's any caching happening.)

select registration_id, hashed_organisation, snomed_concept_id
from (
    -- observation_all_orgs_v2
    select registration_id, hashed_organisation, snomed_concept_id, effective_date from observation_all_orgs_v2
    union all
    select registration_id, hashed_organisation, snomed_concept_id, effective_date from immunisation_all_orgs_v2
    union all
    select registration_id, hashed_organisation, snomed_concept_id, effective_date from allergy_all_orgs_v2
) where snomed_concept_id in (901701000000101)
select registration_id, hashed_organisation, snomed_concept_id
from observation_all_orgs_v2
where snomed_concept_id in (901701000000101)

@evansd should we do the same in TPP land?

sebbacon commented 3 years ago

I agree with this solution in principle. However I worry there is some ontological use of the different tables that we would be obscuring. I suppose in practice it boils down to if the same SNOMED code might appear in more than one underlying table, because if so, what then?

HelenCEBM commented 3 years ago

From the initial checks that @inglesp performed it seems that SNOMED codes were strictly in just one table (e.g. "allergy to COVID-19 vaccine" appears only in the Allergy table, never in the main table or the vaccine table). I can't imagine a problem with combining them, except theoretically that for generic codes e.g. "did not attend" you could assume that a vaccine appointment was not attended based on it appearing in the vaccine table, for example, and we would be losing this information. However, it may be that generic codes are not stored there anyway, and such information would not discernible this way in TPP, as far as I know.

We could check a bit more thoroughly e.g. find the top X codes in allergy and immunisations and see if they are present in the other tables (and if so, what are they)

HelenCEBM commented 3 years ago

To update/summarise:

Some options:

sebbacon commented 3 years ago

if replicated in TPP this could be more complex given the vaccination table uses different codes and descriptions

We should check with TPP that they don't already have a SNOMED mapping they could make available. If they do, then the combined option seems the best one?

So I think this issue is blocked on that discussion.