Proof of Concept for new Covariates library

@CarolineMorton:

We have been given a lot of Read codes or med codes in the form of CSV (or Excel) files (see issues...). I wondered: is there scope to point to a CSV to read in the codes, as well as a dataframe read-in option, just to add flexibility?
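A minimal sketch of the flexibility being asked for, assuming a single loader that accepts either a CSV path, an already-open CSV stream, or a pandas-style DataFrame. The helper name `load_codes` and the default `"code"` column header are hypothetical, not part of any existing module. An Excel file could be read into a DataFrame first (for example with `pandas.read_excel`) and passed in the same way.

```python
import csv
import os
from typing import Iterable, List


def load_codes(source, column: str = "code") -> List[str]:
    """Return the non-blank codes in `column` from a CSV path, a CSV stream, or a DataFrame.

    Hypothetical helper: the name and the default column header are assumptions.
    """
    # DataFrame-like objects (e.g. pandas) expose `.columns`; read the column directly.
    if hasattr(source, "columns"):
        if column not in source.columns:
            raise ValueError(f"DataFrame has no {column!r} column")
        return [str(value).strip() for value in source[column] if str(value).strip()]
    # A path on disk: "utf-8-sig" also strips the byte-order mark Excel often writes.
    if isinstance(source, (str, os.PathLike)):
        with open(source, newline="", encoding="utf-8-sig") as f:
            return _codes_from_csv(f, column)
    # Otherwise assume an open text stream or any iterable of CSV lines.
    return _codes_from_csv(source, column)


def _codes_from_csv(lines: Iterable[str], column: str) -> List[str]:
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"CSV has no {column!r} column")
    # Short rows give None for missing cells; skip those and blank codes.
    return [row[column].strip() for row in reader if row[column] and row[column].strip()]
```

Dispatching on the input type keeps one entry point, so a study definition would not need to care whether a codelist arrived as a CSV, an Excel export, or a DataFrame.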

This is something we need to discuss with @inglesp this morning. The goal is for us to package what @evansd has written into a standalone module, add tests, and integrate it with some kind of "codelist" module (TBC).

The ultimate vision is that our study cohort definition file would look something like this:

from peters_codelist_thing import codelist
import daves_cohort_thing as dct

cvd_meds = codelist("qof:cvd_meds", coding_system="snomed", version="1.2")
chd_codes = codelist("lshtm:chd_clinical_codes", coding_system="ctv3", version="1.5")
smoking_codes = codelist(
    "smoking_clinical_codes", coding_system="ctv3", version="latest"
)

model_input_definition = {
    "cvd_meds": dct.patients_with_these_medications(snomed_codes=cvd_meds),
    "chd_code": dct.patients_with_these_clinical_events(ctv3_codes=chd_codes),
    "age_and_sex": dct.patients_with_age_and_sex("today"),
    "smoking_status": dct.patients_with_these_clinical_events(
        ctv3_codes=smoking_codes, min_date="2015-01-01", max_date="2020-03-31"
    ),
}
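One way the straw man could hang together is for every `dct.*` call to return a declarative description of a column rather than run a query immediately. The sketch below assumes that design. `Codelist`, `ClinicalEventsQuery` and this `patients_with_these_clinical_events` are hypothetical stand-ins for the TBC modules, and the example codes are illustrative placeholders.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Codelist:
    """Hypothetical versioned, immutable set of codes from one coding system."""

    name: str
    coding_system: str
    version: str
    codes: Tuple[str, ...] = ()


@dataclass(frozen=True)
class ClinicalEventsQuery:
    """Declarative description of one column; nothing is run against a database yet."""

    codelist: Codelist
    min_date: Optional[str] = None
    max_date: Optional[str] = None


def patients_with_these_clinical_events(ctv3_codes, min_date=None, max_date=None):
    # Catch a codelist from the wrong coding system at definition time.
    if ctv3_codes.coding_system != "ctv3":
        raise ValueError(f"expected a ctv3 codelist, got {ctv3_codes.coding_system!r}")
    return ClinicalEventsQuery(ctv3_codes, min_date, max_date)


# Usage, mirroring the straw-man definition (codes are illustrative placeholders):
chd_codes = Codelist("lshtm:chd_clinical_codes", "ctv3", "1.5", ("G3...", "G30.."))
model_input_definition = {
    "chd_code": patients_with_these_clinical_events(chd_codes),
}
```

Because the definition is plain data, the same dictionary could later be handed to either a dummy-data generator or a SQL backend without the statistician changing anything.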

And then our workflow would do something like this:

from definition import model_input_definition
import daves_cohort_thing as dct

dct.generate_model_input_definition(model_input_definition)

(This is a straw man only; the point is that something like the above is all a statistician would need to write to generate either dummy data to play with or real data to run the real model on.)
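The dummy-data half of that workflow can be sketched under one assumption: each column in the definition carries a rule for producing a plausible fake value. `ColumnSpec`, `has_event`, `age_and_sex` and `generate_dummy_data` are hypothetical names, and the prevalences and age range are illustrative, not real population figures. Producing real data would instead mean compiling each column to a query against the database, which is not sketched here.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical column definition that knows how to fake a plausible value."""

    dummy: Callable[[random.Random], Any]


def has_event(prevalence: float) -> ColumnSpec:
    # 1 if the (fake) patient has a matching code or prescription, else 0.
    return ColumnSpec(lambda rng: int(rng.random() < prevalence))


def age_and_sex() -> ColumnSpec:
    # (age in years, sex) drawn uniformly; illustrative only.
    return ColumnSpec(lambda rng: (rng.randint(18, 100), rng.choice(["F", "M"])))


def generate_dummy_data(
    definition: Dict[str, ColumnSpec], n_patients: int, seed: int = 0
) -> List[Dict[str, Any]]:
    """Return one dict per fake patient, with one key per column in the definition."""
    rng = random.Random(seed)  # seeded, so the same definition gives the same dataset
    return [
        {"patient_id": i, **{name: spec.dummy(rng) for name, spec in definition.items()}}
        for i in range(1, n_patients + 1)
    ]


definition = {
    "cvd_meds": has_event(prevalence=0.2),
    "chd_code": has_event(prevalence=0.05),
    "age_and_sex": age_and_sex(),
}
rows = generate_dummy_data(definition, n_patients=3)
```

Seeding the generator means a statistician can rerun an analysis script against identical dummy data while developing the model.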

opensafely / tpp-sql-notebook

Proof of Concept for new Covariates library #53