opensafely / codelist-development

Repository for discussion of OpenSAFELY codelists

Codelists as `import`-able artefacts....? #49

Open pacharanero opened 3 years ago

pacharanero commented 3 years ago

Hi OpenSAFELY team. I'm a friend of the project and support what you are doing here.

I wondered if you've put any thought into distributing codelists as importable artefacts in programming languages. It feels to me that this would make the lists as easy as possible to use and reuse. It would also allow versioning, version pinning and updates to be handled by each programming language's own package manager.

For example, in Python I'd like to be able to import codelists like this:

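# Hypothetical: one importable codelist per condition (asthma, diabetes, ... and so on)
from opensafely import asthma, diabetes, arterial_thromboembolism

def is_patient_diabetic(patient_codes):
    # True if any of the patient's recorded codes appears in the diabetes codelist
    return any(code in diabetes for code in patient_codes)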

I realise that this might not completely align with your current use case, since you primarily interact with the record via SQL queries, but I wondered whether you as a team had any thoughts on this as a way to massively simplify the handling of these codes, and to move from 'TRUDitional' manual text/CSV file handling to the more 'mature' practices used elsewhere in the software industry for handling technical and data artefacts.

I figure these packages could largely be autogenerated from some language-agnostic upstream artefact. I'm happy to help with this if it's something you feel you would value. I also think we might be able to encourage other creators of 'refsets' to publish in a reusable format, if we can settle on one.
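As a rough illustration of the autogeneration idea, here is a minimal sketch that turns a codelist CSV into the source of a Python module exposing the codes as a frozenset alongside version metadata. The function name `generate_module_source` and the module layout (`CODELIST_ID`, `VERSION`, `CODES`) are hypothetical, not an existing OpenSAFELY format.

```python
import csv
import io


def generate_module_source(codelist_id: str, version: str, csv_text: str, column: str) -> str:
    """Render Python source for a module holding one codelist.

    Hypothetical layout: CODELIST_ID, VERSION and CODES (a frozenset of strings).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Deduplicate, drop blanks, and sort so regenerated files diff cleanly between versions
    codes = sorted({row[column].strip() for row in reader if row[column].strip()})
    lines = [
        '"""Autogenerated codelist module. Do not edit by hand."""',
        "",
        f"CODELIST_ID = {codelist_id!r}",
        f"VERSION = {version!r}",
        "CODES = frozenset({",
    ]
    lines += [f"    {code!r}," for code in codes]
    lines += ["})", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder codes for illustration only
    sample = "code,term\nA002,Placeholder two\nA001,Placeholder one\n"
    print(generate_module_source("example/diabetes", "2020-01-01", sample, "code"))
```

Writing the generated file into a package and publishing it would then let each language's package manager handle version pinning, as suggested above.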

pacharanero commented 3 years ago

The reason I'm asking is that I'm working on some implementations of clinical calculators in Python, which will be open source and reusable. One of the things clinical calculators need to do is convert inputs expressed as SNOMED CT codes into a boolean for 'does this patient have condition X?' (QRISK3 is a good example of this).
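A minimal sketch of that conversion step, assuming each calculator input is backed by exactly one codelist held as a set of code strings. The helper `condition_flags` and the input names are hypothetical, and real QRISK3 inputs have their own definitions (time windows, exclusions) that this ignores.

```python
from __future__ import annotations

from typing import Iterable, Mapping


def condition_flags(
    patient_codes: Iterable[str],
    codelists: Mapping[str, frozenset[str]],
) -> dict[str, bool]:
    """Map each calculator input name to True if any patient code is in its codelist."""
    recorded = set(patient_codes)
    # A condition is flagged when the codelist and the patient's codes share at least one code
    return {name: not codes.isdisjoint(recorded) for name, codes in codelists.items()}


# Placeholder codes for illustration, not real codelists
codelists = {
    "diabetes": frozenset({"A001", "A002"}),
    "atrial_fibrillation": frozenset({"B001"}),
}
print(condition_flags(["A002", "Z999"], codelists))
# {'diabetes': True, 'atrial_fibrillation': False}
```

Keeping the codelists as data passed into the function, rather than hard-coding codes in the calculator, means the code selection can be reviewed separately from the calculation.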

Given that several of the implementation errors in QRISK2 (TPP) and CHA2DS2-VASc (TPP and EMIS) were in the selection of the codes themselves, not in the calculation, I'm trying to come up with a reproducible, shareable, peer-reviewable way of producing these codelists. Ideally this would result in a 'standard format' that could readily be transpiled into a Python package, npm package or whatever else, for maximum reuse. Essentially these are all just textual lists with metadata. Would JSON work?
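On the JSON question, one possible shape is a single document carrying the metadata alongside the codes and their terms. The field names below (`id`, `version`, `coding_system`, `codes`) are assumptions for illustration, not an agreed standard; the loader simply checks that required fields are present and that no code is duplicated, then returns an immutable object.

```python
import json
from dataclasses import dataclass

# Hypothetical required top-level fields of a codelist document
REQUIRED_FIELDS = ("id", "version", "coding_system", "codes")


@dataclass(frozen=True)
class Codelist:
    id: str
    version: str
    coding_system: str
    codes: frozenset

    def __contains__(self, code: str) -> bool:
        return code in self.codes


def load_codelist(text: str) -> Codelist:
    """Parse a JSON codelist document (hypothetical format) and validate it."""
    data = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"codelist is missing fields: {', '.join(missing)}")
    codes = [entry["code"] for entry in data["codes"]]
    if len(codes) != len(set(codes)):
        raise ValueError("codelist contains duplicate codes")
    return Codelist(data["id"], data["version"], data["coding_system"], frozenset(codes))


# Placeholder codes and terms for illustration only
EXAMPLE = """
{
  "id": "example/diabetes",
  "version": "2020-01-01",
  "coding_system": "snomedct",
  "codes": [
    {"code": "A001", "term": "Placeholder term one"},
    {"code": "A002", "term": "Placeholder term two"}
  ]
}
"""

diabetes = load_codelist(EXAMPLE)
print("A001" in diabetes, diabetes.version)  # True 2020-01-01
```

Because JSON is language-agnostic, the same document could feed generators for Python, npm or other ecosystems.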

I realise you guys are at the research end and I am at the direct care end, but if we're both using these codes then it would be great to get a lingua franca for them into common usage. Anything we define, however arbitrary, would be better than the status quo.

inglesp commented 3 years ago

Hi @pacharanero, thanks for your input. We've got no immediate plans to write libraries to make codelists importable, but I can see that it'd be a nice-to-have.

If you're using Python, you can do something like the following:

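import csv
import requests

def get_codes(codelist_id, version, column_name):
    """Download one version of a codelist as CSV and return the set of codes in column_name."""
    rsp = requests.get(f"https://codelists.opensafely.org/{codelist_id}/{version}/download.csv")
    rsp.raise_for_status()  # fail loudly rather than parsing an error page as CSV
    return {record[column_name] for record in csv.DictReader(rsp.text.splitlines())}

aplastic_anaemia_codes = get_codes("opensafely/aplastic-anaemia", "2020-04-24", "CTV3ID")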
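Building on that snippet, version pinning can be approximated by caching each downloaded version on disk, on the assumption that a published codelist version does not change. The function `get_codes_cached` and its `fetch` parameter are hypothetical: in practice `fetch` would wrap the `requests.get` call above and return `rsp.text`, and injecting it keeps the sketch testable offline.

```python
from __future__ import annotations

import csv
from pathlib import Path
from typing import Callable


def get_codes_cached(
    codelist_id: str,
    version: str,
    column_name: str,
    fetch: Callable[[str, str], str],
    cache_dir: Path = Path(".codelist_cache"),
) -> set[str]:
    """Return codes from a pinned codelist version, downloading it at most once.

    `fetch(codelist_id, version)` must return the CSV text (hypothetical hook).
    """
    # One file per (codelist, version); the "/" in the ID becomes a subdirectory
    path = cache_dir / codelist_id / f"{version}.csv"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(fetch(codelist_id, version), encoding="utf-8")
    text = path.read_text(encoding="utf-8")
    return {row[column_name] for row in csv.DictReader(text.splitlines())}
```

Pinning then amounts to hard-coding the version string in the calling code and, optionally, committing the cache directory alongside it.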
inglesp commented 3 years ago

Also: perhaps we could have a chat early next year to discuss our plans for OpenCodelists?

pacharanero commented 3 years ago

> Also: perhaps we could have a chat early next year to discuss our plans for OpenCodelists?

Yes, that would be great. I think there's huge potential to make codelists much easier to handle and to reduce some of the clinical safety risk that surrounds their use. My email is marcusbaw@gmail.com; just send me an invite for some time in the new year. I'm fairly flexible.