Open wjchulme opened 1 year ago
Maybe there's a reliable API for the term browser, or NHS data dictionary, that could be used to facilitate this? More research needed!
As it has been described to me, the NHS Terminology Server should do this, but we will need to investigate whether it can meet our needs in practice.
Yes, we would have to request a system-to-system account to use it. Alternatively we could just download the whole shebang from TRUD every time they update it and query it locally.
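For the "download and query locally" option, here is a minimal sketch of what a local lookup might look like. It assumes the download contains a tab-delimited RF2 simple refset snapshot file with the standard columns (`id`, `effectiveTime`, `active`, `moduleId`, `refsetId`, `referencedComponentId`). The function name and the toy rows are made up for illustration; the member codes are placeholders, not real SNOMED codes.

```python
import csv
import io

# Assumed RF2 simple refset column layout (tab-delimited snapshot file).
HEADER = ["id", "effectiveTime", "active", "moduleId", "refsetId", "referencedComponentId"]


def load_refset_members(lines, refset_id):
    """Return the active member codes of one refset.

    `lines` is any iterable of text lines (an open file, a StringIO, ...).
    Assumes a snapshot file, i.e. one row per member holding its latest state.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {
        row["referencedComponentId"]
        for row in reader
        if row["refsetId"] == refset_id and row["active"] == "1"
    }


# Toy rows standing in for a real file; the member codes are placeholders.
TOY_ROWS = [
    ["a", "20240101", "1", "m", "991411000000109", "111"],
    ["b", "20240101", "0", "m", "991411000000109", "222"],  # inactive member
    ["c", "20240101", "1", "m", "999", "333"],  # belongs to a different refset
]

if __name__ == "__main__":
    toy_file = io.StringIO("\n".join("\t".join(r) for r in [HEADER, *TOY_ROWS]))
    print(load_refset_members(toy_file, "991411000000109"))
```

With a real download you would pass an open file handle instead of the `StringIO`. A full (non-snapshot) release would also need to be reduced to the latest `effectiveTime` per member first, which this sketch does not do.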
Refsets would be a great starting point, but there are loads more useful things in SNOMED that we could make use of, both for clinical code lists and medication code lists.
Roughly speaking, a reference set (or refset) is a subset of codes in a clinical coding system (eg SNOMED). Refsets define a set of related concepts in a given context and are often used to restrict the set of possible values that can be used for a particular variable. For example, the codes for "patient diagnosis" and "discharge diagnosis" in SUS's emergency care dataset are restricted to the Emergency care diagnosis (991411000000109) SNOMED refset, containing around 1000 codes. You can view this refset on the NHS term browser (click "members" to see each individual SNOMED code in the refset).
Refsets are basically codelists. As codelists, it makes sense to track them on opencodelists to facilitate integration into opensafely workflows. There's nothing stopping us adding a bunch of refsets to opencodelists now, but there are a few improvements to opencodelists that would make refset capture more useful:
- Categorisation, eg `[trauma, cardiovascular, respiratory, ...]`. Categorising a refset in this way is equivalent to developing a collection of codelists, one for each category. Some researchers already do this, but it's messy (for hopefully obvious reasons). It would be better to enable categorisation as an attribute of the refset itself, including multiple categorisations on the same refset. Categorisation is already possible (eg we do it for ethnicity) but the tooling is basic -- users need to add a category column to the codelist data file and upload that file as a new codelist. Multiple categorisations aren't possible (I think).
- Validation. If we had up-to-date refsets in opencodelists, we'd be able to validate queries against them (eg `ecds.discharge_diagnosis.is_in([1,2])` should fail if the refset for `discharge_diagnosis` is `[2,3,4]`). This is an opensafely backend thing, not an opencodelists thing, and though it's surely been discussed before I'm mentioning it here so it's not lost.
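To make the categorisation and validation ideas above a bit more concrete, here is a rough sketch of a refset that carries several categorisations as an attribute, plus a check that rejects codes outside the refset. Everything here (`Refset`, `validate_codes`, `RefsetValidationError`, the `body_system` categorisation and its code-to-category mapping) is hypothetical and for illustration only; none of it is existing opencodelists or opensafely API. The codes `2`, `3`, `4` just mirror the toy example above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Refset:
    """Hypothetical in-memory model of a refset (not an opencodelists class)."""

    name: str
    members: frozenset
    # categorisation name -> {code -> category}; several can coexist on one refset
    categorisations: dict = field(default_factory=dict)

    def codes_in(self, categorisation, category):
        """Return the codes assigned to `category` under one categorisation."""
        mapping = self.categorisations[categorisation]
        return {code for code, cat in mapping.items() if cat == category}


class RefsetValidationError(ValueError):
    """Raised when a query uses codes that are not members of the refset."""


def validate_codes(codes, refset):
    """Fail loudly if any of `codes` is outside `refset`; otherwise return None."""
    unknown = sorted(set(codes) - refset.members)
    if unknown:
        raise RefsetValidationError(
            f"codes not in refset {refset.name!r}: {', '.join(unknown)}"
        )


if __name__ == "__main__":
    refset = Refset(
        name="discharge_diagnosis",
        members=frozenset({"2", "3", "4"}),
        # Toy mapping; the category names come from the example list above.
        categorisations={
            "body_system": {"2": "trauma", "3": "cardiovascular", "4": "respiratory"}
        },
    )
    validate_codes(["2", "3"], refset)  # passes silently
    try:
        validate_codes(["1", "2"], refset)  # "1" is not in the refset
    except RefsetValidationError as err:
        print(err)
```

Keeping categorisations as a named mapping on the refset means a second categorisation is just another dictionary entry, rather than another uploaded codelist.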
This is just a starting point, and probably needs to be split into multiple issues if/when things get going!