Open HelenCEBM opened 2 months ago
From @Jongmassey
All four are there in the UK clinical extensions to SNOMED CT:

- General practice summary data sharing exclusion for gender related issues simple reference set 999004351000000109
- General practice summary data sharing exclusion for assisted fertilisation simple reference set 999004371000000100
- General practice summary data sharing exclusion for termination of pregnancy simple reference set 999004361000000107
- General practice summary data sharing exclusion for sexually transmitted disease simple reference set 999004381000000103

It's possible to get all the member codes for each. It's just a great big download from TRUD every time, and AFAICT there's no convenient mechanism already in place in OpenCodelists to do this automatically.
Rough prototype:

import csv
from collections import defaultdict
from pathlib import Path

# Find the refset concepts whose term matches the exclusion pattern
description_file = next(
    Path("Full/Terminology/").glob("sct2_Description_UKCRFull*.txt")
)
exclusion_refset_pattern = "General practice summary data sharing exclusion"
# RF2 files are tab-delimited with no quoting, hence QUOTE_NONE
with description_file.open("r", encoding="utf-8", newline="") as f:
    reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    exclusion_refset_concepts = {
        r["conceptId"]: r["term"]
        for r in reader
        if exclusion_refset_pattern in r["term"]
    }

# Collect the member codes of each exclusion refset
# (the Full files also contain historical and inactive rows, which are not filtered here)
excluded_concepts = defaultdict(list)
content_file = next(
    Path("Full/Refset/Content/").glob("der2_Refset_SimpleUKCRFull*.txt")
)
with content_file.open("r", encoding="utf-8", newline="") as f:
    reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    for r in reader:
        if r["refsetId"] in exclusion_refset_concepts:
            excluded_concepts[r["refsetId"]].append(
                {"conceptId": r["referencedComponentId"]}
            )

# Write one CSV of member codes per refset
for conceptId, term in exclusion_refset_concepts.items():
    with open(f"{conceptId}_{term.replace(' ', '-')}.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["conceptId"])
        writer.writeheader()
        writer.writerows(excluded_concepts[conceptId])
This uses the SnomedCT_UKClinicalRefsetsRF2_PRODUCTION... folder from the latest SNOMED CT UK Clinical Edition, RF2: Full, Snapshot & Delta release from TRUD.
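One caveat with reading the Full release: it holds every historical version of each refset member, including rows that were later inactivated, so the prototype may pick up codes that are no longer members. A minimal sketch of keeping only the currently active members, assuming the standard RF2 simple refset columns (id, effectiveTime, active, moduleId, refsetId, referencedComponentId). The function name `active_members` and the sample rows are made up for illustration:

```python
import csv
import io


def active_members(rows, refset_ids):
    """Return {refsetId: set of referencedComponentId} for currently active members.

    `rows` is any iterable of dicts with RF2 simple refset columns.
    For each member id, only the row with the latest effectiveTime counts.
    """
    latest = {}  # member id -> most recent row seen for that member
    for row in rows:
        if row["refsetId"] not in refset_ids:
            continue
        previous = latest.get(row["id"])
        # effectiveTime is YYYYMMDD, so string comparison orders it correctly
        if previous is None or row["effectiveTime"] > previous["effectiveTime"]:
            latest[row["id"]] = row

    members = {refset_id: set() for refset_id in refset_ids}
    for row in latest.values():
        if row["active"] == "1":
            members[row["refsetId"]].add(row["referencedComponentId"])
    return members


# Made-up sample rows (not real release data): member "a" is added, then inactivated.
sample = (
    "id\teffectiveTime\tactive\tmoduleId\trefsetId\treferencedComponentId\n"
    "a\t20200101\t1\tm\t999004361000000107\t111\n"
    "a\t20210101\t0\tm\t999004361000000107\t111\n"
    "b\t20200101\t1\tm\t999004361000000107\t222\n"
)
reader = csv.DictReader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE)
result = active_members(reader, {"999004361000000107"})
print(result)
```

Reading the Snapshot files instead would avoid the history handling, though the active flag would still need checking.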
There are approximately 6 legally restricted code groups that cannot be returned in OpenSAFELY data (referenced in the DPIA), e.g. for termination of pregnancy. However, this is not yet well documented for users, and it's easy to create a codelist that contains these codes and run a query without any warning that some codes will not match any records. Occasionally this produces a surprising zero-match result that gets noticed, but more often it fails silently, i.e. it produces an incomplete result that can go unnoticed.
If an application clearly depends on the use of these codes it will be picked up at that stage, but several groups have tried to use these codes as part of a wider study without realising they are restricted.
Possible solutions:
1. Load the restricted code groups into OpenCodelists and allow users to diff them against each of their codelists if they think they might be using some of them.
2. As (1), plus: warn users in OpenCodelists when they create a codelist that has any matches with a legally restricted codelist.
3. Warn users in the OpenSAFELY interface, e.g. when updating codelists, about any matches with restricted codes.
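All of these options rest on the same check: a set intersection between a user's codelist and each restricted code group. A minimal sketch, assuming both are available as plain collections of SNOMED CT concept IDs. The function name `restricted_matches` and the sample codes are made up for illustration, and this is not an existing OpenCodelists or OpenSAFELY API:

```python
def restricted_matches(codelist, restricted_groups):
    """Return {group name: sorted matching codes} for groups that overlap the codelist."""
    codes = set(codelist)
    matches = {}
    for name, restricted in restricted_groups.items():
        overlap = codes & set(restricted)
        if overlap:
            matches[name] = sorted(overlap)
    return matches


# Made-up codes for illustration only
restricted_groups = {
    "termination of pregnancy": {"1001", "1002"},
    "assisted fertilisation": {"2001"},
}
user_codelist = ["1002", "3001", "3002"]

warnings = restricted_matches(user_codelist, restricted_groups)
for name, codes in warnings.items():
    print(f"Warning: {len(codes)} code(s) in restricted group '{name}': {', '.join(codes)}")
```

An empty result means the codelist has no overlap with any restricted group, so no warning is needed.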
Notes
For all of the solutions, the restricted codelists that we create on OC will require regular (automated) checking to make sure they're kept up to date.
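The automated check could compare the codes stored on OpenCodelists with the members extracted from the latest TRUD release and flag any difference. A minimal sketch, assuming both are available as sets of concept IDs. `refset_drift` is a hypothetical name and the codes are made up:

```python
def refset_drift(stored, latest):
    """Compare a stored restricted codelist with the latest refset members."""
    stored, latest = set(stored), set(latest)
    return {
        "added": sorted(latest - stored),    # in the new release, missing from the stored list
        "removed": sorted(stored - latest),  # in the stored list, no longer in the release
    }


# Made-up codes for illustration only
drift = refset_drift(stored={"111", "222"}, latest={"222", "333"})
if drift["added"] or drift["removed"]:
    print(f"Restricted codelist is out of date: {drift}")
```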