opensafely / codelist-development

Repository for discussion of OpenSAFELY codelists

Cancers #155

Open ciaranmci opened 2 years ago

ciaranmci commented 2 years ago

Here is a draft codelist indicating a diagnosis of cancer, including lung and haematological cancer, for the surg-covid-safely project: https://www.opencodelists.org/codelist/user/jkua/cancer/1d9bf8ff/

It merges three OpenSAFELY codelists: "Cancer excluding lung and haematological", "Haematological Cancer", and "Lung Cancer".
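A minimal sketch of what such a merge amounts to. It assumes that each component codelist is a CSV with a code column (here called `code`, a hypothetical header; the real files may name it differently) and that merging means taking the de-duplicated union of codes. The helper `merge_codelists` and the codes below are made-up placeholders, not real clinical codes.

```python
import csv
import io


def merge_codelists(csv_texts, column="code"):
    """Return the sorted, de-duplicated union of codes from several codelist CSVs.

    `column` is a hypothetical header name; adjust it to match the real files.
    """
    merged = set()
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        for row in reader:
            code = row[column].strip()
            if code:  # skip blank cells
                merged.add(code)
    return sorted(merged)


# Placeholder CSV contents standing in for the three component codelists.
non_lung_non_haem = "code,term\nX0001,cancer A\nX0002,cancer B\n"
haematological = "code,term\nX0003,cancer C\n"
lung = "code,term\nX0004,cancer D\nX0002,cancer B\n"  # deliberate overlap

merged = merge_codelists([non_lung_non_haem, haematological, lung])
print(merged)  # ['X0001', 'X0002', 'X0003', 'X0004']
```

Taking a set union rather than concatenating rows keeps a code that appears in more than one component from being listed twice. The merged list is only meaningful if all components use the same coding system.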

Feedback welcome.

ciaranmci commented 2 years ago

This codelist has been reviewed and approved by a clinical team consisting of @JECutting, @alwynkotze, @dmcguckin, and @JKPhoenix.

ciaranmci commented 2 years ago

We've found that we get unusually few patients returned when using this codelist (only tens of patients). The codelist is a merger of three CTV3 OpenSAFELY codelists ("Cancer excluding lung and haematological", "Haematological Cancer", and "Lung Cancer").

I've made a new codelist by merging the SNOMED-CT versions ("Cancer excluding lung and haematological (SNOMED)", "Haematological Cancer (SNOMED)", and "Lung Cancer (SNOMED)"). This returns many more patients (hundreds of thousands of patients).

Any ideas why the difference exists?

Tagging @ghickman because of his work on the mapping from CTV3 to SNOMED-CT, @CarolineMorton because of her original work on the codelists, and @LFISHER7 because he is our project co-pilot.

brianmackenna commented 2 years ago

@ciaranmci thanks for flagging this. Would you be able to share a link to the code that generates both the CTV3 and SNOMED-CT counts, please, to assist with the investigation?

HelenCEBM commented 2 years ago

Hi Ciaran, we use these cancer CTV3 codelists elsewhere (e.g. the weekly vaccine reports) and they produce hundreds of thousands of matches, so perhaps something went wrong with how they were implemented here?

LFISHER7 commented 2 years ago

Looking at the codelist specification here, it appears the problem may be that the codelist was wrongly specified as SNOMED. This results in the wrong table being queried, although @HelenCEBM pointed out that such a query may still return a small number of unconverted/local CTV3 codes. Are you able to test whether changing that gives you counts similar to the SNOMED counts?
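To illustrate the mechanism described above with entirely made-up data: if CTV3 codes are looked up in a table keyed on SNOMED-CT concept IDs, the only hits are codes that happen to appear in that table verbatim, such as unconverted local CTV3 codes. The tables, codes, and the `count_patients` helper below are hypothetical and are not meant to reflect how OpenSAFELY actually stores records.

```python
# Hypothetical event tables, one per coding system, holding (patient_id, code) pairs.
EVENTS = {
    "ctv3": [(1, "X0001"), (2, "X0003"), (3, "X0004"), (4, "X0001")],
    # The SNOMED-CT table mostly holds numeric concept IDs (made up here),
    # plus one unconverted CTV3-style code left over from conversion.
    "snomed": [(1, "1000001"), (2, "1000003"), (3, "1000004"), (5, "X0004")],
}


def count_patients(codes, system):
    """Count distinct patients with any event whose code is in `codes`,
    searching only the table for the declared `system`."""
    wanted = set(codes)
    return len({pid for pid, code in EVENTS[system] if code in wanted})


ctv3_codes = ["X0001", "X0003", "X0004"]
print(count_patients(ctv3_codes, system="ctv3"))    # 4
print(count_patients(ctv3_codes, system="snomed"))  # 1
```

Declaring the wrong system does not fail loudly. It silently queries the other table, and the few stray matches make the result look plausible rather than obviously empty.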

ciaranmci commented 2 years ago

I think @LFISHER7 might have found the problem.

I've just run a job that counts patients using various codelists and combinations thereof, making sure to correct the system = argument in the codelist_from_csv() call (ID = g7f62cldmutelt4x). The basic R script is here (see check_codelist in the project.yaml).
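One cheap guard against this kind of mismatch is to check that a codelist's codes look like codes from the declared system before passing that system to codelist_from_csv(). The sketch below relies on a shape heuristic, which is an assumption rather than an official validator. SNOMED-CT concept identifiers are purely numeric strings of 6 to 18 digits, whereas CTV3 codes are five characters drawn from letters, digits, and dots. Local or extension codes may not follow these patterns. The names `mismatched_codes` and `check_system`, the 5% threshold, and the sample codes are all hypothetical.

```python
import re

# Heuristic patterns (assumptions, not an official validator):
# SNOMED-CT concept IDs are all digits, 6 to 18 long;
# CTV3 codes are exactly five characters of letters, digits, or dots.
PATTERNS = {
    "snomed": re.compile(r"\d{6,18}"),
    "ctv3": re.compile(r"[A-Za-z0-9.]{5}"),
}


def mismatched_codes(codes, system):
    """Return the codes that do not look like they belong to `system`."""
    pattern = PATTERNS[system]
    return [code for code in codes if not pattern.fullmatch(code)]


def check_system(codes, system, max_bad_fraction=0.05):
    """Raise ValueError if too many codes look wrong for the declared `system`."""
    bad = mismatched_codes(codes, system)
    if codes and len(bad) / len(codes) > max_bad_fraction:
        raise ValueError(
            f"{len(bad)} of {len(codes)} codes do not look like {system} codes, "
            f"e.g. {bad[:3]}; is the system argument correct?"
        )


ctv3_codes = ["X0001", "Xa1bC", "A12.."]  # made-up CTV3-shaped codes
check_system(ctv3_codes, "ctv3")  # passes silently
try:
    check_system(ctv3_codes, "snomed")  # wrong system declared
except ValueError as err:
    print(err)
```

A threshold is used instead of requiring every code to match, so that a handful of odd local codes does not block an otherwise correct codelist.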

On a good note, the CTV3 codelists are returning hundreds of thousands of patients. Also, the various ways of combining the component codelists are all coherent. As might be expected, though, the counts of returned patients differ depending on whether you use the CTV3 codelists or the SNOMED codelists.
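One way to make "coherent" concrete, sketched here with made-up events and codes: a patient matches the merged codelist exactly when they match at least one component, so querying the merged list should return the union of the per-component patient sets. When only aggregate counts are available, a weaker check is that the merged count lies between the largest component count and the sum of the component counts. The helper names are hypothetical. These checks only test internal consistency within one coding system; they cannot say whether the CTV3 or the SNOMED-CT count is the right one.

```python
def patients_matching(events, codes):
    """Distinct patient IDs with at least one event whose code is in `codes`."""
    wanted = set(codes)
    return {pid for pid, code in events if code in wanted}


def counts_are_consistent(component_counts, merged_count):
    """Necessary (not sufficient) condition using aggregate counts alone:
    the merged count cannot be below the largest component or above their sum."""
    return max(component_counts) <= merged_count <= sum(component_counts)


def check_coherence(events, components):
    """Compare the merged-codelist query with the union of per-component results;
    return the merged patient count if they agree."""
    per_component = [patients_matching(events, codes) for codes in components]
    merged = patients_matching(events, set().union(*components))
    if merged != set().union(*per_component):
        raise ValueError("merged codelist disagrees with the union of its components")
    return len(merged)


# Made-up (patient_id, code) events and component codelists.
events = [(1, "X0001"), (1, "X0004"), (2, "X0003"), (3, "X0004"), (4, "X0009")]
components = [["X0001", "X0002"], ["X0003"], ["X0004"]]
print(check_coherence(events, components))  # 3
print(counts_are_consistent([1, 1, 2], 3))  # True
```

Patient 1 matches two components but is counted once in the merged result, which is why the merged count can be smaller than the sum of the component counts.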