opensafely / codelist-development

Repository for discussion of OpenSAFELY codelists

*DISEASE* Cancer #10

Open CarolineMorton opened 4 years ago

CarolineMorton commented 4 years ago

LSHTM definition:

Malignant cancer code with a diagnosis within the previous year OR a code for chemotherapy or radiotherapy within the previous year (6 months if code indicated end of treatment)
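As a rough illustration of how this definition could be applied to one patient, here is a minimal Python sketch. The function name, the input shapes and the 365/183-day windows are assumptions made for the example; the definition itself only says "previous year" and "6 months".

```python
from datetime import date, timedelta

# Assumed window lengths: illustrative day counts, not part of the definition.
ONE_YEAR = timedelta(days=365)
SIX_MONTHS = timedelta(days=183)


def has_active_cancer(index_date, diagnosis_dates, treatment_events):
    """Return True if the patient meets the definition at index_date.

    diagnosis_dates: dates of malignant cancer diagnosis codes.
    treatment_events: (date, is_end_of_treatment) pairs for chemotherapy
    or radiotherapy codes.
    """
    for d in diagnosis_dates:
        if timedelta(0) <= index_date - d <= ONE_YEAR:
            return True
    for d, is_end in treatment_events:
        # End-of-treatment codes get the shorter look-back window.
        window = SIX_MONTHS if is_end else ONE_YEAR
        if timedelta(0) <= index_date - d <= window:
            return True
    return False


index = date(2020, 2, 1)
diagnosed = has_active_cancer(index, [date(2019, 6, 1)], [])
ended_treatment = has_active_cancer(index, [], [(date(2019, 5, 1), True)])
ongoing_treatment = has_active_cancer(index, [], [(date(2019, 5, 1), False)])
```

With these inputs, the same treatment date falls outside the 6-month window when the code marks end of treatment but inside the 1-year window when it does not.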

Codes published by Helen Strongman - I will contact her to get hold of these.

Potential problems from Helen MacDonald: cancer is not well recorded in the primary care record; are the time periods for chemo/radiotherapy too long for this group?

CarolineMorton commented 4 years ago

Subset needed for Leukaemia/Lymphoma or bone marrow transplant

bonemarrow_stemcell_July18.xlsx
chemo_radiotherapy_end_Jul18.xlsx
chemo_radiotherapy_not_end_Jul18.xlsx
leukaemia_lymphoma_otherhaem_nohist_Jul18.xlsx

CarolineMorton commented 4 years ago

DRAFT

DEFINITION: 2 columns for each of the codelists below: 1) a binary variable denoting the presence of one of the codes at any point in the patient record. 2) the earliest date of such a code.

Example Output:

patient_id  cancer_bin  code                         date
32          1           breast cancer NOS            7-4-2018
45          1           lung cancer - squamous cell  8-2-2019
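The two columns in the draft definition could be derived along the following lines. This is only a sketch: the record layout, the extra patient with no code, and the day-month-year reading of the example dates are assumptions.

```python
from datetime import date

# Hypothetical event records: (patient_id, code description, event date).
events = [
    (32, "breast cancer NOS", date(2018, 4, 7)),
    (32, "breast cancer NOS", date(2019, 1, 15)),
    (45, "lung cancer - squamous cell", date(2019, 2, 8)),
]
all_patients = [32, 45, 51]  # patient 51 has no matching code


def summarise(events, patients):
    """Return {patient_id: (cancer_bin, code, earliest_date)}."""
    earliest = {}
    for pid, code, when in events:
        # Keep only the earliest matching event per patient.
        if pid not in earliest or when < earliest[pid][1]:
            earliest[pid] = (code, when)
    summary = {}
    for pid in patients:
        if pid in earliest:
            code, when = earliest[pid]
            summary[pid] = (1, code, when)
        else:
            summary[pid] = (0, None, None)
    return summary


summary = summarise(events, all_patients)
```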

POTENTIAL BIASES:

CLINICAL SIGN OFF & DATE:

EPIDEMIOLOGY SIGN OFF & DATE:

SHARED WITH WIDER TEAM: Yes/No

FINAL SIGN OFF DATE (and apply label)

krishnanbhaskaran commented 4 years ago

STEP 2 QoF Cluster Codes

LUNG CANCER/HAEMATOLOGICAL MALIGNANCY/OTHER CANCER

So, QoF cluster codes for cancer: here's the thing. There is a cluster called "CAN", described as "codes for relevant malignancies" in the spreadsheet, which comprises 2215 codes. However, this includes lung and haematological cancers, and it's not clear to me how we'd separate these out.

In which case, I'm wondering whether we need to go with Steps 1 and 3 only for the cancers. Thoughts?

Here they are: all codes under cluster CAN (Codes for relevant malignancies) = 2215, plus a further 45 under cluster DRCAN1 (Cancer diagnosis), plus a further 5 found using a string search for "MALIGNANT NEOP" OR "MALIGNANT TUM" and a manual check of the results.
NOTE: this list combines lung cancer, haematological and other cancers, so they'll need sorting into the three categories for our 3 cancer variables at some stage: qofcluster_CANCER_ALL_NB_INCLUDES_LUNG_HAEM.xlsx

BONE MARROW TRANSPLANT
The following are from a string search for "MARROW TR" OR "CELL TR" in the QoF spreadsheet, followed by manual exclusions/de-duplication: qofcluster_BONEMARROWTRANSPLANTcodes.xlsx

CHEMOTHERAPY
The following are from a string search for "CHEMO" in the QoF spreadsheet, followed by manual exclusions/de-duplication: qofcluster_CHEMOTHERAPYcodes.xlsx
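For anyone wanting to repeat this kind of string search outside the spreadsheet, a minimal Python sketch follows. The row layout and the codes are placeholders invented for the example, and the search only produces candidates: the manual exclusions still have to be done by hand.

```python
def search_terms(rows, patterns):
    """Return de-duplicated (code, term) rows whose term contains any pattern.

    Matching is case-insensitive; the first occurrence of each code is kept.
    """
    seen = set()
    hits = []
    for code, term in rows:
        upper_term = term.upper()
        if any(p in upper_term for p in patterns) and code not in seen:
            seen.add(code)
            hits.append((code, term))
    return hits


# Made-up rows for illustration; the codes are placeholders, not real QoF codes.
rows = [
    ("X001", "Bone marrow transplant"),
    ("X002", "Stem cell transplant"),
    ("X001", "Bone marrow transplant"),  # duplicate row
    ("X003", "Bone marrow biopsy"),
]
candidates = search_terms(rows, ["MARROW TR", "CELL TR"])
```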

CarolineMorton commented 4 years ago

We can't separate them out in QOF clusters, but we will be able to remove overlapping codes at the final stage, so I would suggest including them now and sorting them out later.

krishnanbhaskaran commented 4 years ago

OK uploaded them all for now

krishnanbhaskaran commented 4 years ago

STEP 3 Snomed

LUNG CANCER
snomed-lung_cancer.xlsx

HAEMATOLOGICAL MALIGNANCY
snomed-haem_maligs.xlsx

OTHER CANCERS
NOTE: THIS IS ACTUALLY ALL CANCERS INCLUDING LUNG AND HAEMATOLOGICAL, so those ones will need to be removed from the final code list (using the dedicated lung/haem codelists).
snomed-ALLcancers_NB_INCLUDES_LUNG_AND_HAEMATOLOGICAL.xlsx

BONE MARROW TRANSPLANT
snomed-bone_marrow_transplant.xlsx

CHEMO/RADIOTHERAPY
snomed-chemotherapy_radiotherapy.xlsx

alexwalkerepi commented 4 years ago

Probably not a thing for the first round, but while looking at these codelists it's occurred to me that we might want to pull out codes like "B570. Metastasis to lung" separately. I'm including them in the Other cancers category for now.

alexwalkerepi commented 4 years ago

@krishnanbhaskaran do we want to include non-melanoma skin cancers in the other group? When I've previously made cancer code lists, I've either excluded them or had them as a separate category due to being relatively non-severe. There are some codes in the QoF list, at least currently.

alexwalkerepi commented 4 years ago

This is the QoF cluster list categorised into lung, haematological and other:

qofcluster-catgorised-lung-haem-other.xlsx

alexwalkerepi commented 4 years ago

LUNG CANCER To be converted to read 3

  1. LSHTM codes in read 2: lungcancerREAD2.xlsx
  2. QoF cluster codes: qof-lung.xlsx
  3. Snomed: snomed-lung_cancer.xlsx

alexwalkerepi commented 4 years ago

HAEMATOLOGICAL CANCER To be converted to read 3

  1. LSHTM codes in read 2: leukaemia_lymphoma_otherhaem_nohist_Jul18.xlsx
  2. QoF cluster codes: qof-haematological.xlsx
  3. Snomed: snomed-haem_maligs.xlsx

alexwalkerepi commented 4 years ago

OTHER CANCER To be converted to read 3

  1. LSHTM codes in read 2: allcancersEXCEPT_lung_haem.xlsx
  2. QoF cluster codes: qof-other-cancer.xlsx
  3. Snomed: snomed-ALLcancers_NB_INCLUDES_LUNG_AND_HAEMATOLOGICAL.xlsx
    • It's probably easier to leave this as a high-level Snomed code; I'll take the read 3 converted list and use the lung and haem lists to exclude the overlapping codes.
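
The exclusion step described here amounts to a set difference. A minimal sketch, with placeholder codes and a hypothetical helper name:

```python
def exclude_codes(all_cancer, *specific_lists):
    """Return codes in all_cancer that appear in none of specific_lists, in order."""
    to_remove = set()
    for codes in specific_lists:
        to_remove.update(codes)
    return [c for c in all_cancer if c not in to_remove]


# Placeholder codes, for illustration only.
all_cancer = ["A1", "A2", "A3", "A4"]
lung = ["A2"]
haem = ["A4"]
other_cancer = exclude_codes(all_cancer, lung, haem)
```
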
alexwalkerepi commented 4 years ago

BONE MARROW TRANSPLANT To be converted to read 3

  1. LSHTM codes in read 2: bonemarrow_stemcell_July18.xlsx
  2. QoF cluster codes: qofcluster_BONEMARROWTRANSPLANTcodes.xlsx
  3. Snomed: snomed-bone_marrow_transplant.xlsx

alexwalkerepi commented 4 years ago

CHEMO/RADIOTHERAPY To be converted to read 3

  1. LSHTM codes in read 2: chemo_radiotherapy_not_end_Jul18.xlsx
  2. QoF cluster codes: qofcluster_CHEMOTHERAPYcodes.xlsx
  3. Snomed: snomed-chemotherapy_radiotherapy.xlsx

krishnanbhaskaran commented 4 years ago

RE: "@krishnanbhaskaran do we want to include non-melanoma skin cancers in the other group?"

That's a good point; I think we should exclude them. In the main other cancer Read2 list uploaded above, these can be excluded by using the icd column...
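
A sketch of what an exclusion on the icd column might look like, assuming the column holds ICD-10 codes as strings (the row layout and the Read 2 placeholders are invented for the example). In ICD-10, non-melanoma skin cancers sit under C44 (other malignant neoplasms of skin).

```python
def is_non_melanoma_skin(icd):
    """True for ICD-10 C44 (other malignant neoplasms of skin)."""
    return icd.strip().upper().startswith("C44")


# Hypothetical rows: (Read 2 code placeholder, term, icd).
rows = [
    ("R1", "Basal cell carcinoma of skin", "C44.9"),
    ("R2", "Malignant melanoma of skin", "C43.9"),
    ("R3", "Malignant neoplasm of colon", "C18.9"),
]
kept = [r for r in rows if not is_non_melanoma_skin(r[2])]
```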

alexwalkerepi commented 4 years ago

Thanks Krishnan, I'll leave them in and exclude them manually in the converted read 3 list, as they'll largely be included from the snomed and qof lists anyway.

alexwalkerepi commented 4 years ago

FINAL

DEFINITION: For each codelist below, separately: the earliest date of such a code for each patient (and maybe later, the description of the earliest occurring code).

Example Output:

patient_id  code                         date
32          breast cancer NOS            7-4-2018
45          lung cancer - squamous cell  8-2-2019

CODE LISTS:

LUNG CANCER
Reviewed CTV3 list: Lung_Cancer_CTV3_Reviewed.xlsx

HAEMATOLOGICAL CANCER
Reviewed CTV3 list: Haematological_Cancer_CTV3_Final.xlsx

OTHER CANCER
Reviewed CTV3 list: other_cancers_final.xlsx
Notable exclusions

BONE MARROW TRANSPLANT
Reviewed CTV3 list: BoneMarrowTransplant_CTV3_Reviewed.xlsx

CHEMO/RADIOTHERAPY
Reviewed CTV3 list: Chemo_Radiotherapy_CTV3_Reviewed.xlsx

POTENTIAL BIASES:

CLINICAL SIGN OFF & DATE: Caroline Morton (@CarolineMorton) 15/4/2020 17:40

EPIDEMIOLOGY SIGN OFF & DATE: Alex Walker (@alexwalkercebm) 11/4/2020 18:24

SHARED WITH WIDER TEAM: Yes

FINAL SIGN OFF DATE: 15/4/2020 17:44

krishnanbhaskaran commented 4 years ago

I checked the lists as best I could (bearing in mind there are thousands of codes here, and I am not clinical!).

Lung, transplant, chemo/radio: I didn't spot any problems.

Haem and other cancer lists both seemed to include many codes for tumours/tumour types which are not clearly malignant ("D" codes or ambiguous between "C"/"D" in ICD 10 terms). I don't think these should be included - cancer epi nearly always restricts to definite malignancies (that would be in ICD-10 chapter 2/"C").
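
To make the "C" versus "D" distinction concrete, here is a small sketch that classifies an ICD-10 code by its neoplasm block (C00-C97 malignant, D00-D09 in situ, D10-D36 benign, D37-D48 uncertain or unknown behaviour). The function name and return labels are made up for the example, and it assumes codes are written like "C34.1".

```python
def icd10_behaviour(icd):
    """Classify an ICD-10 code by its neoplasm block."""
    code = icd.strip().upper()
    letter, number = code[:1], code[1:3]
    if not number.isdigit():
        return "not a neoplasm code"
    n = int(number)
    if letter == "C" and n <= 97:
        return "malignant"
    if letter == "D" and n <= 9:
        return "in situ"
    if letter == "D" and n <= 36:
        return "benign"
    if letter == "D" and n <= 48:
        return "uncertain or unknown behaviour"
    return "not a neoplasm code"


# Only codes classified as "malignant" would count as definite malignancies.
examples = {c: icd10_behaviour(c) for c in ["C34.1", "D05.1", "D12.6", "D47.1"]}
```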

Below is the haem list with queries marked: Haematological_Cancer_CTV3_Reviewed_KB.xlsx

Below are the query rows from the other cancers list: othercancers_questionmarks.xlsx

krishnanbhaskaran commented 4 years ago

For the record, below is the procedure for checking the ~3000 other cancers list. Basically, many codes were cleared based on the presence of key text (clinical check recommended in case I've been too sweeping), and the remainder were checked manually against the ICD browser, Google, Wiki etc:

* Clear codes whose CTV3 preferred term contains a key malignancy string
gen ok=1 if strpos(upper(ctv3pref), "MALIG")>0
replace ok=1 if strpos(upper(ctv3pref), "CARCINOM")>0
replace ok=1 if strpos(upper(ctv3pref), "METAST")>0
replace ok=1 if strpos(upper(ctv3pref), "SARCOMA")>0
replace ok=1 if strpos(upper(ctv3pref), "CA ")>0
replace ok=1 if strpos(upper(ctv3pref), "CANCER")>0
replace ok=1 if strpos(upper(ctv3pref), "MELANOMA")>0
replace ok=1 if strpos(upper(ctv3pref), "CARCINOID")>0
replace ok=1 if strpos(upper(ctv3pref), "BLASTOMA")>0
replace ok=1 if include==0 /*already flagged by Oxford*/

* Flag possibly benign tumours among the codes still marked for inclusion
gen flag = 1 if strpos(upper(ctv3pref), "ADENOMA")>0 & include==1
gen comment = "benign?" if strpos(upper(ctv3pref), "ADENOMA")>0 & include==1

*MANUAL CHECKING OF REMAINING ~200
browse if !(ok==1|flag==1)

alexwalkerepi commented 4 years ago

Thanks Krishnan, I agree, so have removed the queried codes in the lists in the definition above.

krishnanbhaskaran commented 4 years ago

Great! Just to check, will your include flag be taken into account in the onward processing, or do you need a final list with the included codes only?

alexwalkerepi commented 4 years ago

We need to convert it to a csv for use in the data extraction, so we'll remove the non-included ones then. I think it's good to include the final list along with the exclusions/reasons here to document the process.
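
For reference, the conversion step could look something like this sketch. The column names, the placeholder rows and the helper name are assumptions; only the include flag comes from the reviewed lists.

```python
import csv
import io


def write_included(rows, out):
    """Write only rows with include == 1 to out as CSV; return the count written."""
    writer = csv.writer(out)
    writer.writerow(["code", "description"])
    count = 0
    for code, description, include in rows:
        if include == 1:
            writer.writerow([code, description])
            count += 1
    return count


# Placeholder rows: (code, description, include flag).
rows = [
    ("A1", "Malignant neoplasm (example)", 1),
    ("A2", "Adenoma (example)", 0),
]
buffer = io.StringIO()  # stands in for a real file opened with newline=""
written = write_included(rows, buffer)
```

The full spreadsheet, with the excluded rows and reasons, stays attached here as the record of the decisions.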

krishnanbhaskaran commented 4 years ago

Great. Yes def good to keep a record of the exclusion decisions.