opensafely / tpp-sql-notebook


TPP process for mapping Read 2 codes into Read 3 (CTV3) codes #59

Open CarolineMorton opened 4 years ago

CarolineMorton commented 4 years ago

See issue #44

From TPP:

  1. Split the CPRD code tables into valid Read2 codes and terms. Read2 codes are 5-byte alphanumeric strings (e.g. 'G33..') and Read2 terms are 2-byte numeric strings (e.g. '00', '11'). Splitting them lets you sanity-check the entries against the Read2 tables, and you need them in this form for the mapping anyway. The entries in these tables were in general 7-byte strings (5 bytes for the code, 2 for the term). The issues encountered were: a) Missing terms: some entries did not have the final 2 bytes. In this case, we used the preferred term '00' as a default. b) Short codes: some codes had fewer than 5 bytes. These were just missing the trailing dots. This is actually fairly common, as it's the way Emis used to store the codes back when storage was at a premium. If you find any short codes, pad them out to 5 bytes with trailing dots ('.').
  2. Now do the first mapping, using the NHS data migration workbench V2 -> CTV3 mapping table. For this first pass, map on BOTH the code AND the term. Throughout this process, I'd add a column to your table recording the source of each resulting CTV3 code. You'll see it in the file we've sent (e.g. CTV3Map_Code_And_Term).
  3. Next, map using the NHS data migration workbench V2 -> CTV3 mapping table on the code ONLY (i.e. ignore the term). This runs the risk of getting an incorrect code because of synonym impurity: a single concept whose individual terms (which should just be different human-readable ways of expressing the same clinical concept) actually express different things. Take a look at Read2 'N245.' and all will become clear. However, the resulting codes will be checked manually at the end, for lots of reasons, so we have mitigation against this problem.
  4. Now use the CTV3 code hierarchy to get all the children of the codes mapped so far, and add any that aren't already in the table. So if we have established a high-level code for 'Heart failure' through mapping, add all of its child codes in CTV3.
  5. Next, QOF clusters. Go through the distinct list of QOF clusters and choose any that relate to diagnostic codes for the condition of interest. Get all the CTV3 codes these link to (probably by back mapping from SNOMED; just ignore the mapping warnings from NHSD). Add these to the table.
  6. Now choose the highest-level SNOMED CT code for the condition. Choose a condition the list needs to cover (e.g. heart failure), go to the NHS SNOMED CT browser online, search for the condition, find the best code that has (disorder) as the semantic tag on its description, and step up the hierarchy until you reach the highest-level parent (e.g. 56265001). Now use the SNOMED hierarchy to get ALL the children of this code in SNOMED (the "IS_A" relationships). Back map these to CTV3 and add the codes.
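Step 1 can be sketched as a small normalisation function. This is a minimal illustration, not TPP's code: `split_read2_entry` is a hypothetical name, and it assumes a 7-character entry is code + term while an entry of 5 or fewer characters is a code with no term (anything else is rejected as malformed).

```python
def split_read2_entry(entry: str) -> tuple[str, str]:
    """Split a CPRD-style Read 2 entry into a (code, term_id) pair.

    Assumption: a 7-character entry is a 5-character code followed by a
    2-digit term id; an entry of 1-5 characters is a code with no term.
    Any other length is treated as malformed.
    """
    entry = entry.strip()
    if len(entry) == 7:
        code, term = entry[:5], entry[5:]
    elif 0 < len(entry) <= 5:
        # Missing term: default to the preferred term '00'
        code, term = entry, "00"
    else:
        raise ValueError(f"unexpected Read 2 entry length: {entry!r}")
    # Short code: restore the trailing dots that were dropped to save storage
    code = code.ljust(5, ".")
    # Sanity checks against the expected Read 2 formats
    if not all(c.isalnum() or c == "." for c in code):
        raise ValueError(f"invalid Read 2 code: {code!r}")
    if not (len(term) == 2 and term.isdigit()):
        raise ValueError(f"invalid Read 2 term id: {term!r}")
    return code, term
```

For example, `split_read2_entry("G33")` gives `("G33..", "00")`, covering both the short-code and missing-term cases at once.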
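Steps 2 and 3 amount to two lookup passes that also record where each CTV3 code came from. The sketch below assumes the workbench mapping table has already been loaded into two plain dictionaries (`code_and_term_map` and `code_only_map` are hypothetical names). `CTV3Map_Code_And_Term` is the label from TPP's file; `CTV3Map_Code_Only` is an invented label for the second pass.

```python
from typing import Iterable, Mapping


def map_read2_to_ctv3(
    entries: Iterable[tuple[str, str]],
    code_and_term_map: Mapping[tuple[str, str], str],
    code_only_map: Mapping[str, Iterable[str]],
) -> dict[str, str]:
    """Map (Read 2 code, term id) pairs to CTV3 codes in two passes.

    Returns {ctv3_code: source}, where source records which pass first
    produced that CTV3 code.
    """
    entries = list(entries)  # iterated twice below
    results: dict[str, str] = {}
    # Pass 1 (step 2): match on BOTH code and term
    for code, term in entries:
        ctv3 = code_and_term_map.get((code, term))
        if ctv3 is not None:
            results.setdefault(ctv3, "CTV3Map_Code_And_Term")
    # Pass 2 (step 3): match on the code alone. This can pull in CTV3 codes
    # belonging to other terms of the same concept (synonym impurity), so
    # these rows still need the final manual clinical check.
    for code, _term in entries:
        for ctv3 in code_only_map.get(code, ()):
            # Hypothetical source label; keeps the pass-1 label if already present
            results.setdefault(ctv3, "CTV3Map_Code_Only")
    return results
```

Using `setdefault` means a code found by the stricter code-and-term pass keeps that source label, which is the more useful one to audit.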
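Steps 4 to 6 rely on two operations: collecting every code below a given code in a hierarchy, and back mapping SNOMED CT concepts to CTV3. The sketch below assumes the hierarchy is available as (child, parent) pairs. For SNOMED these would be active relationship rows with typeId 116680003 (|Is a|), filtered beforehand. It also assumes the SNOMED -> CTV3 map is a plain dictionary; both function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Iterable, Mapping


def descendants(root: str, edges: Iterable[tuple[str, str]]) -> set[str]:
    """Return root plus every code below it in a hierarchy.

    edges are (child, parent) pairs, e.g. CTV3 hierarchy rows or SNOMED CT
    IS_A relationships (assumed already filtered to active IS_A rows).
    """
    children: dict[str, set[str]] = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    seen = {root}
    queue = deque([root])
    while queue:  # breadth-first walk; `seen` avoids revisiting shared children
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def back_map_to_ctv3(
    snomed_ids: Iterable[str], snomed_to_ctv3: Mapping[str, Iterable[str]]
) -> set[str]:
    """Collect every CTV3 code that a set of SNOMED CT concepts maps back to."""
    result: set[str] = set()
    for concept in snomed_ids:
        result.update(snomed_to_ctv3.get(concept, ()))
    return result
```

For step 4, call `descendants` on each CTV3 code mapped so far and add any new codes to the table. For step 5, pass the SNOMED members of the chosen QOF clusters to `back_map_to_ctv3`. For step 6, combine the two: `back_map_to_ctv3(descendants(top_concept, is_a_edges), snomed_to_ctv3)`.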

@chris-tpp

I have been going through this with @sebbacon and @amirmehrkar. Thank you v much for sending over the CVD list yesterday. I have some questions for #7 CVD that I will put on that issue.

More general questions: 1) How are you choosing the QOF clusters, and is it possible to get a list of the ones used, for audit? 2) How are you choosing the clinical terms to search for in SNOMED, and is there a way to get clinical input into this? (For example, I or @amirmehrkar might be able to assist.)

CarolineMorton commented 4 years ago

Discussed with @chris-tpp and @alexwalkercebm

The process can be streamlined with clinical input from us. We need to provide TPP with:
1) A Read Code V2 list of conditions
2) QOF cluster codes for the relevant conditions
3) High-level SNOMED codes, found by searching the SNOMED CT browser with keywords

This will generate the CTV3 code list which will then be manually checked by a clinician.