opensafely / tpp-sql-notebook


Investigate Read 2 -> CTV3 mapping quality #44

Open sebbacon opened 4 years ago

sebbacon commented 4 years ago

Problem: we have a bunch of validated Read v2 codes from LSHTM, which they have used against CPRD. However, TPP works with CTV3 / SNOMED which is a more granular system.

The question is: does using the official NHSD mapping to convert Read 2 -> CTV3 cover all the codes we might want? Or will there be extra work to validate the mapped v3 codelists, and potentially add new codes?

Routes to answering this:

inglesp commented 4 years ago

From the doc:

Read v2

14A3 - 14A6, 14AH, 14AJ, 14AL, 14AM, 14AT, 14AW, 14S3, 33BA, 585f, 585g, 662f - 662i, 7900 - 7905, 790J%, 790K%, 790M%, 790N0, 790N1, 7920 - 7926, 79275, 7928%, 7929%, 792B0, 792C%, 792D%, 7933%, 793L%, 8L40, 8L41, 9b8B2, D4102, F391B, G1, G1yz1, G2101, G2111, G21z1, G232 - G234, G3%, G41, G41y, G41z, G501, G58%, G5y4z, G5yy9 - G5yyE, Gyu3%, H541%, P50 - P53, P542 - P544, P56z0, P57 - P59, P603, P60z1, P67, P6y0 - P6y3, P6y63, P6y64, P6yy8, P6yyA, P6z3, P7211, P7217, P731 - P735, P738, P741%, P74z6, P74z8, Q48y1, SP003, SP076, SP084, SP085, SP111, TB000, ZV421, ZV432, ZV457, ZV458, ZV45K, ZV45L, ZVu6f, ZVu6g Exclude: G341%, G37, G5810, G582

Read v3

14A5.%, 14A6.%, 14S3., 7902.%, 7905.%, 7929.%, D4102, G1..., G2101, G2111, G21z1, G232., G233., G234., G310., G36..%, G364., G365., G366., G58..%, H541., H5410, H541z, P51..%, P511.%, P52..%, P543., P544., P60z1, P67.., P6y0., P6y1., P6y3.%, P6y30, P6y63, P6y64, P6yy8, P6yyA, P7211, P7217, P731., P732., P733., P734., P738., P741.%, P74z6, P74z8, Q48y1, SP003, TB000, Ua1eH, X00tE%, X00tM%, X00tS%, X00tT%, X00tU%, X00y0<, XaLsc%, X00xo%, X00xr%, 79012%, X00xv%, X00xz<, 79321<, 7933.%, Xa3ku<, Xa3lL<, XaM9t%, X00y1%, X010U, X010V, X010e%, X010z%, X0111, X0113%, X201u, X202p%, X202r, X202u%, X203k, X777r, X77tZ, X77th%, X77tq, X77tv%, X77uI, X77uJ%, X77v0%, X77yE, X77yG%, X77yS%, X77yZ%, X77yj%, X77zE, X77zF, X77zG, X77zH, P6y2., X77zL%, X780C, X780D, X780E%, X780H%, X780M<, X780S, X780V, XC0MC, XE1KG, XE1KK, XE1KQ%, XE2Qh%, XE2uV%, XE2ur%, XE2vz, XM0rN, XM0rO, G41..%, XM1Le, XM1Qn, Xa0Kw, XaBEB, XaBL4, XaC1g, XaI9b, XaIpn, XaJ98, XaJ99, XaJIU, XaJJv, XaLen, XaLeo, XaLfL%, XaLfW, XaLfX, XaMK7, XaPr5%, XaPr7%, XaQk7%, XaVvs, XaX1p, XaYYq, XaZKd, ZV421, ZV432, ZV457, ZV458, ZVu6f, ZVu6g Exclude: G5810, G582., G5y4., G5y40, 79321, X00xz, X00y0, Xa3ku, Xa3lL, X780M, G341.%, X200c, Xa07j%

inglesp commented 4 years ago

I'll build a tool this morning to convert a list of v2 codes into a list of v3 codes, and then we can compare with the values here.
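A rough sketch of what such a tool might look like, assuming the mapping is available as simple (v2 code, v3 code) pairs. The function names and the toy rows below are hypothetical and for illustration only; they are not taken from the NHSD mapping files, whose real format may differ.

```python
from collections import defaultdict


def map_codes(v2_codes, mapping_rows):
    """Map Read v2 codes to CTV3 codes using (v2, v3) pairs.

    Returns the set of mapped v3 codes and the list of v2 codes
    that had no entry in the mapping.
    """
    lookup = defaultdict(set)
    for v2, v3 in mapping_rows:
        lookup[v2].add(v3)  # one v2 code may map to several v3 codes

    mapped, unmapped = set(), []
    for code in v2_codes:
        if code in lookup:
            mapped |= lookup[code]
        else:
            unmapped.append(code)
    return mapped, unmapped


def compare(mapped, reference):
    """Show where a mapped v3 list and a hand-reviewed v3 list differ."""
    return {
        "only_in_mapped": sorted(mapped - reference),
        "only_in_reference": sorted(reference - mapped),
        "in_both": sorted(mapped & reference),
    }


# Toy data: made-up codes, not real mapping rows.
toy_rows = [("A1...", "A1..."), ("A2...", "X0001"), ("A2...", "X0002")]
mapped, unmapped = map_codes(["A1...", "A2...", "A9..."], toy_rows)
diff = compare(mapped, {"A1...", "X0001", "X0003"})
```

Codes in "only_in_reference" would be candidates for manual addition, and codes in "only_in_mapped" candidates for clinical review.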

hmcd commented 4 years ago

Hello, for those in the PPV legacy specification it's probably best to use the actual v3 lists from PRIMIS, if I can get them from Dai Evans. The mapping was reviewed code by code rather than just put through the NHSD mapping tool, so any oddities will have been weeded out. They won't have these for some though (neurological conditions, asthma, the RA/SLE/psoriasis/IBD codes I sent last night), so might it be sensible to start mapping those while I contact Dai?

CarolineMorton commented 4 years ago

Hi @hmcd, thanks for the reply and welcome to GitHub! It would be great if we could get the mapping from Dai Evans, as it sounds like that will save a lot of time. Do you need anything from our end?

sebbacon commented 4 years ago

Thanks @hmcd, that's really useful.

I understand in the specific case of PPV there's some debate about whether we're actually going to include that as a covariate (right @CarolineMorton?), but it would be interesting nonetheless, and probably useful down the line.

Do you know anything about what "any oddities will have been weeded out" means in practice? Who did this in the PPV case? Is there someone we can talk to who can impart some expertise on this -- sounds like it's you? :)

For example, will they have started with the mapping tool? Did they just remove oddities that appeared, or did they also add new codes?

How should we proceed? Are you recommending we start by mapping asthma to CTV3 and then manually reviewing? What does that process look like? (Note we already have QoF clusters in CTV3 for asthma, COPD, etc. Should we prefer the LSHTM ones over those?)

Sorry for the deluge of questions!

hmcd commented 4 years ago

Hello, I'll email Dai to ask for the codes he has (and in the meantime will summarise the rules he gave me for expanding the lists in that Word document). For pneumococcal vaccination status itself we agreed not to include it (it would just be introducing a weird adjustment for health behaviour), but the specification included quite a few of the relevant underlying health conditions.

I don't have any practical experience of mapping CTV3 codes - I didn't do the mapping for the PPV spec, PRIMIS did. Would someone in TPP be the person to ask for advice on the best approach to mapping? Chris Bates might be able to suggest someone?

sebbacon commented 4 years ago

Thanks. So someone came up with the codes in Read 2, and then PRIMIS / Dai mapped to CTV3? Or did PRIMIS define all the code lists?

Sounds like any kind of summary of the workflow Dai used will be useful - can you post it in this issue when you have it? Thanks!

hmcd commented 4 years ago

I did the Read v2, Dai and colleagues reviewed, we agreed a Read 2 set, then PRIMIS mapped to CTV3. I can ask about advice on mapping but can't promise a response. Is there anyone at TPP with expertise in CTV3 codes? I can ask in our EHR group too, if that'd be helpful.

sebbacon commented 4 years ago

Great. No harm in asking as many people as possible! I will ask Chris at TPP.

hmcd commented 4 years ago

@sebbacon, sounds like @chris-tpp has mapping the v2 codelists to v3 covered!

The ones from the PPV legacy specification had detailed clinical review and I think the following v3 lists could be used from that document without remapping to compare:

I wouldn't use the PPV specification chronic kidney disease list - we discussed that we'd prefer to use eGFR and a dialysis code list for CKD, so it'd be good to translate just the dialysis code list instead.

I have some notes from Dai Evans at PRIMIS about what stubs to expand and not to expand so will put those here, for interpreting the PPV spec.

I'm looking forward to learning more about CTV3 vs v2!

sebbacon commented 4 years ago

Great - though we should have it covered by TPP, the more info we can gather at this stage about the process from all the sources, the better (as we are aiming to write generalised software to help with this in the future).

sebbacon commented 4 years ago

To summarise the call with @chris-tpp:

Thanks @chris-tpp!

@CarolineMorton @alexwalkercebm which code list shall we ask him to map?

CarolineMorton commented 4 years ago

Yes, great. Can we look at the chronic heart disease and non-asthma respiratory Read codes? #21 #7

The Read codes are in the GitHub issues.

hmcd commented 4 years ago

@inglesp @CarolineMorton @alexwalkercebm @sebbacon In the PPV specification, the % term means include the code and all its descendants.
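As a minimal sketch of that rule for Read v2 only: this assumes codes are stored as 5 characters padded with dots, so that a code's descendants are the codes sharing its non-dot prefix. The function name and the small code dictionary are hypothetical. The same prefix trick should not be assumed to work for CTV3, where descendants would need to come from a separate hierarchy table.

```python
def expand_pattern(pattern, all_codes):
    """Expand one list entry against a dictionary of Read v2 codes.

    'G3%' -> the code G3... plus every code starting with 'G3'.
    'G41' -> just the single code G41.. (if it exists).
    Assumes 5-character, dot-padded Read v2 codes.
    """
    if pattern.endswith("%"):
        stem = pattern[:-1].rstrip(".")  # drop '%' and any dot padding
        return sorted(c for c in all_codes if c.startswith(stem))
    padded = pattern.ljust(5, ".")
    return [padded] if padded in all_codes else []


# Toy dictionary for illustration; a real run would load the full code set.
toy_codes = ["G3...", "G30..", "G301.", "G4...", "G41.."]
expanded = expand_pattern("G3%", toy_codes)
single = expand_pattern("G41", toy_codes)
```

Range entries such as "14A3 - 14A6" and the Exclude lists are not handled here and would need their own rules.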

inglesp commented 4 years ago

In the PPV specification, the % term means include the code and all its descendants.

@hmcd this is really helpful, thanks.

You don't know what < means, do you? And do you know where any of this is documented?