opensafely / codelist-development

Repository for discussion of OpenSAFELY codelists

*PATIENT*: ethnicity #7

Closed sebbacon closed 3 years ago

sebbacon commented 4 years ago

Useful for calculating eGFR and possibly when investigating ACE/ARB effects, but

if we can get it quickly great, but if it will delay things substantially I think we could live without it

And there are also data quality issues (to be described here... TPP will update us)

sebbacon commented 4 years ago

According to Chris, this is a coded event already, but they will pull it into the patient table too.

CarolineMorton commented 4 years ago

We need to check the reliability of this - probably against census broad groups as historically black ethnicity has been poorly coded in CPRD, and may well be the same in TPP

CarolineMorton commented 4 years ago

@chris-tpp can we clarify what the data looks like?

chris-tpp commented 4 years ago

We've split off another table with all the ethnicity codes from coded events (we haven't removed the codes from coded events, just created this table almost as a quick view - easier to deal with). It's based on the three high-level parent codes in CTV3 that have ethnicity codes underneath them. These are the codes people are restricted to using when recording ethnicity through the usual routes in the system.

CarolineMorton commented 4 years ago

@hmcd Do you have any algorithm or protocol to assign ethnicity (5/6 groups) based on codes? We are likely to have more granular ethnicity data and will need to categorize?

hmcd commented 4 years ago

We use Rohini Mathur's work https://www.ncbi.nlm.nih.gov/pubmed/24323951 @krishnanbhaskaran do you have the codelists to hand? Or shall I ask Rohini?

CarolineMorton commented 4 years ago

Code list from TPP:

EthnicityCodes.xlsx

If we keep this, we can then apply Rohini's code in Stata? @sebbacon @hmcd @krishnanbhaskaran does that sound sensible?

sebbacon commented 4 years ago

Rohini's paper is simply an evaluation of how reliable the coding is, no? (Conclusion: reliable enough?)

hmcd commented 4 years ago

Rohini developed codelists and an algorithm for dealing with conflicting or multiple codes for an individual - includes codes and algorithm. Don't have them readily to hand - @Krishnan has built them into our data extraction system!

krishnanbhaskaran commented 4 years ago

No we dropped ethnicity from the data extraction tools. I'll ask Rohini. Watch out for email.

CarolineMorton commented 4 years ago

Draft sign off

DEFINITION: Latest patient ethnicity taken directly from patient's records

Example:

patient_id  code   description
123         9S48.  Black Black - other
332         9S1..  White - ethnic group

Notes: We will need to group ethnic groups together. There is no easy way to do this within SQL or TPP without writing new code. On the other hand, LSHTM already hold Stata code to do this. It is validated within CPRD; it is less clear whether it will still be valid in TPP. Using Stata will be the initial plan, but ultimately we would want to convert this to SQL code for the cohort extraction tool. @sebbacon
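As a rough illustration of the definition and the grouping step, here is a minimal Python sketch. It is not the Stata code held by LSHTM. The record dates, the `CODE_TO_GROUP` lookup and the function names are assumptions made up for this example; only the two codes come from the example rows in this comment.

```python
from datetime import date

# Hypothetical rows from the ethnicity table: (patient_id, CTV3 code, date recorded).
# The dates are invented for illustration only.
records = [
    (123, "9S1..", date(2015, 3, 1)),
    (123, "9S48.", date(2019, 6, 12)),
    (332, "9S1..", date(2018, 1, 20)),
]

# Illustrative code-to-group lookup; a real one would cover the full code list.
CODE_TO_GROUP = {"9S1..": "White", "9S48.": "Black"}


def latest_code_per_patient(rows):
    """Return {patient_id: code}, keeping the most recently recorded code."""
    latest = {}
    for patient_id, code, recorded in rows:
        if patient_id not in latest or recorded > latest[patient_id][1]:
            latest[patient_id] = (code, recorded)
    return {pid: code for pid, (code, _) in latest.items()}


def group_patients(rows, lookup):
    """Map each patient's latest code to a broad group (None if unmapped)."""
    return {
        pid: lookup.get(code)
        for pid, code in latest_code_per_patient(rows).items()
    }


grouped = group_patients(records, CODE_TO_GROUP)
```

Codes missing from the lookup come back as `None`, which makes unmapped codes easy to spot before any analysis.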

POTENTIAL BIASES: Ethnicity is likely to be on record for a relatively low percentage of patients (perhaps ~50% if similar to CPRD levels)

CLINICAL SIGN OFF & DATE:

EPIDEMIOLOGY SIGN OFF & DATE:

SHARED WITH WIDER TEAM: Yes/No

FINAL SIGN OFF DATE (and apply label)

krishnanbhaskaran commented 4 years ago

Is it possible to cross check the codes that TPP are using to pull these records with the ones from Rohini? (Perhaps you have already done so?) Could be a good check of the underlying code list.

I don't think the issue of grouping afterwards is a big problem; we may be able to make use of Rohini's code, and/or it wouldn't be a huge job to manually classify the ~300 codes into the 5 main ethnicity groups and implement as a dofile.
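A cross-check like this could be sketched in a few lines of Python. This is only an illustration: the function name and the placeholder codes `AAAA.` and `BBBB.` are made up, and it assumes both lists use the same coding scheme and the same code formatting, which would need confirming first.

```python
def cross_check(tpp_codes, reference_codes):
    """Compare two code lists and report where they agree and differ."""
    tpp = {code.strip() for code in tpp_codes}
    ref = {code.strip() for code in reference_codes}
    return {
        "only_in_tpp": sorted(tpp - ref),
        "only_in_reference": sorted(ref - tpp),
        "in_both": sorted(tpp & ref),
    }


# Placeholder inputs: "AAAA." and "BBBB." are not real codes.
result = cross_check(["9S1..", "9S48.", "AAAA."], ["9S1..", "BBBB."])
```

Anything listed under `only_in_tpp` or `only_in_reference` would then be a candidate for manual review.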

CarolineMorton commented 4 years ago

Is it possible to cross check the codes that TPP are using to pull these records with the ones from Rohini? (Perhaps you have already done so?) Could be a good check of the underlying code list.

I don't think the issue of grouping afterwards is a big problem; we may be able to make use of Rohini's code, and/or it wouldn't be a huge job to manually classify the ~300 codes into the 5 main ethnicity groups and implement as a dofile.

@chris-tpp is this something that is easily doable?

alexwalkerepi commented 4 years ago

Ethnicity categories used previously by LSHTM: 0 "White", 1 "South Asian", 2 "Black", 3 "Other", 4 "Mixed", 5 "Not Stated"

brianmackenna commented 4 years ago

I've done initial investigations. @inglesp is the master of SnoMed hierarchies and we have identified one in SnoMed. I propose we use this hierarchy code list to start with groupings. Seem ok?

Then there are two issues

  1. It is an international list, so ethnicity might be slightly different in the UK, as self-reported ethnicity varies from country to country (it might be ok seeing as we are only looking at UK data and making no comparisons - need to work through tomorrow)
  2. While it does groupings, it doesn't group into six areas. I'm happy to group into the six areas (but so you all know, one of my pet hates is my ethnicity being aggregated into an ethnicity I don't identify with. Just saying you may have to put up with my groaning)

brianmackenna commented 4 years ago

@rohinimathur good to have you on board! Before I do loads of work, and seeing as this has been built on your list - do you have an aggregated version or hierarchical version handy?

rohinimathur commented 4 years ago

Hi @brianmackenna, is this the sort of thing you had in mind? Let me know if you want the info in another format! Read Codes for Ethnicity and Related Census Groupings.docx

brianmackenna commented 4 years ago

Exactly what I had in mind. Thanks! Do you have in a csv/excel format?

rohinimathur commented 4 years ago

Read codes for ethnicity and related census groupings.xlsx @brianmackenna here you go!

pipski505 commented 4 years ago

16+1 ethnicity categories with exclude flags (0/1, 0 include and 1 exclude).
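For anyone reading without the attachment, a lookup with 0/1 exclude flags could be applied along these lines. This is a hedged sketch: the column names (`code`, `category`, `exclude`), the placeholder codes and the category numbers are invented here and are not taken from the spreadsheet.

```python
import csv
import io

# Invented sample standing in for the spreadsheet; 0 = include, 1 = exclude.
SAMPLE = """code,category,exclude
CODE_A,1,0
CODE_B,16,1
CODE_C,8,0
"""


def included_codes(csv_text):
    """Return {code: category} for rows whose exclude flag is 0."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["code"]: int(row["category"])
        for row in reader
        if row["exclude"].strip() == "0"
    }


kept = included_codes(SAMPLE)
```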

EthnicityCodes.v2.0.xlsx

brianmackenna commented 4 years ago

I have reviewed the file based on @alexwalkercebm and @IevaLipska work. Discrepancies mainly come from the 1991 v 2001 hierarchies being different. Some rows in yellow highlighted for query

BMK UpdateEthnicityCodes.v1.0 (1).xlsx

quick call to finalise should sort it for this study

pipski505 commented 4 years ago

@brianmackenna I'm not sure how useful this is, but for some of the categories I consulted this, which seems to suggest that religious census categories are insufficient to determine race - which makes sense to me, because you could identify as e.g. Jewish but be black or white or anything in between. That's the reasoning I applied for a few others such as 'race: mixed' as technically it's very vague.

http://help.visionhealth.co.uk/visiondatahub/clinical%20portal/Content/G_Full%20Help%20Topics/Reporting/Ethnicity%20Definitions.htm

brianmackenna commented 4 years ago

@chris-tpp what ethnicity data exactly are we getting? Is it the data from the demographic box or is it the last recorded ethnicity anywhere in the record?

If it is the demographic box - is there a restriction on what values can be inputted? e.g. can you still choose a 1991 census category even though 2011 is the most recent? Can people put religious identifiers in ethnicity status?

(this is the only schema i can find for the data)

chris-tpp commented 4 years ago

All ethnicity codes are in the ethnicity table. I think the latest one renders in the demographics box but would have to check. There are three high level parent ethnicity codes in CTV3; we allow any child codes of those.

brianmackenna commented 4 years ago

So @alexwalkercebm, @IevaLipska and I have grouped the final list here into a long list and a short list.

@rohinimathur one final (we hope) question is regarding the codes below. We have assigned 5 - 16 - other ethnic groups. Is this ok?

rohinimathur commented 4 years ago

@brianmackenna Hi Brian, yes I think this is fine. I'd be inclined to leave them out since they don't offer any useful information about 'ethnicity' as is conceptualized in the UK official statistics and reporting - but probably useful to include descriptively.

I think if this were the only ethnicity code on an individual's record then it is OK to use it, but if there are other ethnicity codes which fit into White/South Asian/Black etc., then those should take precedence over it.
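The precedence rule described here could be sketched as below. This is an illustration under stated assumptions, not the validated algorithm: the `uninformative` flag, the function name and the choice to take the most recent informative entry are all assumptions added for the example.

```python
def resolve_group(entries):
    """Pick one broad group for a patient.

    entries: list of (group, uninformative) tuples, oldest first.
    Entries flagged uninformative are used only when nothing else is recorded.
    """
    informative = [group for group, uninformative in entries if not uninformative]
    if informative:
        # Assumption: among informative entries, the most recent one wins.
        return informative[-1]
    # Fall back to the latest uninformative entry, or None if no entries.
    return entries[-1][0] if entries else None


only_other = resolve_group([("Other ethnic groups", True)])
mixed_record = resolve_group([("White", False), ("Other ethnic groups", True)])
```

Here `only_other` keeps the "Other ethnic groups" assignment, while `mixed_record` resolves to "White" because a more informative code is present.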

krishnanbhaskaran commented 4 years ago

CODED AS:

  1. White
  2. Mixed
  3. Asian or Asian British
  4. Black
  5. Other ethnic groups

Lextuga007 commented 4 years ago

Not an issue but I'm interested in seeing the ethnicity groupings that have been discussed here but the file attachments have all gone. I'm looking through opensafely/opencodelists but I hope you can direct me to a lookup (if there is one)?

I'm trying to group some of the Read Codes for ethnicity, or confirm where they can't be grouped at all. "African - ethnic category 2001 census" is a key one, as it cannot be grouped to Black: it was also used for White and Mixed, as I can confirm from other clinical systems.

brianmackenna commented 3 years ago

@Lextuga007 Apologies for the delay but I think we missed this with all the activity in August. Hope this helps?