Closed sebbacon closed 3 years ago
According to Chris, this is a coded event already, but they will pull it into the patient table too.
We need to check the reliability of this - probably against census broad groups as historically black ethnicity has been poorly coded in CPRD, and may well be the same in TPP
@chris-tpp can we clarify what the data looks like?
We've split off another table with all the ethnicity codes from coded events (we haven't removed the codes from coded events, just created this table almost as a quick view - easier to deal with). It's based on the three high-level parent codes in CTV3 that have ethnicity codes underneath them. These are the codes people are restricted to using when recording ethnicity through the usual routes in the system.
@hmcd Do you have any algorithm or protocol to assign ethnicity (5/6 groups) based on codes? We are likely to have more granular ethnicity data and will need to categorize?
We use Rohini Mathur's work https://www.ncbi.nlm.nih.gov/pubmed/24323951 @krishnanbhaskaran do you have the codelists to hand? Or shall I ask Rohini?
Code list from TPP:
If we keep this, we can then apply Rohini's code in Stata? @sebbacon @hmcd @krishnanbhaskaran does that sound sensible?
Rohini's paper is simply an evaluation of how reliable the coding is, no? (Conclusion: reliable enough?)
Rohini developed codelists and an algorithm for dealing with conflicting or multiple codes for an individual - includes codes and algorithm. Don't have them readily to hand - @Krishnan has built them into our data extraction system!
No we dropped ethnicity from the data extraction tools. I'll ask Rohini. Watch out for email.
Draft sign off
DEFINITION: Latest patient ethnicity taken directly from patient's records
Example:

| patient_id | code | description |
|---|---|---|
| 123 | 9S48. | Black Black - other |
| 332 | 9S1.. | White - ethnic group |
Notes: We will need to group ethnic groups together. There is no easy way to do this within SQL or TPP without writing new code. On the other hand, LSHTM already hold Stata code to do this. It has been validated within CPRD; it is less clear whether it will still be valid in TPP. Using Stata will be the initial plan, but ultimately we would want to convert this to SQL code for the cohort extraction tool. @sebbacon
POTENTIAL BIASES: A relatively low percentage of patients (perhaps ~50%, if similar to CPRD levels) is likely to have ethnicity on record
CLINICAL SIGN OFF & DATE:
EPIDEMIOLOGY SIGN OFF & DATE:
SHARED WITH WIDER TEAM: Yes/No
FINAL SIGN OFF DATE (and apply label)
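A minimal sketch of the "latest ethnicity code per patient" definition above, in Python. The row layout `(patient_id, event_date, ctv3_code)` and the dates are assumptions made for illustration; only the two codes come from the example table. This is not the cohort extraction tool's actual implementation.

```python
from datetime import date

# Hypothetical rows from the ethnicity table: (patient_id, event_date, ctv3_code).
# Dates are invented; the codes are the two shown in the example table.
events = [
    (123, date(2015, 3, 1), "9S1.."),
    (123, date(2019, 6, 9), "9S48."),
    (332, date(2012, 1, 20), "9S1.."),
]


def latest_code_per_patient(rows):
    """Return {patient_id: code}, keeping the most recently dated code.

    If two rows for a patient share the same date, the first one seen wins.
    """
    latest = {}
    for patient_id, event_date, code in rows:
        current = latest.get(patient_id)
        if current is None or event_date > current[0]:
            latest[patient_id] = (event_date, code)
    return {pid: code for pid, (_, code) in latest.items()}


latest = latest_code_per_patient(events)
```

Grouping the resulting codes into broad categories would be a separate lookup step.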
Is it possible to cross check the codes that TPP are using to pull these records with the ones from Rohini? (Perhaps you have already done so?) Could be a good check of the underlying code list.
I don't think the issue of grouping afterwards is a big problem; we may be able to make use of Rohini's code, and/or it wouldn't be a huge job to manually classify the ~300 codes into the 5 main ethnicity groups and implement this as a do-file.
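The cross-check suggested above could be as simple as a set comparison. A rough Python sketch follows; the two lists here are placeholders (apart from the two codes quoted earlier in this thread), and the real ones would be loaded from the TPP extract and from Rohini's files.

```python
# Placeholder codelists: "CODE_X" and "CODE_Y" are invented stand-ins,
# not real CTV3 or Read codes.
tpp_codes = ["9S1..", "9S48.", "CODE_X"]
reference_codes = ["9S1..", "9S48.", "CODE_Y"]


def cross_check(tpp, reference):
    """Compare two codelists and report where they disagree."""
    tpp_set, ref_set = set(tpp), set(reference)
    return {
        "in_both": sorted(tpp_set & ref_set),
        "only_in_tpp": sorted(tpp_set - ref_set),
        "only_in_reference": sorted(ref_set - tpp_set),
    }


report = cross_check(tpp_codes, reference_codes)
```

Anything in `only_in_tpp` or `only_in_reference` would then need a manual look before the grouping is trusted.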
@chris-tpp is this something that is easily doable?
Ethnicity categories used previously by LSHTM: 0"White" 1"South Asian" 2"Black" 3"Other" 4"Mixed" 5"Not Stated"
I've done initial investigations. @inglesp is the master of SNOMED hierarchies and we have identified one in SNOMED. I propose we use this hierarchy code list as a starting point for the groupings. Seem ok?
Then there are two issues
@rohinimathur good to have you on board! Before I do loads of work, and seeing as this has been built on your list, do you have an aggregated or hierarchical version handy?
Hi @brianmackenna, is this the sort of thing you had in mind? Let me know if you want the info in another format! Read Codes for Ethnicity and Related Census Groupings.docx
Exactly what I had in mind. Thanks! Do you have it in CSV/Excel format?
Read codes for ethnicity and related census groupings.xlsx @brianmackenna here you go!
16+1 ethnicity categories with exclude flags (0/1, 0 include and 1 exclude).
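For anyone picking the file up later, a hedged Python sketch of how the exclude flag might be applied once the spreadsheet is exported to CSV. The column names (`code`, `category_16`, `exclude`), the codes, and the category numbers below are all placeholders, not the contents of the actual file.

```python
import csv
import io

# Placeholder CSV: column names, codes and category numbers are invented.
# Only the flag convention (0 = include, 1 = exclude) comes from the thread.
codelist_csv = """code,category_16,exclude
CODE_A,1,0
CODE_B,14,0
CODE_C,17,1
"""


def included_codes(csv_text):
    """Return {code: category} for rows whose exclude flag is 0."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["code"]: int(row["category_16"])
        for row in reader
        if row["exclude"].strip() == "0"
    }


included = included_codes(codelist_csv)
```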
I have reviewed the file based on @alexwalkercebm and @IevaLipska's work. Discrepancies mainly come from the 1991 v 2001 hierarchies being different. Some rows are highlighted in yellow for query
BMK UpdateEthnicityCodes.v1.0 (1).xlsx
quick call to finalise should sort it for this study
@brianmackenna I'm not sure how useful this is, but for some of the categories I consulted this, which seems to suggest that religious census categories are insufficient to determine race - which makes sense to me, because you could identify as e.g. Jewish but be black or white or anything in between. That's the reasoning I applied for a few others such as 'race: mixed' as technically it's very vague.
@chris-tpp what ethnicity data exactly are we getting? Is it the data from the demographic box or is it the last recorded ethnicity anywhere in the record?
If it is the demographic box, is there a restriction on what values can be entered? e.g. can you still choose a 1991 census category even though 2011 is the most recent? Can people put religious identifiers in ethnicity status?
(this is the only schema I can find for the data)
All ethnicity codes are in the ethnicity table. I think the latest one renders in the demographics box but would have to check. There are three high level parent ethnicity codes in CTV3; we allow any child codes of those.
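To make "any child codes of those" concrete, here is a small Python sketch of collecting every descendant of a set of parent codes. The hierarchy below is entirely made up (the actual CTV3 parent codes are not listed in this thread), and treating grandchildren as allowed is an assumption.

```python
from collections import deque

# Invented parent -> children map; none of these are real CTV3 codes.
children = {
    "PARENT_1": ["CHILD_A", "CHILD_B"],
    "CHILD_A": ["GRANDCHILD_A1"],
    "PARENT_2": ["CHILD_C"],
    "PARENT_3": [],
    "UNRELATED": ["CHILD_X"],
}


def descendants(parents, child_map):
    """Breadth-first walk returning every code below the given parents.

    The parent codes themselves are not included in the result.
    """
    seen = set()
    queue = deque(parents)
    while queue:
        code = queue.popleft()
        for child in child_map.get(code, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


allowed = descendants(["PARENT_1", "PARENT_2", "PARENT_3"], children)
```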
So @alexwalkercebm, @IevaLipska and I have grouped the final list here, into a long list and a short list.
@rohinimathur one final (we hope) question we have is regarding the codes below. We have assigned them 5 - 16 - other ethnic groups. Is this ok?
@brianmackenna Hi Brian, yes I think this is fine. I'd be inclined to leave them out since they don't offer any useful information about 'ethnicity' as is conceptualized in the UK official statistics and reporting - but probably useful to include descriptively.
I think if this were the only ethnicity code on an individual's record then OK to use it, but if there are other ethnic codes which fit into white/south asian/black etc., then it should be superseded by those.
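A rough Python sketch of that rule, for discussion only. The group labels are taken from the LSHTM categories listed earlier in the thread; taking the latest code when several informative groups conflict is an assumption of this sketch, not something agreed here, and it is not Rohini's published algorithm.

```python
OTHER = "Other"


def resolve_group(groups_oldest_first):
    """Pick one ethnicity group for a patient.

    groups_oldest_first: the patient's recorded groups in date order.
    An "Other" code is used only if nothing more specific is recorded.
    Among the remaining candidates the most recent wins (an assumption).
    """
    if not groups_oldest_first:
        return None
    specific = [g for g in groups_oldest_first if g != OTHER]
    candidates = specific or groups_oldest_first
    return candidates[-1]
```

For example, `resolve_group(["White", "Other"])` gives `"White"`, while `resolve_group(["Other"])` keeps `"Other"`.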
CODED AS:
Not an issue but I'm interested in seeing the ethnicity groupings that have been discussed here but the file attachments have all gone. I'm looking through opensafely/opencodelists but I hope you can direct me to a lookup (if there is one)?
I'm trying to group up some of the Read Codes for ethnicity, or confirm where they can't be grouped at all. "African - ethnic category 2001 census" is a key one: it cannot be grouped to Black because it was also used for White and Mixed, as I can confirm from other clinical systems.
@Lextuga007 Apologies for the delay but I think we missed this with all the activity in August. Hope this helps?
our ethnicity list for CTV3 in TPP is available on https://codelists.opensafely.org/codelist/opensafely/ethnicity/2020-04-27/
(also for OpenSAFELY team - closing now. We will open a separate issue for ethnicity in Snomed)
Useful for calculating eGFR and possibly when investigating ACE/ARB effects, but
And there are also data quality issues (to be described here... TPP will update us)