opensafely / codelist-development

Repository for discussion of OpenSAFELY codelists

Using CPRD Aurum-derived code lists in Opensafely #144

Open hjforbes opened 2 years ago

hjforbes commented 2 years ago

Two issues

  1. Some local EMIS codes in the CPRD Aurum data dictionary have been given a fake SNOMED-CT concept ID. These are not recognized when uploading to OpenSAFELY and need to be removed. This can be done by merging with the full list of SNOMED-CT concept IDs to identify the “fake” IDs.
  2. Some SNOMED-CT codes are not in the CPRD Aurum Medical dictionary:
    a. Some inactive codes are not in the Aurum dictionary.
    b. Some active codes are not in the June 2021 CPRD Aurum browser, e.g.:
      • Term: Chronic post-COVID-19 syndrome (disorder); SNOMED-CT concept ID: 1119304009 (added in the Jan 2021 release)
      • Term: Exacerbation of allergic asthma due to infection (disorder); SNOMED-CT concept ID: 782520007
    Possible reasons:
    • EMIS sends CPRD the full SNOMED dictionary and they include all codes in the browsers. Until recently they received primary care codes only, whereas now they receive the full set of SNOMED codes. This may explain why some inactive codes are missing.
    • Have formatting issues occurred, meaning we are not successfully matching codes?
    • Are some codes genuinely missing in Aurum?
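The merge described in point 1 can be sketched as an anti-join: keep the rows whose concept ID appears in a full SNOMED-CT release and set aside the rest as candidate “fake” IDs. This is a minimal sketch only. The column names (`medcodeid`, `snomedctconceptid`, `term`), the placeholder medcodeid values, and the tiny reference set are assumptions for illustration, not the real Aurum dictionary layout or a real release.

```python
import csv
import io


def split_by_reference(codelist_rows, reference_ids, id_field="snomedctconceptid"):
    """Split rows into (recognised, unrecognised) by membership in reference_ids."""
    recognised, unrecognised = [], []
    for row in codelist_rows:
        concept_id = row[id_field].strip()
        if concept_id in reference_ids:
            recognised.append(row)
        else:
            unrecognised.append(row)
    return recognised, unrecognised


# Toy Aurum-style extract (hypothetical layout; medcodeid values are placeholders).
# csv.DictReader returns every field as a string, so long IDs are never rounded.
aurum_csv = io.StringIO(
    "medcodeid,snomedctconceptid,term\n"
    "m1,782520007,Exacerbation of allergic asthma due to infection\n"
    "m2,1773261000006103,fibrotic nonspecific interstitial pneumonia\n"
)
aurum_rows = list(csv.DictReader(aurum_csv))

# Stand-in for the concept IDs of a full SNOMED-CT release (illustrative subset).
snomed_release_ids = {"782520007", "735587000", "1119304009"}

kept, fake = split_by_reference(aurum_rows, snomed_release_ids)
print("recognised:", [r["snomedctconceptid"] for r in kept])
print("candidate fake IDs:", [r["snomedctconceptid"] for r in fake])
```

Rows that fail the match are only candidates: a genuine concept missing from the reference release used would also land in the second list, so it is worth checking which release the reference IDs came from.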

Example:

  1. Full asthma code list comparison:
    • Two codes in the Aurum list are not in OS:
      SNOMED CT code: 1773261000006103 term: fibrotic nonspecific interstitial pneumonia
      SNOMED CT code: 1773271000006105 term: cellular nonspecific interstitial pneumonia
    • The OS code list includes 89 inactive codes which are not in the CPRD Aurum dictionary. We do not know when these became inactive or how frequently they were used.
  2. Mini asthma code list comparison: we ran an identical search in OS and Aurum, using the search terms “asthma control”, “asthma trigger”, and “allergic asthma”.
    • In Aurum we found a total of 84 SNOMED CT codes (and 98 unique medcodeid codes). There were duplicate SNOMED CT codes that had different terms associated. Of the 84 SNOMED CT codes, only 68 were unique codes. E.g.:
      SNOMED CT code: 735587000 term: allergic asthma with status asthmaticus
      SNOMED CT code: 735587000 term: acute severe exacerbation of asthma co-occurrent and due to allergic asthma
      Both these terms appear in the SNOMED-CT browser.
    • In OS we found 112 SNOMED CT codes.
    • When comparing:
      • 66 SNOMED CT codes were in both lists
      • the Aurum list had two “fake” SNOMED CT codes
      • 46 codes were in os_only (12 of which were currently active)
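The comparison above amounts to de-duplicating the Aurum rows by concept ID and then splitting the two lists into three groups. A minimal sketch, assuming both lists are available as plain strings; the function name `compare_codelists` and the toy inputs are illustrative and are not the actual search results.

```python
def compare_codelists(aurum_pairs, os_ids):
    """Return concept IDs found in both lists, in Aurum only, and in OS only."""
    # Several terms can share one concept ID, so collapse to unique IDs first.
    aurum_ids = {code for code, _term in aurum_pairs}
    os_id_set = set(os_ids)
    return {
        "both": aurum_ids & os_id_set,
        "aurum_only": aurum_ids - os_id_set,
        "os_only": os_id_set - aurum_ids,
    }


# Toy input: two Aurum rows share a concept ID; one row carries a local ID.
aurum_pairs = [
    ("735587000", "allergic asthma with status asthmaticus"),
    ("735587000", "acute severe exacerbation of asthma co-occurrent and due to allergic asthma"),
    ("1773261000006103", "fibrotic nonspecific interstitial pneumonia"),
]
os_ids = ["735587000", "782520007"]

result = compare_codelists(aurum_pairs, os_ids)
for group, ids in result.items():
    print(group, sorted(ids))

# The counts reported in this comment are internally consistent:
assert 66 + 2 == 68    # unique Aurum codes = in both + "fake"
assert 66 + 46 == 112  # OS codes = in both + os_only
```

Keeping the IDs as strings throughout avoids any rounding of long identifiers during the comparison.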

Potential solution:

  • Gather code lists developed using the CPRD Aurum Medical dictionary
  • Exclude local EMIS codes
  • Upload code list to OpenCodelists
  • Conduct an additional search within OpenCodelists, using key search terms to pick up inactive codes, plus act as an extra check.

Questions for OS team

  1. When does OS browser get updated?
  2. Can we get counts for the mini asthma search, to see how many codes would be missed if we simply used the CPRD Aurum derived list?

@inglesp @CarolineMorton @laurietomlinson @katetheyogi

hjforbes commented 2 years ago

Here is the mini asthma code list, mentioned above: https://www.opencodelists.org/codelist/user/hjforbes/asthma_mini_search/024b3d17/

hjforbes commented 2 years ago

Hi @inglesp @CarolineMorton, is there anyone who can help with this? If not, is there a way that I can run a study definition in OS to get the counts for the mini asthma code list?

jkquint commented 2 years ago

Hi Harriet, Just to follow on from this, when looking at the codes that are available to download from the OS website,

  1. They include a column simply headed ID, so it's not clear whether this is the SNOMED CT concept ID number, the SNOMED CT description ID number, or the medcodeid number (which is the one you need if looking for rows of data).
    1. I assume it’s the medcodeid number, as some of these numbers are 17 digits long. However, if you compare a 17-digit number listed on the website with what you get when the downloaded CSV file is opened in Excel, the last digit gets altered. This could be an issue.

inglesp commented 2 years ago

Hi @hjforbes, sorry for the delay in getting back to you.

When does OS browser get updated?

A few days after each SNOMED release.

Can we get counts for the mini asthma search, to see how many codes would be missed if we simply used the CPRD Aurum derived list?

Here you go:

mini_asthma.csv

Then to answer @jkquint's points:

They include a column simply headed ID, so it's not clear whether this is the SNOMED CT concept ID number, the SNOMED CT description ID number, or the medcodeid number (which is the one you need if looking for rows of data)

The ID is the concept ID. OpenCodelists doesn't know about medcodes.

However, if you compare a 17-digit number listed on the website with what you get when the downloaded CSV file is opened in Excel, the last digit gets altered. This could be an issue.

This is a known problem with Excel treating strings of digits as numbers rather than as strings. (See also: telephone numbers...)

When you open a CSV file in Excel, there's a way to tell it not to guess data types. I don't have Excel on the laptop I'm using at the moment, but can dig out some instructions if you're having difficulty.
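For anyone reading the download outside Excel, here is a minimal Python sketch of the same idea: keep the ID column as text and never let it pass through a floating-point number. The `id` header and the sample row are assumptions for illustration, and `excel_like_display` is a hypothetical helper that only roughly simulates a 15-significant-digit limit by zeroing later digits.

```python
import csv
import io


def excel_like_display(id_text):
    """Rough simulation: keep the first 15 digits and zero out the rest."""
    if len(id_text) <= 15:
        return id_text
    return id_text[:15] + "0" * (len(id_text) - 15)


# Stand-in for a downloaded codelist CSV (hypothetical header and row).
sample = io.StringIO(
    "id,term\n"
    "1773261000006103,fibrotic nonspecific interstitial pneumonia\n"
)
rows = list(csv.DictReader(sample))
concept_id = rows[0]["id"]  # the csv module always returns strings

print("as text:       ", concept_id)
print("15-digit limit:", excel_like_display(concept_id))

# A 17-digit odd integer (made up for illustration) cannot be stored exactly
# in a 64-bit float, so a round trip through float changes it.
long_id = "12345678901234567"
print("float round trip preserved:", int(float(long_id)) == int(long_id))
```

The general point is to treat these identifiers as labels rather than quantities, whichever tool opens the file.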

hjforbes commented 2 years ago

Ah, thank you! Very reassuring - the codes only in OS made up 0.2% of the total count (total count for OS only codes was 21362 and for codes in both lists was 11881615).
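The 0.2% figure can be reproduced from the two counts quoted above. This sketch assumes the denominator is the sum of the two counts, i.e. that the total is made up of OS-only codes plus codes in both lists.

```python
os_only_count = 21362   # total count for codes only in the OS list
both_count = 11881615   # total count for codes in both lists

# Share of all recorded events that came from OS-only codes.
share = os_only_count / (os_only_count + both_count)
print(f"{share:.2%}")
```

This comes to roughly 0.18%, which rounds to the 0.2% quoted.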

If you open the downloaded file in Notepad, then import the .txt file into Stata, this avoids the ID truncation issue.

jkquint commented 2 years ago

Brilliant, thanks both. That’s very reassuring! BW Jenni

On 24 Nov 2021, at 12:35, hjforbes @.***> wrote:


Ah, thank you! Very reassuring - the codes only in OS made up 0.2% of the total count (total count for OS only codes was 21362 and for codes in both lists was 11881615).

If you download the file into notepad, then import the .txt file into STATA, this avoids the truncation of IDs issue.
