synthetichealth / synthea

Synthetic Patient Population Simulator
https://synthetichealth.github.io/synthea
Apache License 2.0

Updated SNOMED Codes #1474

Open florim14 opened 5 months ago

florim14 commented 5 months ago

Pull Request Description: Updated SNOMED Codes

Overview

This pull request aims to update the SNOMED codes across various Synthea modules, ensuring they are current and active. The updates were made using a script specifically developed for this purpose.

Changes Made

  1. Script Development: A custom script was created to review and update the SNOMED codes within the Synthea modules. The script performed the following tasks:

    • Identified inactive or outdated SNOMED codes.
    • Checked for active versions of these codes. If none was found, the script searched by the display text and the semantic tag (if present) and accepted a candidate only if it matched with at least 70% similarity.
    • Updated the modules with the active SNOMED codes.
  2. Code Updates:

    • The script successfully updated all SNOMED codes across the modules, except for 12 codes for which no active matches were found.
    • Out of these 12 unmatched codes, 11 are located in the "TNM_Diagnosis" module file.
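
The fallback matching step described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the script from this pull request: the similarity measure (the ratio from Python's `difflib.SequenceMatcher`), the function names, and the shape of the candidate list are all hypothetical. Only the 70% threshold and the use of the display plus the semantic tag come from the description.

```python
import re
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.70  # the "at least 70%" cutoff from the description


def split_semantic_tag(display):
    """Split 'Term (tag)' into ('Term', 'tag'); tag is None when absent."""
    match = re.match(r"^(.*\S)\s*\(([^()]+)\)\s*$", display)
    if match:
        return match.group(1), match.group(2)
    return display.strip(), None


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] (assumed measure, not the PR's)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_active_match(old_display, candidates):
    """Pick the most similar active concept, or None if nothing reaches 70%.

    candidates: list of (code, display) pairs for active concepts
    (hypothetical input; how they are fetched is not shown here).
    """
    old_term, old_tag = split_semantic_tag(old_display)
    best = None
    for code, display in candidates:
        term, tag = split_semantic_tag(display)
        # When both sides carry a semantic tag, require the tags to agree.
        if old_tag is not None and tag is not None and tag != old_tag:
            continue
        score = similarity(old_term, term)
        if score >= SIMILARITY_THRESHOLD and (best is None or score > best[2]):
            best = (code, display, score)
    return best
```

A purely textual threshold like this says nothing about clinical meaning, so any match it returns would still need human review.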

What are the benefits:

Remaining Issues

Future Work

Thank you for considering this pull request.

jawalonoski commented 5 months ago

@florim14 thank you for the pull request.

I wonder how much your script differs from the current code display update script: https://github.com/synthetichealth/synthea/blob/master/src/main/javascript/update_code_display.js

Would you mind separating this pull request into two parts?

  1. PR with no SNOMED code changes except for an updated display value
  2. PR with updated SNOMED codes (replacing of inactive codes)

The first PR is easy to test and merge, the second requires detailed review.

florim14 commented 5 months ago

Hello Jason,

first of all, thank you for considering my pull request. I can modify my script to update only the display, but that will not cover everything. Moreover, I don't think it is entirely safe to update display values based on the code alone, because some displays have been updated to use a more clinical term, and there are other cases I discovered while writing the script that I can't recall right now. My script first looks up the code, then checks whether the fetched display matches the existing one. If not, it compares them by similarity. If there is still no match, it searches by the display, sometimes with some preprocessing for certain cases beforehand. In the next few days I will send you a diagram illustrating the whole process, which may make it clearer why I don't think it's a good idea to separate the pull request into two parts.

Best regards, Florim Hamiti


jawalonoski commented 5 months ago

The problem is that after perusing the changes for only a few minutes, a colleague and I have found that some of the code changes you've made are wrong.

Having two sets of changes makes it easier to differentiate what is merely a change on the preferred display value (changing the display value does not change the clinical meaning of the concept), versus all the changed codes that may refer to new or different concepts -- and those latter changes we have to manually verify and check, in some cases consulting with clinicians.
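
As a rough illustration of that split, each proposed replacement could be sorted into "display only" versus "code change" before review. This is a minimal sketch; the record shape (dicts with `code` and `display` keys) and the function names are assumptions, not part of Synthea or of the script in this PR.

```python
def classify_change(old, new):
    """Classify one replacement.

    old / new: dicts with 'code' and 'display' keys (hypothetical shape).
    """
    if old["code"] != new["code"]:
        # A different code may refer to a new or different concept.
        return "code_change"
    if old["display"] != new["display"]:
        # Same concept, only the preferred display text differs.
        return "display_only"
    return "unchanged"


def split_changes(changes):
    """Group (old, new) pairs by the kind of change they represent."""
    buckets = {"display_only": [], "code_change": [], "unchanged": []}
    for old, new in changes:
        buckets[classify_change(old, new)].append((old, new))
    return buckets
```

The `display_only` bucket corresponds to the changes that are easy to test and merge; everything in `code_change` would go to manual review.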

dehall commented 5 months ago

As one example, there is a change in the dementia module: 316744009 "Office Visit" was changed in a few places to 61488002 "Physical medicine initial examination for orthotic program (procedure)". Just at a glance, orthotics isn't really relevant to the dementia module, so this change isn't correct. But digging further to see what happened, the display "Office Visit" was wrong in the first place: this was a really old code, and the display for that old code should have been "Persons encountering health services in circumstances related to reproduction". So we really need to look closely at any code changes. There are probably more instances like this one where the old code was wrong or had the wrong display, so an automated process isn't going to produce a good result.

That's not to say a script to make those changes is wrong or bad. This code in the dementia module definitely needs to be changed, and it's nice that a script can highlight that, but those changes will always need a close look.

florim14 commented 5 months ago

Dear Dylan,

thank you for your email and for highlighting the issue with the code changes in the dementia module. I apologize for my delayed response. I recently discussed this matter with a knowledgeable colleague in the medical health field. He raised a good question regarding the relevance of the code "Physical medicine initial examination for orthotic program (procedure)" in the context of dementia. While it’s true that automated processes can highlight necessary changes, it is clear from your example that there can be instances where the old codes and displays were incorrect. In my opinion, though, these cases should be rare. To address this, I can prepare a JSON file with the old codes and displays alongside the new ones that I have proposed in the PR. This should help us review these changes more thoroughly. I am happy to generate and share this file with you for a deeper review. Together, we can identify and implement the most suitable codes that align with each module's context and ensure accuracy across the board. Please let me know if this approach works for you, and I will send over the JSON file as soon as I can. Thank you in advance.

Best regards, Florim Hamiti


dehall commented 4 months ago

@florim14 Ok, I think it might be easier to split this into 2 PRs: one with just the display changes that we can quickly merge, and one with the code changes, for which we can use the GitHub review features. But if you prefer the separate JSON file, I guess that's fine.

florim14 commented 4 months ago

I think for interoperability's sake it is better to change both of them at the same time. I will prepare the JSON file and send it to you, and you can let me know how it looks and whether it needs any modifications.

Best regards, Florim Hamiti


florim14 commented 4 months ago

Dear Dylan,

I have attached two JSON files: one with all the replaced codes, and one with only the unique replaced codes, because the first may contain duplicates when a code appears multiple times. The first one also records the JSON module file where each code was replaced. Let me know how it looks and whether we need to change anything. I have also attached a third file (not_founded_codes.json), which lists the inactive codes for which my script could not find a replacement. If we want to replace these, we will have to go through them as well. Thank you very much for considering my request, and if you have any questions, feel free to contact me.
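
For readers without the attachments, the relationship between the full list and the unique list might look something like the sketch below. The field names (`old_code`, `new_code`, `module`) are hypothetical, since the actual layout of the JSON files is not shown in this thread.

```python
def unique_replacements(replacements):
    """Collapse per-module replacement records into one record per code pair.

    replacements: list of dicts with hypothetical keys 'old_code',
    'new_code', and 'module' (the module file where the code was replaced).
    """
    seen = set()
    unique = []
    for entry in replacements:
        key = (entry["old_code"], entry["new_code"])
        if key in seen:
            continue  # same replacement already recorded for another module
        seen.add(key)
        # The module name is dropped because it is not unique per code pair.
        unique.append({k: v for k, v in entry.items() if k != "module"})
    return unique
```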

Best regards, Florim Hamiti


dehall commented 4 months ago

@florim14 Apologies for the delay, I was out a lot of last week with the holiday. It looks like the files didn't attach on github. Can you try uploading them to a comment here on the PR? https://github.com/synthetichealth/synthea/pull/1474

florim14 commented 4 months ago

@dehall - No worries. I have attached the mentioned files; let me know if any adjustments need to be made:

not_founded_codes.json replaced_codes.json unique_replaced_codes.json

florim14 commented 2 months ago

@dehall - I wanted to follow up on the last comment. Is there any news regarding the review of the updated SNOMED codes?

dehall commented 2 months ago

Ahhh, again @florim14 my sincerest apologies for the delay on this. Unfortunately I no longer have time dedicated to Synthea support, so this fell through the cracks. I took an initial look at the changed codes when you first posted them and agreed with nearly all the changes, but there were a couple I wanted to look closer at. I'll make sure to get you an update by tomorrow at the latest.

dehall commented 2 months ago

Ok, I finally took a closer look at the replaced codes -- see attached in CSV format: codes_review.csv. This only includes the changed codes; I'm assuming the changed displays are all fine.

In general the replacements look good, but some of them have what I'll call an increase in specificity that's incorrect for the context the code is used in. For example, for 104173009 we had the original display "Sputum Culture", but the official display is "Microbial culture of sputum (procedure)". This was changed to 104184002 "Sputum culture for mycobacterium (procedure)", which is a child code of the original, so it is a more specific code that I don't think applies where it's used in the cystic fibrosis module. (And in this case the original code seems to still be valid anyway?) Let me know if you disagree with any of these. In terms of implementation, again I'd suggest reverting the ones we're not sure about and keeping the rest so we can merge the part we do feel good about.
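
One way such "increase in specificity" cases could be flagged automatically is to check whether the replacement is a descendant of the original code. The sketch below assumes a hypothetical `parents` mapping from each code to its direct parent codes; how that mapping would be obtained from a terminology source is not covered here.

```python
def is_descendant(code, ancestor, parents):
    """Return True if `code` sits below `ancestor` in the hierarchy.

    parents: dict mapping a code to a list of its direct parent codes
    (hypothetical input built from whatever terminology source is in use).
    """
    stack = list(parents.get(code, []))
    seen = set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return False


# The one relationship stated in this thread: 104184002 is a child of 104173009.
parents = {"104184002": ["104173009"]}
narrowed = is_descendant("104184002", "104173009", parents)
```

A replacement flagged this way is not necessarily wrong, but it narrows the meaning and so belongs in the set that needs a manual look.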

florim14 commented 2 months ago

@dehall - thank you for your response. One thing I can do is modify my script so that it first checks whether the code is active; if it is, we can automatically replace the display with the correct one. I can then put the codes changed this way (more precisely, their displays) in a separate file, and we can review whether the changes make sense. If you agree, I will modify the script and send you the new files. Moreover, I will also add a condition to ignore the codes that you marked as "Needs review" in the CSV file you sent, and we can check them.
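
The revised flow proposed here might look roughly like this. It is a sketch only: the CSV column names (`code`, `status`), the `active_displays` lookup, and the action labels are all assumptions, since neither the layout of codes_review.csv nor the script itself appears in this thread.

```python
import csv
import io


def load_needs_review(csv_text, code_column="code", status_column="status"):
    """Collect codes marked 'Needs review' (column names are hypothetical)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row[code_column]
        for row in reader
        if row[status_column].strip() == "Needs review"
    }


def plan_update(entry, active_displays, needs_review):
    """Decide what to do with one code used in a module.

    entry: dict with 'code' and 'display' keys.
    active_displays: dict mapping each active code to its current display
    (hypothetical lookup; inactive codes are simply absent).
    """
    code = entry["code"]
    if code in needs_review:
        return ("skip", entry)  # left untouched for manual review
    if code in active_displays:
        current = active_displays[code]
        if current != entry["display"]:
            return ("update_display", {"code": code, "display": current})
        return ("keep", entry)
    # Inactive code: a replacement has to be found and reviewed separately.
    return ("find_replacement", entry)
```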

dehall commented 2 months ago

Yes that sounds good

florim14 commented 2 months ago

@dehall - I have updated my script to implement the changes we discussed. I am attaching the following files:

Let me know if you agree with these changes and whether you want to proceed with the next step.

dehall commented 2 months ago

Sounds good, I'm currently on travel but will take a look when I'm back 9/30

florim14 commented 2 months ago

Perfect, let me know how it looks, and then we can proceed further.

Best regards, Florim Hamiti


florim14 commented 1 month ago

@dehall - I hope you had a great holiday. I wanted to follow up on the SNOMED code changes: is there any news regarding the review?

dehall commented 1 month ago

Hi @florim14, the files you posted look good. I'd propose that the next step be to update the PR itself to remove the changes to the ignored codes, and then I think we should be good to merge it. After that we can find alternative codes separately in a second PR. Unfortunately, as you've seen, we have limited bandwidth to tackle this, so it may be a while before we can totally resolve all those other codes. Getting in the good display changes sooner would be nice.

florim14 commented 1 month ago

@dehall - thank you for your response. I know you are quite busy, but when you are available, we can look at the remaining codes together and find the right ones. I could not resolve some conflicts when trying to modify the pull request, so I had to create a new branch and open a new pull request from it: https://github.com/synthetichealth/synthea/pull/1520 Let me know if anything is missing.