synthetichealth / synthea

Synthetic Patient Population Simulator
https://synthetichealth.github.io/synthea
Apache License 2.0

Invalid medications in Synthea #469

Closed shabiel closed 4 years ago

shabiel commented 5 years ago

I raised this issue before and made a pull request to clean up the existing meds, but I see it has crept back in. I will record it here on GitHub: when picking an RxNorm code, you must use one that has an NDC on the market, now or in the past.

For example, 1153378 is Clonazepam Oral. That is not something a patient can take. It has to be something like Clonazepam 0.5 mg oral tablet (197527).

jawalonoski commented 5 years ago

Thanks @shabiel. I added a note to the wiki to reinforce this practice: https://github.com/synthetichealth/synthea/wiki/Generic-Module-Framework:-Basics#rxnorm-codes

shabiel commented 5 years ago

Thank you.

Here's another one: 316049. That's Hydrochlorothiazide 25mg as a component. It can be part of 20-30 prescribable drugs.

I am not sure what I want to do about it yet. I can fix them again, but I want a long-term solution, something like rejecting a pull request if an RxNorm code in it is not a valid code for a prescribable drug. Let me think about it.
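
As a starting point for such a rejection rule, a first-pass check could look at the RxNorm term type (TTY) alone, with no network access. The sketch below is illustrative only: the function name is hypothetical, and it assumes that SCD, SBD, GPCK and BPCK are the product-level term types. A passing term type is necessary but not sufficient, since the code must still have an NDC.

```shell
#!/bin/bash
# Hypothetical helper (not part of Synthea): succeeds when an RxNorm term type
# (TTY) names a drug product rather than an ingredient, component or dose form group.
# Assumption: SCD, SBD, GPCK and BPCK are the product-level term types.
is_prescribable_tty() {
    case "$1" in
        SCD|SBD|GPCK|BPCK) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: a drug component (SCDC) would be rejected.
# is_prescribable_tty SCDC || echo "reject"
```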

dehall commented 5 years ago

GitHub does support pull request templates, so it's possible to add a checklist of requirements for the contributor and reviewer to verify on each pull request. As the Synthea community continues to grow, I think it would be a good idea to define some expectations that new contributions should adhere to. That helps potential contributors, who would know the objective review criteria in advance, and it helps reviewers, who would be less likely to forget to check something important.

shabiel commented 5 years ago

I am thinking of an automated way: a web service call on each pull request to check if the RxNorm has NDCs. I haven't validated these yet, but something like this: https://rxnav.nlm.nih.gov/RxNormAPIs.html#uLink=RxNorm_REST_getAllHistoricalNDCs and https://rxnav.nlm.nih.gov/RxNormAPIs.html#uLink=RxNorm_REST_getAllNDCs.
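
A per-code check along these lines might look like the sketch below. It is not validated against the live service: the function names are hypothetical, and the endpoint path and JSON shape are assumed to match the `rxcui/<code>/ndcs.json` call used in the script later in this thread. It needs curl, jq and network access.

```shell
#!/bin/bash
# Sketch only. Function names are hypothetical; the endpoint path and the
# .ndcGroup.ndcList.ndc field are assumptions taken from this thread.
RXNAV_BASE="https://rxnav.nlm.nih.gov/REST"

# Print the NDC lookup URL for one RxNorm code.
ndcs_url() {
    printf '%s/rxcui/%s/ndcs.json\n' "$RXNAV_BASE" "$1"
}

# Succeed only when the service reports at least one NDC for the code.
# An empty or non-numeric answer (for example after a failed request) counts as a failure.
has_ndcs() {
    local count
    count=$(curl -s "$(ndcs_url "$1")" | jq -r '.ndcGroup.ndcList.ndc | length')
    [[ $count =~ ^[0-9]+$ ]] && (( count > 0 ))
}

# Possible use in a pull request job:
# has_ndcs 1153378 || { echo "1153378 has no NDCs" >&2; exit 1; }
```

A pull request job could run `has_ndcs` over only the codes a change adds, which keeps the number of web service calls small.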

shabiel commented 5 years ago

316052: HCTZ 6.25mg does not exist on the market (and as far as I know, it never did, at least for humans. Maybe cats or dogs get to have it!).

shabiel commented 5 years ago

I will work over the next three days to get these fixed, so we can close the issue.

shabiel commented 5 years ago

I started on this. I wrote a script to extract all RxNorm codes from Synthea and analyze them. Here is the script and the preliminary results:

#!/bin/bash
# Lists every RxNorm code used in the Synthea modules, with its term type (TTY)
# and the number of NDCs the RxNorm API reports for it. Requires jq and curl.
# Run from the root of the Synthea repository.
file1=/tmp/syn.rxnorm.all
file2=/tmp/syn.rxnorm.sorted.uniq
rm -f "$file1" "$file2"

# Grab all the RxNorm codes in all the module JSON files
echo "Extracting all RxNorm codes..."
find src/main/resources/modules -type f -name '*.json' -print0 |
    while IFS= read -r -d '' module; do
        jq -rc '.. | .codes?[]? | select(.system == "RxNorm").code' < "$module" >> "$file1"
    done

# Deduplicate the codes, so we only have unique codes
echo "Deduplicating..."
sort -n -u "$file1" > "$file2"

# For each code, ask the RxNorm API for its NDC count and its term type
echo "Calling RxNorm API to check NDCs"
echo "CODE TYPE #NDCS" | column -tx
while read -r code; do
    ndcsLength=$(curl -s "https://rxnav.nlm.nih.gov/REST/rxcui/$code/ndcs.json" | jq -c '.ndcGroup.ndcList.ndc | length')
    tty=$(curl -s "https://rxnav.nlm.nih.gov/REST/rxcui/$code/property.json?propName=tty" | jq -r '.propConceptGroup.propConcept[0].propValue')
    echo "$code $tty $ndcsLength" | column -tx
done < "$file2"

Result:

CODE  TYPE  #NDCS
480  IN  0
4337  IN  0
10324  IN  0
38409  IN  0
56795  IN  0
72965  IN  0
73032  IN  0
84857  IN  0
105078  SCD  0
105586  SCD  0
106258  SCD  372
106892  SBD  5
141918  SCD  0
197319  SCD  163
197378  SCD  0
198014  SCD  525
198031  SCD  47
198240  SCD  40
198405  SCD  1
199224  SCD  68
200064  SCD  39
200243  SCD  16
200252  null  0
205532  SBD  2
205923  SBD  2
210856  SBD  0
235389  MIN  0
238100  SCD  16
243670  SCD  2
258494  IN  0
282464  SCD  0
308182  SCD  164
308192  SCD  21
308971  SBD  1
309043  SCD  3
309045  SCD  30
309097  SCD  70
309845  SCD  6
310261  SCD  12
310325  SCD  106
310436  SCD  33
310965  SCD  1468
311372  SCD  640
311700  SCD  60
311989  SCD  8
311995  SCD  68
312617  SCD  298
313002  SCD  46
313185  SCD  0
313782  SCD  346
314659  PIN  0
315971  SCDC  0
316049  SCDC  0
316672  SCDC  0
328670  SCDC  0
389221  SCD  0
406022  SCDC  0
429503  SCD  38
477045  SCD  1
483438  SCD  57
542347  SCD  20
562251  SCD  15
563026  SBDC  0
567645  SBDC  0
583214  SCD  0
596926  SCD  147
597195  SCD  75
608139  SCD  30
665078  SCD  21
672149  SCD  0
727762  SCD  9
745679  SCD  0
746030  SBD  1
748856  BPCK  3
748879  BPCK  6
748962  BPCK  4
749762  BPCK  2
749785  BPCK  8
749882  null  0
751905  BPCK  5
752899  SCD  0
757594  BPCK  1
789980  SCD  18
807283  SBD  3
831533  BPCK  4
833137  SBDC  0
834061  SCD  124
834102  SCD  223
835900  SBDC  0
849574  SCD  668
856980  SCD  36
857005  SCD  339
858069  SBDC  0
860975  SCD  276
861467  SCD  17
864718  SCD  47
865098  SBD  6
866414  SBD  18
895994  SCD  0
896209  SCD  3
897122  SCD  0
904419  SCD  25
966222  SCD  86
978950  BPCK  6
993452  SCD  0
996740  SCD  7
997223  SCD  212
997488  SCD  34
997501  SCD  121
998582  SBDC  0
998755  SCDC  0
999969  SBDC  0
1000126  SCD  24
1000156  SCD  0
1014676  SCD  61
1014678  SCD  618
1043400  SCD  302
1049221  SCD  308
1049630  SCD  484
1049636  SBDC  0
1049683  SCD  53
1091392  SCD  47
1094107  SCD  148
1114085  SCD  8
1153378  SCDG  0
1160499  SCDG  0
1234995  SCD  58
1310197  SBDC  0
1359133  BPCK  2
1363309  SCD  147
1366343  null  0
1367439  SBD  8
1373463  SCD  0
1534809  SCD  0
1599803  SCD  2
1601380  SCD  0
1605257  SBD  2
1650142  SCD  34
1652673  SCD  15
1655927  MIN  0
1656318  SCD  13
1658084  SCD  0
1659149  SCD  41
1719286  SCD  27
1732136  SCD  2
1732186  SCD  14
1734340  SCD  0
1734919  SCD  14
1736776  SCD  32
1736854  SCD  1
1737449  SCD  9
1740467  SCD  76
1790099  SCD  7
1791701  SCD  14
1803932  SCD  7
1808217  SCD  12
1856546  SBD  3
1860154  SCD  1
1860480  SCD  12
1870230  SCD  13
1873983  SCD  0
1940648  SCD  0
1946831  BN  0
2001499  SCD  1
2119714  SCD  0

I am very pleased with this. You can already tell where the problems are! I will refine the script as I learn more.
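
The problem rows are the ones with zero NDCs. A small filter like the following sketch could pull them out of the three-column output above. The function name and the saved file name in the usage comment are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read "CODE TYPE #NDCS" rows on stdin and print the code
# and term type of every row whose NDC count is 0. The header row is skipped
# because its first column is not numeric.
codes_without_ndcs() {
    awk '$1 ~ /^[0-9]+$/ && $3 == 0 { print $1, $2 }'
}

# Usage, assuming the results were saved to a file:
# codes_without_ndcs < rxnorm-results.txt
```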

jawalonoski commented 4 years ago

Please consider submitting a pull request to add your refined script to a synthea/scripts/ folder.