medspacy / QuickUMLS

System for Medical Concept Extraction
MIT License

Quick UMLS mapping errors #3

ovpatterson opened this issue 4 years ago (status: open)

ovpatterson commented 4 years ago

Original text, Pre-processed text, EDIS CUI, Correct CUI
Renal Failure, Renal Failure, C0001623, C0852163
Unstable angina, unstable angina, c0340288, c0002965
abdominal pain, abdominal pain, C0423651, c0000737
anorexia, anorexia, c1971624, c0003123
lactic acidosis, lactic acidosis, c4039171, C0001125
c diff diarrhea, difficulty, c0332218, c0011991
febrile, febrile, c0277792, c0015967
productive cough, productive cough, C0850149, c0239134

burgersmoke commented 4 years ago

@ovpatterson Question for you. For each match coming out of QuickUMLS, there will be a similarity value between 0.0 and 1.0.

Question for @ovpatterson and potentially @turbosheep. When you see this result:

Original text, Pre-processed text, EDIS CUI, Correct CUI
abdominal pain, abdominal pain, C0423651, c0000737

I see that C0423651 is actually the CUI for "no abdominal pain", whereas c0000737, as you mention, is the expected CUI for "abdominal pain" (i.e., not negated).

Were multiple CUIs emitted for this text? I would expect both C0423651 (negated) and c0000737 to be emitted, but with a higher "similarity" value for the non-negated, expected CUI.

Did you notice if there were indeed multiple concepts emitted for that and what the similarity values were?
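For intuition on why the non-negated CUI should score higher: QuickUMLS scores approximate string matches with a set-based similarity over character n-grams (it lets you choose the measure; Jaccard is, to our understanding, the default). The sketch below is a simplified stand-in, assuming plain Jaccard over unpadded, lowercased character trigrams; the library's own feature extraction may differ, so its exact numbers can differ too. The helper names `char_trigrams` and `jaccard` are hypothetical, not QuickUMLS functions.

```python
def char_trigrams(text: str) -> set:
    """Set of overlapping 3-character substrings of the lowercased text (no padding)."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two strings' character-trigram sets."""
    x, y = char_trigrams(a), char_trigrams(b)
    if not x and not y:
        return 1.0  # two empty feature sets: treat as identical
    return len(x & y) / len(x | y)


span = "abdominal pain"
print(jaccard(span, "abdominal pain"))     # 1.0
print(jaccard(span, "no abdominal pain"))  # 0.8
```

The negated term contains every trigram of the span plus three extra ones ("no ", "o a", " ab"), so it can still clear a typical threshold while scoring below the exact, non-negated term. That is why both CUIs may be emitted, and why the similarity value is needed to tell them apart.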

burgersmoke commented 4 years ago

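As a secondary issue, this one:

c diff diarrhea, difficulty, c0332218, c0011991

is not really a QuickUMLS issue. It is likely an artifact of the EMT-P preprocessor rules not doing what we would expect: the pre-processed text "difficulty" no longer contains the original phrase "c diff diarrhea", so QuickUMLS never sees it.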
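One quick way to separate preprocessing problems from matching problems is to flag rows where the pre-processed text no longer has the same words as the original. The sketch below is a hypothetical triage helper (`flag_preprocessing_changes` is not part of QuickUMLS or the EMT-P preprocessor). It assumes the (original, pre-processed) pairs are available, as in the table above, and compares them case-insensitively on alphanumeric tokens. It only detects changed text; it does not fix the preprocessor rules.

```python
import re
from typing import List, Tuple


def tokens(text: str) -> List[str]:
    """Lowercased alphanumeric tokens, ignoring punctuation and spacing."""
    return re.findall(r"[a-z0-9]+", text.lower())


def flag_preprocessing_changes(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (original, preprocessed) pairs whose tokens differ."""
    return [(orig, pre) for orig, pre in rows if tokens(orig) != tokens(pre)]


# Rows taken from the mapping-error table in this issue.
rows = [
    ("Renal Failure", "Renal Failure"),
    ("Unstable angina", "unstable angina"),
    ("abdominal pain", "abdominal pain"),
    ("c diff diarrhea", "difficulty"),
    ("productive cough", "productive cough"),
]
print(flag_preprocessing_changes(rows))  # [('c diff diarrhea', 'difficulty')]
```

Rows flagged this way point at the preprocessor; the remaining mismatches (same text, wrong CUI) are the ones worth checking against QuickUMLS similarity values.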

turbosheep commented 4 years ago

Unfortunately, I filtered out the similarity for our outputs. I probably shouldn't have, since it would be useful information now. That said, if only a handful of negated CUIs are being output, it's probably pretty easy to round up all the documents that output those CUIs and process them again with similarity kept.
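Rounding up the affected documents can be as simple as a set intersection over the saved outputs. This is a minimal sketch, assuming the existing (similarity-less) results are stored as a mapping from a document ID to the list of CUIs emitted for it; the function name, document IDs, and storage layout are all hypothetical. CUIs are compared case-insensitively because the outputs in this issue mix "C" and "c" prefixes.

```python
from typing import Dict, Iterable, List


def docs_to_reprocess(outputs: Dict[str, List[str]], suspect_cuis: Iterable[str]) -> List[str]:
    """Return sorted IDs of documents that emitted any of the suspect CUIs."""
    suspects = {cui.upper() for cui in suspect_cuis}
    return sorted(
        doc_id
        for doc_id, cuis in outputs.items()
        if suspects & {cui.upper() for cui in cuis}
    )


# Hypothetical saved outputs: doc ID -> CUIs emitted (similarity already dropped).
outputs = {
    "doc1": ["C0423651", "C0015967"],
    "doc2": ["c0000737"],
    "doc3": ["c0423651"],
}
print(docs_to_reprocess(outputs, ["C0423651"]))  # ['doc1', 'doc3']
```

The returned IDs are the documents to run through QuickUMLS again, this time keeping the similarity of every candidate.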

burgersmoke commented 4 years ago

Yes, I would highly recommend keeping similarity. I use it to determine which CUI to actually keep as the "final answer". Many candidate CUIs can be emitted, so I would not treat every emitted concept as truth. I apologize for not giving better advance warning about that; I should have given it more thought!
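A minimal sketch of that "final answer" step, assuming the matcher output has the shape QuickUMLS's `match()` returns in the versions we are familiar with: a list with one inner list of candidate dicts per matched span, each dict carrying at least `cui` and `similarity` keys. The helper `pick_final_cuis` and its `min_similarity` cutoff are hypothetical, and the similarity values in the example are illustrative, not real QuickUMLS output.

```python
from typing import Any, Dict, List

Candidate = Dict[str, Any]


def pick_final_cuis(matches: List[List[Candidate]], min_similarity: float = 0.0) -> List[Candidate]:
    """For each matched span, keep only the highest-similarity candidate."""
    final = []
    for span_candidates in matches:
        kept = [c for c in span_candidates if c["similarity"] >= min_similarity]
        if not kept:
            continue  # no candidate for this span clears the cutoff
        final.append(max(kept, key=lambda c: c["similarity"]))
    return final


# Illustrative candidates for the span "abdominal pain" (values are made up).
matches = [[
    {"cui": "C0423651", "term": "no abdominal pain", "similarity": 0.8},
    {"cui": "C0000737", "term": "abdominal pain", "similarity": 1.0},
]]
print([c["cui"] for c in pick_final_cuis(matches)])  # ['C0000737']
```

Sorting explicitly with `max` avoids relying on whatever order the candidates arrive in; a nonzero `min_similarity` additionally drops spans whose best candidate is still a weak match.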