Open kyliewillis opened 1 year ago
This is a great find by @kyliewillis ... I want to add that I've calculated Elixhauser scores for a large (450k) dataset and I found that the correlation between `comorbidity` (R) and `comorbidipy` (this package) was essentially one. However, for a small number of subjects (0.5%), the score calculated by `comorbidity` was a few points higher than that of `comorbidipy` ... I think this is evidence of the same issue being raised, which is why I am not opening a new issue.
Thanks to the developers for the work on this package. I hope this can be resolved relatively painlessly.
Attached: an illustration of scores between `comorbidity` and `comorbidipy`.
@kyliewillis - Thank you for reporting this! :pray: I didn't think anyone else was using this library. So it was a pleasant surprise to find this issue raised, albeit an embarrassing one as I missed it for an entire month.
@rpomponio - thanks for the fantastic work on the tests comparing the parent R package and this one :rocket:. Would you be able to share anonymised data for the cases where the two packages differ?
I will find some time to dig into this and fix it. (And will document it better as well - especially if people are using it!)
I suspect the reason for this bug is this code section here - https://github.com/vvcb/comorbidipy/blob/main/comorbidipy/calculator.py#L111-L116
It will be easy enough to find all the codes that map to more than one category. I will have to think about how this section can be modified. Should be straightforward (:coldsweat:)!
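A minimal sketch of how such a search might look, assuming each mapping is available as a dictionary from category name to a list of codes. The `toy_mapping` data and the `codes_in_multiple_categories` helper are hypothetical and only illustrate the idea; they are not the package's actual tables or API.

```python
from collections import defaultdict

# Hypothetical toy mapping (category -> codes); NOT the package's full tables.
toy_mapping = {
    "chf": ["I426", "I50"],
    "alcohol": ["I426", "F10"],
    "psycho": ["F315", "F20"],
    "depre": ["F315", "F32"],
}


def codes_in_multiple_categories(mapping):
    """Return {code: sorted categories} for codes listed under more than one category."""
    categories_by_code = defaultdict(set)
    for category, codes in mapping.items():
        for code in codes:
            categories_by_code[code].add(category)
    return {
        code: sorted(categories)
        for code, categories in categories_by_code.items()
        if len(categories) > 1
    }


duplicates = codes_in_multiple_categories(toy_mapping)
print(duplicates)
```

This only catches codes that appear verbatim in more than one category. If the mappings are matched by prefix, overlaps where a code in one category is a prefix of a code in another would need a separate check.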
Having reviewed all the codes across all the comorbidity risk scores, there are a very small number of codes that cause this issue.
A workaround specific to these codes may be the most pragmatic and simple solution.
| mapping | code | comorbidity 1 | comorbidity 2 |
|---|---|---|---|
| charlson_icd9_quan | 40403 | chf | rend |
| charlson_icd9_quan | 40413 | chf | rend |
| charlson_icd9_quan | 40493 | chf | rend |
| charlson_icd10_se | K703 | mld | mld |
| charlson_icd10_am | C80 | canc | metacanc |
| elixhauser_icd9_quan | 40403 | chf | rf |
| elixhauser_icd9_quan | 40413 | chf | rf |
| elixhauser_icd9_quan | 40493 | chf | rf |
| elixhauser_icd9_quan | 4255 | chf | alcohol |
| elixhauser_icd10_quan | I426 | chf | alcohol |
| elixhauser_icd10_quan | F315 | psycho | depre |
| charlson_icd10_shmi | no issues | | |
| charlson_icd10_quan | no issues | | |
General info
Description
When a patient has a list of ICD codes, each code is supposed to be mapped to its corresponding comorbidities. This works as expected for most codes. However, an issue arises when a code corresponds to more than one comorbidity. For instance, ICD-10 code I42.6 (alcoholic cardiomyopathy) is supposed to map to both alcohol abuse and congestive heart failure (per the Quan ICD-10 mapping). When the comorbidity function does its mapping and calculation, the code is mapped only once (to alcohol) instead of twice (to both alcohol and chf).
Ultimately, a code that should count for 10 points (Swiss weights: -3 for alcohol, 13 for chf) counts as -3 points if the patient has no other codes recorded for chf.
It is also worth noting that this behaviour deviates from the R comorbidity package, which this repo is modeled after. With that package, a patient with code I42.6 is mapped to both the alcohol and chf comorbidities.
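To make the difference concrete, here is a rough sketch contrasting a one-to-one lookup with a one-to-many lookup for a patient whose only code is I42.6. The weights (-3 for alcohol, 13 for chf) are the Swiss weights quoted above; the `toy_mapping` table and the function names `first_match`, `all_matches` and `score` are hypothetical and do not reflect how either package is actually implemented.

```python
# Hypothetical toy mapping (category -> code prefixes); not the full Quan tables.
toy_mapping = {"alcohol": ("I426", "F10"), "chf": ("I426", "I50")}
# Swiss weights for the two categories, as quoted in this issue.
swiss_weights = {"alcohol": -3, "chf": 13}


def first_match(code, mapping):
    """One-to-one lookup: stop at the first category whose prefixes match."""
    for category, prefixes in mapping.items():
        if code.startswith(prefixes):
            return {category}
    return set()


def all_matches(code, mapping):
    """One-to-many lookup: collect every category whose prefixes match."""
    return {
        category
        for category, prefixes in mapping.items()
        if code.startswith(prefixes)
    }


def score(codes, mapping, weights, matcher):
    """Sum the weights of the distinct categories found for a patient's codes."""
    categories = set()
    for code in codes:
        categories |= matcher(code, mapping)
    return sum(weights[category] for category in categories)


patient_codes = ["I426"]  # I42.6 with the dot removed
one_to_one = score(patient_codes, toy_mapping, swiss_weights, first_match)
one_to_many = score(patient_codes, toy_mapping, swiss_weights, all_matches)
print(one_to_one, one_to_many)  # -3 10
```

With the one-to-one lookup the patient scores -3, while the one-to-many lookup gives the expected -3 + 13 = 10.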
What I Did
Example using 3 different icd10 codes where this problem can be seen:
df_out output: