jackwasey / icd

Fast ICD-10 and ICD-9 comorbidities, decoding and validation in R. NB use main instead of master for default branch.
https://jackwasey.github.io/icd/
GNU General Public License v3.0

CMS HCC risk adjustment models #31

Closed jackwasey closed 6 years ago

jackwasey commented 10 years ago

http://www.cms.gov/Medicare/Health-Plans/MedicareAdvtgSpecRateStats/Risk-Adjustors.html

The CMS model assigns ICD-9 codes to HCC codes, but also needs age and gender inputs. It is a many-to-many mapping, but it should still be possible to map the set of ICD-9 codes for an individual to a set of HCC codes (which could be considered comorbidities).

mongoose54 commented 9 years ago

@jackwasey Do you happen to know if there is an implementation of HCC risk scoring available in a language other than SAS? Unfortunately, CMS wrote the software in SAS, and its documentation is hard to follow, which makes rewriting the code in another language difficult.

jackwasey commented 9 years ago

I haven't seen another implementation. The SAS code is horrible: I just looked through it. The logic is actually very simple, and would be easy to implement using my R package as a basis. Also, the SAS code makes no attempt to identify errors in the data, which my package could do. This is on my to-do list, but I won't be able to get to it for quite a long time. Would you be interested in working on this?

There is one binary SAS data file in the v22 HCC software package which has the coefficients. Everything else is text, and shows some very simple logic which looks up CC codes from ICD-9-CM codes, applies the hierarchies, and then looks up coefficients based on per-patient flags provided by the user.
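The three steps described above (ICD-9-CM to CC lookup, hierarchy application, coefficient lookup) could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the lookup table, the two hierarchy rules, and especially the coefficient values are placeholders invented for the example, not CMS data, and the demographic flags and interaction terms of the real model are left out.

```r
# Hypothetical ICD-9-CM -> CC lookup (illustrative rows only, not the CMS table).
icd_to_cc <- data.frame(
  icd9 = c("1970", "185", "25000"),
  cc   = c(8L, 12L, 19L),
  stringsAsFactors = FALSE
)

# Two hierarchy rules: the named CC suppresses the listed, less severe CCs.
hierarchy <- list("8" = c(9L, 10L, 11L, 12L), "17" = c(18L, 19L))

# Placeholder coefficients: NOT real CMS values.
coefs <- c("8" = 1.0, "12" = 0.25, "19" = 0.5)

patient_codes <- c("1970", "185", "25000")

# Step 1: look up CCs for the patient's ICD-9-CM codes.
ccs <- unique(icd_to_cc$cc[match(patient_codes, icd_to_cc$icd9)])
ccs <- ccs[!is.na(ccs)]

# Step 2: apply hierarchies, dropping any CC suppressed by a CC that is present.
active_rules <- intersect(names(hierarchy), as.character(ccs))
suppressed <- unlist(hierarchy[active_rules], use.names = FALSE)
hccs <- setdiff(ccs, suppressed)

# Step 3: sum the coefficients of the surviving HCCs.
score <- sum(coefs[as.character(hccs)])
```

Here CC 12 is dropped because CC 8 is present, so only CCs 8 and 19 contribute to the placeholder score.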

Would be good to have some public test data to work against. Maybe the Vermont data has this.

mongoose54 commented 9 years ago

I am interested in implementing the CMS HCC. I am pretty busy on my side but I might be able to contribute something in the near future. For those interested, the National Bureau of Economic Research has more description on HCC: http://www.nber.org/data/cms-risk-adjustment-models.html

jackwasey commented 9 years ago

Thanks, that would be magnificent. Let me know if there is anything that could be tweaked in the core code that might make it easier for you. @wmuprhyrd just implemented a different risk score (van Walraven), and it went very well. CMS HCC has additional complexity, but I think our platform is solid enough to handle this without any problems.

anobel commented 8 years ago

Hi @jackwasey and @mongoose54, I need to assign HCCs to ICD9s for a project I'm working on, and have started to implement this in R. The repo is at https://github.com/anobel/icdtohcc and is still quite basic (just importing/cleaning data so far), but I would be interested in integrating it into the icd package. I've used icd but have not looked into the code recently (especially with the update from icd9 -> icd). Would love to collaborate/help.

marksendak commented 8 years ago

Hi all, I just replied to an email from @anobel, but I'd love to help tackle this. I posted a simple conversion of SAS -> R on my blog about a year ago (http://healthydatascience.com/cms_hcc.html), but it's not generalizable and only builds HCC scores using a crosswalk from a single year.

jackwasey commented 8 years ago

This is perfect material for the package. I've not worked with this mapping before. My initial impression is that it could be implemented with four new mappings: one each for ICD-9 and ICD-10, at both the higher and the lower category level. A function could then be added which would take a logical argument to determine whether to use the high- or low-level mapping.

This would not be much work at all, once each mapping is represented as a named list. I may well not understand the complexity of the hierarchy. Do we also need to be able to go from low-level to high level? Are there conditionals on assignment to groups which are based on things other than ICD codes?

Happy to help and accept pull requests.

anobel commented 8 years ago

I've written the code to create condition categories (CC) with labels for both ICD9/10, for every year from 2007-2016, from the original CMS files. I've also applied these CCs to ICDs, taking into account year and ICD version, which was straightforward. My progress is at icdtohcc.

The issue I can't seem to resolve is how to apply the hierarchy rules to convert CCs to HCCs. Basically, for each year, you need to identify whether a patient has one of the more severe CCs, and then zero out the less severe CCs. For example, if they have the CC for metastatic cancer and the CC for prostate cancer, you have to zero out the CC for prostate cancer. I've created a dataframe of the hierarchy rules, but would love any help/insight into how best to apply the rules to the patient lists... I posted a question to stackoverflow with no responses yet.

If we can get this figured out, it would be great to incorporate it into the icd package.
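One way to apply such rules to long-format patient data with base R is a merge followed by a filter. The sketch below is not the approach that ended up in icd; the column names (id, date, cc, ifcc, remove), the long layout of the rule table, and the helper name apply_hierarchy are all assumptions made for illustration, and the year dimension of the rules is ignored.

```r
# Long-format patient data: one row per id/date/CC.
cc_long <- data.frame(
  id   = c("a", "a", "a", "b", "b"),
  date = as.Date(c("2015-01-05", "2015-01-05", "2015-01-05",
                   "2015-02-01", "2015-02-01")),
  cc   = c(8L, 12L, 19L, 12L, 18L),
  stringsAsFactors = FALSE
)

# Hierarchy rules in long form: if `ifcc` is present, `remove` is dropped.
rules <- data.frame(
  ifcc   = c(8L, 8L, 8L, 8L, 17L, 17L, 18L),
  remove = c(9L, 10L, 11L, 12L, 18L, 19L, 19L)
)

apply_hierarchy <- function(cc_long, rules) {
  # Every row whose CC triggers a rule, paired with each CC that rule removes.
  triggered <- merge(cc_long, rules, by.x = "cc", by.y = "ifcc")
  # Keys of id/date/CC combinations that a more severe CC supersedes.
  superseded <- paste(triggered$id, triggered$date, triggered$remove)
  keep <- !(paste(cc_long$id, cc_long$date, cc_long$cc) %in% superseded)
  cc_long[keep, , drop = FALSE]
}

hcc_long <- apply_hierarchy(cc_long, rules)
```

For patient "a", CC 12 is removed because CC 8 is present on the same date; patient "b" keeps both CCs because nothing they have is superseded. Year-specific rules could be handled by adding a year column to both tables and including it in the merge and the keys.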

michaelgao8 commented 8 years ago

From your stackoverflow question,

For every id/date combination, I need to check the hierarchy table for rules (for that year, as they change each year). If the condition category cc matches the hierarchy rule ifcc, each id/date in the df table needs to have the cc set to zero/NA/removed if it is in V2-V7 columns.

If the cc matches the ifcc for a given id/date, are you saying for the rest of the same id/date combinations, you want to essentially remove that row from the df? I'm unclear as to how the cc can be set to 0 if it matches ifcc (since it then won't appear in v2-v7).

Sorry, I'm not a SO user yet, so can't comment there.

anobel commented 8 years ago

Yes, @michaelgao8, that's correct. If the cc matches the ifcc, then any cc that falls within v2-v7 should be removed from the patient list for that date/id combination...

I hope that makes sense. We can always reshape the data in any way that makes this simpler.

marksendak commented 8 years ago

@anobel, I'll answer your question from icdtohcc here. I'm assuming you have a column with a patient identifier and a column for date of encounter. Some thoughts on how to do this efficiently:

I'm a sucker for data.table. I saw before that you use tidyr, which may be as fast, but will be a different syntax. Let me know how this works out (or doesn't)!

anobel commented 8 years ago

@mpdakkak I want to clarify the difference between cc and hcc and the current status. I've converted all the patient data to long format and mapped from ICDs to CCs, and the merging is quite fast. The issue I can't seem to resolve is how to apply the hierarchy rules to the CCs to create HCCs. Check out the stack overflow question (and maybe give it an upvote to get more attention!)

jackwasey commented 8 years ago

I'd just add that, much as I love ddply etc, I don't want to add a massive dependency load to the package. I wrote a couple of wrapper functions that do long to wide and wide to long using base functions, but they also do validation of arguments and guessing which are the ICD code columns in a wide table. See icd_wide_to_long and icd_long_to_wide. I'll have to leave the HCC discussion to you for now.
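For readers unfamiliar with the reshaping being discussed, a wide-to-long conversion needs nothing beyond base R. The sketch below is not the icd implementation: the helper name wide_to_long and the column names are hypothetical, and it skips the argument validation and ICD-column guessing mentioned above.

```r
# Wide table: one row per visit, one column per diagnosis slot.
wide <- data.frame(
  visit_id = c("v1", "v2"),
  dx1 = c("4280", "25000"),
  dx2 = c("5849", NA),
  stringsAsFactors = FALSE
)

wide_to_long <- function(x, id_col, code_cols) {
  long <- data.frame(
    # Repeat the ids once per code column, matching the column-wise unlist.
    id   = rep(x[[id_col]], times = length(code_cols)),
    code = unlist(x[code_cols], use.names = FALSE),
    stringsAsFactors = FALSE
  )
  long <- long[!is.na(long$code), , drop = FALSE]  # drop empty slots
  long <- long[order(long$id), , drop = FALSE]     # group rows by id
  rownames(long) <- NULL
  long
}

long <- wide_to_long(wide, "visit_id", c("dx1", "dx2"))
```

The result has one row per visit/code pair, which is the shape a merge against an ICD-to-CC lookup expects.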

anobel commented 8 years ago

That makes sense. I've been able to implement the hierarchy to generate HCCs from CCs. I've also removed all dependencies to external packages except stringr (which I see icd depends on already). I'm using icd_wide_to_long() as well now. Will have to look into your code in more detail to sort out a consistent way to integrate this

anobel commented 8 years ago

does this mean we can close this?

iamsafy commented 8 years ago

I just started to implement the SAS version of HHS-HCC (2016) model in R. Can anyone help me to convert all the SAS code including the calculation of score to R?

jackwasey commented 8 years ago

Thanks for your message. Glad to hear you're working on this. @anobel led the HCC code which is already in icd. Perhaps he is able to help you.

anobel commented 7 years ago

Thanks! I've implemented code to assign CC and HCC categories based on the CMS model, using both ICD9 and ICD10. We've implemented this for multiple years. However, the next step would be to actually use the HCCs to assign the year-specific CMS-HCC "score". I have not yet tackled this but would love any help. The SAS code is available from CMS (https://www.cms.gov/Medicare/Health-Plans/MedicareAdvtgSpecRateStats/Risk-Adjustors.html). There is also a project here (https://github.com/healthactuary/cmshcc) that I wasn't able to get working but which may have some useful code.

If you have a moment, take a look at what we've already incorporated in HCC assignment in icd, and let's chat about how to extend this to add the scoring component.

iamsafy commented 7 years ago

Thanks @jackwasey and @anobel for the reply. I have gone through your project and it's really helpful. Once I complete the preliminary stages, I will start converting the HCC score.

devonbrackbill commented 7 years ago

Hi, I'm wondering what the status is of converting the HCC scores into risk scores based on the coefficients from CMS's model in the SAS code (https://www.cms.gov/Medicare/Health-Plans/MedicareAdvtgSpecRateStats/Risk-Adjustors.html). I'm struggling to find the relevant coefficients in the SAS code. Has anyone made any progress on this?

arunaryan commented 7 years ago

I have been working on my own version of this, but in T-SQL. Please refer to the script below. https://github.com/arunaryan/health-analytics/blob/master/HCC_Risk_score.sql

arunaryan commented 7 years ago

After converting ICD codes to CCs and then to HCCs, there is another step: selecting the HCC codes that contribute to the score according to the hierarchy. Please refer to page 8 of the attached document. HCC_risk_adjustment_051215.pdf

The website below is very helpful, as they have converted the SAS coefficient files into .csv for use.

http://www.nber.org/data/cms-risk-adjustment.html

devonbrackbill commented 7 years ago

@arunaryan Thanks, I was just about to write to you about where you were storing the coefficients when you run line 1296: CROSS JOIN ref..HCCCoef hcc because I wasn't seeing that table anywhere in your code. So am I correct you just build this table from the pdf somehow? Do you have the coefficients from that PDF in a machine readable format by any chance?

arunaryan commented 7 years ago

@devonbrackbill The table is created from the coefficient files I found as .csv from http://www.nber.org/data/cms-risk-adjustment.html

Each row has a unique modelid, year, and coeff for every possible HCC, interaction variable, and demographic variable in the HCC risk model. The code is still a work in progress, and I will add comments and the data model for the sproc. Apologies for the mess :)
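With a coefficient table laid out like that, scoring reduces to summing the coefficients of the variables that are switched on for a patient. A minimal base R sketch follows; the table columns, the variable names, the helper score_patient, and above all the coefficient values are hypothetical placeholders, not taken from the NBER files or from CMS.

```r
# Placeholder coefficient table: one row per model/year/variable.
coef_tbl <- data.frame(
  model_id = "demo",
  year     = 2016L,
  variable = c("HCC8", "HCC19", "F65_69", "HCC85_HCC18"),
  coeff    = c(1.0, 0.5, 0.25, 0.125),  # invented values, NOT CMS coefficients
  stringsAsFactors = FALSE
)

score_patient <- function(flags, coef_tbl, model_id, year) {
  # Restrict to the requested model and year.
  rows <- coef_tbl[coef_tbl$model_id == model_id & coef_tbl$year == year, ]
  active <- names(flags)[flags]
  # Fail loudly rather than silently ignore a flag with no coefficient.
  unknown <- setdiff(active, rows$variable)
  if (length(unknown) > 0) {
    stop("no coefficient for: ", paste(unknown, collapse = ", "))
  }
  sum(rows$coeff[rows$variable %in% active])
}

# Per-patient flags: HCC, demographic, and interaction indicators.
flags <- c(HCC8 = TRUE, HCC19 = FALSE, F65_69 = TRUE)
score <- score_patient(flags, coef_tbl, "demo", 2016L)
```

Stopping on unknown variable names is a deliberate choice here, since misspelled or misread coefficient names would otherwise just lower the score unnoticed.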

devonbrackbill commented 7 years ago

No, I think the mess is Medicare's fault! The problem with the NBER coefficients in the .csv file is that it's difficult to interpret what the coefficient names mean. Like what does SNPNE_MCAID_ORIGDIS_NEM68 mean? There are a bunch like that that are impossible to comprehend, unless I'm missing something obvious.

Though it looks like you made some headway on it in your SQL code.

Aquaroyal72 commented 7 years ago

Hi Jack, I need some help using your R CMS_HCC model. I'm new to R programming, but I know SAS. Can you help me, please?

Thank you, Shailesh Patel

anobel commented 7 years ago

Hi Shailesh, is there a specific issue you're having or need help with? The HCC assignment is implemented in a similar fashion to the Elixhauser and Charlson assignment in the package, and those two are well documented. Thanks, a

Aquaroyal72 commented 7 years ago

Hello everyone,
These are the SAS hierarchies for model V22:

%*imposing hierarchies;
/*Neoplasm 1 */ %SET0(CC=8 , HIER=%STR(9 ,10 ,11 ,12 ));
/*Neoplasm 2 */ %SET0(CC=9 , HIER=%STR(10 ,11 ,12 ));
/*Neoplasm 3 */ %SET0(CC=10 , HIER=%STR(11 ,12 ));
/*Neoplasm 4 */ %SET0(CC=11 , HIER=%STR(12 ));
/*Diabetes 1 */ %SET0(CC=17 , HIER=%STR(18 ,19 ));
/*Diabetes 2 */ %SET0(CC=18 , HIER=%STR(19 ));
/*Liver 1 */ %SET0(CC=27 , HIER=%STR(28 ,29 ,80 ));
/*Liver 2 */ %SET0(CC=28 , HIER=%STR(29 ));
/*Blood 1 */ %SET0(CC=46 , HIER=%STR(48 ));
/*SA1 */ %SET0(CC=54 , HIER=%STR(55 ));
/*Psychiatric 1 */ %SET0(CC=57 , HIER=%STR(58 ));
/*Spinal 1 */ %SET0(CC=70 , HIER=%STR(71 ,72 ,103 ,104 ,169 ));
/*Spinal 2 */ %SET0(CC=71 , HIER=%STR(72 ,104 ,169 ));
/*Spinal 3 */ %SET0(CC=72 , HIER=%STR(169 ));
/*Arrest 1 */ %SET0(CC=82 , HIER=%STR(83 ,84 ));
/*Arrest 2 */ %SET0(CC=83 , HIER=%STR(84 ));
/*Heart 2 */ %SET0(CC=86 , HIER=%STR(87 ,88 ));
/*Heart 3 */ %SET0(CC=87 , HIER=%STR(88 ));
/*CVD 1 */ %SET0(CC=99 , HIER=%STR(100 ));
/*CVD 5 */ %SET0(CC=103 , HIER=%STR(104 ));
/*Vascular 1 */ %SET0(CC=106 , HIER=%STR(107 ,108 ,161 ,189 ));
/*Vascular 2 */ %SET0(CC=107 , HIER=%STR(108 ));
/*Lung 1 */ %SET0(CC=110 , HIER=%STR(111 ,112 ));
/*Lung 2 */ %SET0(CC=111 , HIER=%STR(112 ));
/*Lung 5 */ %SET0(CC=114 , HIER=%STR(115 ));
/*Kidney 3 */ %SET0(CC=134 , HIER=%STR(135 ,136 ,137 ));
/*Kidney 4 */ %SET0(CC=135 , HIER=%STR(136 ,137 ));
/*Kidney 5 */ %SET0(CC=136 , HIER=%STR(137 ));
/*Skin 1 */ %SET0(CC=157 , HIER=%STR(158 ,161 ));
/*Skin 2 */ %SET0(CC=158 , HIER=%STR(161 ));
/*Injury 1 */ %SET0(CC=166 , HIER=%STR(80 ,167 ));

How can I convert this to R? I tried, but I got lost. Could someone please help? I would greatly appreciate it.

Thank you, Shailesh Patel

Aquaroyal72 commented 7 years ago

How can I apply the above hierarchy in a table-driven way, using the elm column in my table?

Thank you, Shailesh
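A table-driven version of the %SET0 calls quoted above might look like the following in base R. It is a sketch only: the rule table holds just the first few V22 rules from the SAS listing (Neoplasm 1, Neoplasm 2, Diabetes 1, Diabetes 2), and the wide 0/1 layout with HCC-prefixed column names and the helper name apply_set0 are assumptions, not anything defined by CMS or by icd.

```r
# One row per (CC, suppressed CC) pair, transcribed from the %SET0 calls.
rules <- data.frame(
  cc   = c(8L, 8L, 8L, 8L, 9L, 9L, 9L, 17L, 17L, 18L),
  hier = c(9L, 10L, 11L, 12L, 10L, 11L, 12L, 18L, 19L, 19L)
)

apply_set0 <- function(grid, rules, prefix = "HCC") {
  # Walk the rules in order, as the SAS macro calls do.
  for (i in seq_len(nrow(rules))) {
    parent <- paste0(prefix, rules$cc[i])
    child  <- paste0(prefix, rules$hier[i])
    # Skip rules whose columns are absent from this data set.
    if (parent %in% names(grid) && child %in% names(grid)) {
      # Where the more severe HCC is set, zero out the less severe one.
      grid[[child]][grid[[parent]] == 1] <- 0
    }
  }
  grid
}

# One row per patient, one 0/1 indicator column per HCC.
grid <- data.frame(
  id    = c("p1", "p2"),
  HCC8  = c(1, 0),
  HCC12 = c(1, 1),
  HCC18 = c(1, 1),
  HCC19 = c(1, 0),
  stringsAsFactors = FALSE
)

out <- apply_set0(grid, rules)
```

Patient p1 loses HCC12 (suppressed by HCC8) and HCC19 (suppressed by HCC18); patient p2 is unchanged. Extending the rule table with the remaining %SET0 lines requires no change to the function.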

ekortemeier commented 6 years ago

Hi everyone!

This is really great work. I was wondering if there has been any progress in calculating a risk score from the HCC score in R? Also I know this was already mentioned, but this package (https://github.com/validatehealth/cmshcc) seems like it does all of the steps, according to these slides (http://ase.uva.nl/binaries/content/assets/subsites/amsterdam-school-of-economics/r-in-insurance/webster-risk-adjustment-in-r.pdf?1437549456225). Does anyone know how to use this package?

Thanks! Emma Kortemeier

jackwasey commented 6 years ago

Emma, and others, I took a brief look at the package 'cmshcc' by @healthactuary and @npritzl. It seems like a compact bit of code and a good fit for including in this package, or at least referencing. Those guys used dplyr, which I have avoided until now to keep the dependencies lower. We could see how we could work together: does the output from 'icd' make good input for 'cmshcc'? Could or should we bring that or similar code into 'icd'? Open to suggestions.

healthactuary commented 6 years ago

Hi Jack, thanks for your response. We could look at using icd instead of dplyr to do the actual diagnosis grouping. Which function in icd would be used? We actually used to use a function from icd9, but the change to icd for ICD-10 caused us to use dplyr. We would be open to merging into the icd package if there is an equivalent to the dcast function in dplyr to further process the output of icd. Take care. @ekortemeier what do you think? You have used both packages, I believe.

ekortemeier commented 6 years ago

It is great to hear from everyone! I have looked at both packages, but ended up using 'cmshcc' for calculating risk scores (thanks for your help Andrew!). I wasn't able to figure out how to use the icd package to do what I wanted, i.e., produce a risk score starting from a dataset with diagnosis information (including ICD-10 codes). That being said, I am not sure if the output from 'icd' would be suitable input for 'cmshcc'. I do think the icd_map_cc_hcc table could be very useful; I just wasn't sure how to utilize it. Hope that helps!

jackwasey commented 6 years ago

@healthactuary icd_comorbid (and the family of related functions, such as icd9_comorbid_elix) should work.

I'm about to release a big new update which will dramatically improve performance on big data using matrix algebra. It will also use function names like comorbid_elix instead of the previous icd_comorbid_elix, but you can continue to use the previous function names.

And @ekortemeier - keep in touch: if you're working with ICD codes, then the community here can offer advice, and possibly extend icd if you want it to do something useful.

(closing this issue as it was implemented a while ago by @anobel - thanks!)

jackwasey commented 6 years ago

Hi, all, following up with this. @healthactuary just reread your comment: " Would be open to merging into the icd package if there is an equivalent to the dcast function in dplyr to further process the output of icd."

Would you please be able to give some example code using dplyr? Then I can see how this could work with icd.

healthactuary commented 6 years ago

Hi @jackwasey. Thanks for your response and sorry for the delay! The "get_hcc_grid" function in the cmshcc package uses dplyr > dcast to essentially just pivot the mapped HCC codes into columnar form. Then the HCC columns are manipulated to apply the hierarchy and interaction factors. Do you think there is a function in the icd package that would take a long-format combination of Medicare beneficiary ID and HCC codes and return one row per beneficiary, with the Medicare ID as the primary key and the HCC codes as columns? The mapping of ICD-10 diagnoses to the condition categories (CCs) is done using a merge.


get_hcc_grid <- function(PERSON, DIAG, cmshcc_map) {
  dummy_HCC_DIAG <- data.frame(HICNO="DUMMY", DX=cmshcc_map$DX, stringsAsFactors=FALSE)
  dummy_PERSON_DIAG <- data.frame(HICNO=PERSON$HICNO, DX="DUMMY", stringsAsFactors=FALSE)
  # ensures that all HCC columns appear in the grid
  DIAG <- rbind(DIAG, dummy_HCC_DIAG, dummy_PERSON_DIAG)
  merge_df <- merge(DIAG, cmshcc_map, by = "DX")
  merge_df$DX <- NULL
  merge_df <- distinct(merge_df)
  merge_df$indicator <- 1
  hcc_grid <- dcast(merge_df, HICNO ~ CMSHCC, value.var="indicator", fill=0)
  hcc_grid <- subset(hcc_grid, HICNO!="DUMMY")
  hcc_grid$DUMMY <- NULL
  hcc_grid
}
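The pivot step in that function can also be done without extra packages (dcast is, as far as I know, provided by reshape2 and data.table rather than dplyr). The base R sketch below uses table() on factors with fixed levels, which makes every HCC column and every beneficiary appear without needing dummy rows. The helper name pivot_hcc and the example values are hypothetical; only the column names HICNO and CMSHCC come from the code above.

```r
# Long format: one row per beneficiary/HCC pair (duplicates allowed).
long <- data.frame(
  HICNO  = c("A1", "A1", "B2", "A1"),
  CMSHCC = c("HCC8", "HCC19", "HCC19", "HCC8"),
  stringsAsFactors = FALSE
)

pivot_hcc <- function(long, all_ids, all_hccs) {
  # Fixed factor levels guarantee a row per id and a column per HCC,
  # including ids and HCCs that never occur in `long`.
  counts <- table(
    factor(long$HICNO, levels = all_ids),
    factor(long$CMSHCC, levels = all_hccs)
  )
  # Turn counts into 0/1 indicators and drop the table class.
  grid <- as.data.frame((unclass(counts) > 0) * 1)
  grid <- cbind(HICNO = rownames(grid), grid, stringsAsFactors = FALSE)
  rownames(grid) <- NULL
  grid
}

hcc_grid <- pivot_hcc(long,
                      all_ids  = c("A1", "B2", "C3"),
                      all_hccs = c("HCC8", "HCC12", "HCC19"))
```

The result has HICNO as the key and one indicator column per HCC, so beneficiary C3, who has no diagnoses, still gets a row of zeros.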

jackwasey commented 5 years ago

I don't use hierarchical condition codes myself. Would you please be able to give an example of the input and expected output data frames?