Closed jackwasey closed 6 years ago
@jackwasey Do you happen to know if there is an implementation of HCC risk scoring available in a language other than SAS? Unfortunately, CMS wrote the software in SAS, and its documentation is hard enough to read that rewriting the code in another language is difficult.
I haven't seen another implementation. The SAS code is horrible: I just looked through it. The logic is actually very simple, and would be easy to implement using my R package as a basis. Also, the SAS code makes no attempt to identify errors in the data, which my package could do. This is on my to-do list, but I won't be able to get to it for quite a long time. Would you be interested in working on this?
There is one binary SAS data file in the v22 HCC software package which has the coefficients. Everything else is text, and shows some very simple logic which looks up CC codes from ICD-9-CM codes, applies the hierarchies, and then looks up coefficients based on per-patient flags provided by the user.
Would be good to have some public test data to work against. Maybe the Vermont data has this.
I am interested in implementing the CMS HCC. I am pretty busy on my side but I might be able to contribute something in the near future. For those interested, the National Bureau of Economic Research has more description on HCC: http://www.nber.org/data/cms-risk-adjustment-models.html
Thanks, that would be magnificent. Let me know if there is anything that could be tweaked in the core code that might make it easier for you. @wmuprhyrd just implemented a different risk score (van Walraven), and it went very well. CMS HCC has additional complexity, but I think our platform is solid enough to handle this without any problems.
Hi @jackwasey and @mongoose54
I need to assign HCCs to ICD-9 codes for a project I'm working on, and have started to implement this in R. The repo is at https://github.com/anobel/icdtohcc and is still quite basic (just importing/cleaning data so far), but I would be interested in integrating it into the icd package. I've used icd but have not looked into the code recently (especially with the update from icd9 -> icd). Would love to collaborate/help.
Hi all, I just replied to an email from @anobel, but I'd love to help tackle this. I posted a simple conversion of SAS -> R on my blog about a year ago (http://healthydatascience.com/cms_hcc.html), but it's not generalizable and only builds HCC scores using a crosswalk from a single year.
This is perfect material for the package. I've not worked with this mapping before. My initial impression is that it could be implemented by having four new mappings: one each for ICD-9 and ICD-10, at both the high and the low category level. A function could then be added which would take a logical argument to determine whether to use the high or low level mapping.
This would not be much work at all, once each mapping is represented as a named list. I may well not understand the complexity of the hierarchy. Do we also need to be able to go from low-level to high level? Are there conditionals on assignment to groups which are based on things other than ICD codes?
Happy to help and accept pull requests.
I've written the code to create condition categories (CC) with labels for both ICD9/10, for every year from 2007-2016, from the original CMS files. I've also applied these CCs to ICDs, taking into account year and ICD version, which was straightforward. My progress is at icdtohcc.
The issue I can't seem to resolve is how to apply the hierarchy rules to convert CCs to HCCs. Basically, for each year, you need to identify whether a patient has one of the more severe CCs, and then zero out the less severe CCs. For example, if they have the CC for metastatic cancer and the CC for prostate cancer, you have to zero out the CC for prostate cancer. I've created a data frame of the hierarchy rules, but would love any help/insight into how best to apply the rules to the patient lists... I posted a question to Stack Overflow with no responses yet.
If we can get this figured out, it would be great to incorporate it into the icd package.
From your stackoverflow question,
For every id/date combination, I need to check the hierarchy table for rules (for that year, as they change each year). If the condition category cc matches the hierarchy rule ifcc, each id/date in the df table needs to have the cc set to zero/NA/removed if it is in V2-V7 columns.
If the cc matches the ifcc for a given id/date, are you saying for the rest of the same id/date combinations, you want to essentially remove that row from the df? I'm unclear as to how the cc can be set to 0 if it matches ifcc (since it then won't appear in v2-v7).
Sorry, I'm not a SO user yet, so can't comment there.
Yes, @michaelgao8, that's correct. If the cc matches the ifcc, then any cc that falls within V2-V7 should be removed from the patient list for that date/id combination.
I hope that makes sense. We can always reshape the data in any way that makes this simpler.
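For readers following along, here is a minimal Python sketch of the rule as described above. It is one possible reading of it, not the icdtohcc code. The names ifcc and V2-V7 come from the Stack Overflow question. The function name, the table layout and all rule and patient values are hypothetical.

```python
from collections import defaultdict

# Hypothetical hierarchy table, keyed by year because the rules change yearly.
# Each entry maps an ifcc to the less severe CCs it removes (the V2-V7 columns).
HIERARCHY = {
    2015: {8: {9, 10, 11, 12}, 17: {18, 19}},
}


def apply_hierarchy(rows, hierarchy):
    """rows: iterable of (patient_id, date, year, cc). Returns the surviving rows."""
    # Collect every CC seen for each id/date combination.
    seen = defaultdict(set)
    for pid, date, year, cc in rows:
        seen[(pid, date, year)].add(cc)

    kept = []
    for (pid, date, year), ccs in seen.items():
        rules = hierarchy.get(year, {})
        # Any CC listed under a rule whose ifcc is present gets dropped.
        dropped = set()
        for ifcc, lower in rules.items():
            if ifcc in ccs:
                dropped |= lower
        for cc in sorted(ccs - dropped):
            kept.append((pid, date, year, cc))
    return kept


# Made-up patient list in long format.
patients = [
    ("a", "2015-01-05", 2015, 8),
    ("a", "2015-01-05", 2015, 12),
    ("a", "2015-01-05", 2015, 19),
    ("b", "2015-02-01", 2015, 12),
]
hccs = apply_hierarchy(patients, HIERARCHY)
```

Patient "a" loses CC 12 because CC 8 is present on the same date, but keeps CC 19 because CC 17 is not. Patient "b" keeps CC 12 because nothing outranks it on that date.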
@anobel, I'll answer your question from icdtohcc here. I'm assuming you have a column with a patient identifier and a column for date of encounter. Some thoughts on how to do this efficiently:
I'm a sucker for data.table. I saw before that you use tidyr, which may be as fast, but will be a different syntax. Let me know how this works out (or doesn't)!
@mpdakkak I want to clarify the difference between cc and hcc, and the current status. I've converted all the patient data to long format and mapped from ICDs to CCs, and the merging is quite fast. The issue I can't seem to resolve is how to apply the hierarchy rules to the CCs to create HCCs. Check out the Stack Overflow question (and maybe give it an upvote to get more attention!)
I'd just add that, much as I love ddply etc., I don't want to add a massive dependency load to the package. I wrote a couple of wrapper functions that do long to wide and wide to long using base functions, but they also do validation of arguments and guessing which are the ICD code columns in a wide table. See icd_wide_to_long and icd_long_to_wide. I'll have to leave the HCC discussion to you for now.
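To make the two shapes concrete, here is a small Python sketch of a wide-to-long and long-to-wide round trip. It only illustrates the reshaping idea and is not how the icd functions are written. The function names, column names and sample codes are hypothetical, and the argument validation and column guessing mentioned above are left out.

```python
def wide_to_long(rows, id_col, code_cols):
    """One output row per non-empty code, keeping the identifier."""
    long_rows = []
    for row in rows:
        for col in code_cols:
            code = row.get(col)
            if code:  # skip missing or empty cells
                long_rows.append({id_col: row[id_col], "code": code})
    return long_rows


def long_to_wide(long_rows, id_col, prefix="dx"):
    """One output row per identifier, codes spread over dx1, dx2, ..."""
    grouped = {}
    for row in long_rows:
        grouped.setdefault(row[id_col], []).append(row["code"])
    width = max((len(codes) for codes in grouped.values()), default=0)
    wide = []
    for ident, codes in grouped.items():
        out = {id_col: ident}
        for i in range(width):
            # Pad shorter code lists with None so every row has the same columns.
            out[f"{prefix}{i + 1}"] = codes[i] if i < len(codes) else None
        wide.append(out)
    return wide


# Made-up wide table: one row per visit, diagnosis codes in dx1..dx2.
visits = [
    {"visit_id": "v1", "dx1": "4280", "dx2": "25000"},
    {"visit_id": "v2", "dx1": "4019", "dx2": None},
]
long_form = wide_to_long(visits, "visit_id", ["dx1", "dx2"])
round_trip = long_to_wide(long_form, "visit_id")
```

The long form is what a merge against a code-to-category mapping wants, while the wide form is closer to how many source datasets arrive.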
That makes sense. I've been able to implement the hierarchy to generate HCCs from CCs. I've also removed all dependencies to external packages except stringr (which I see icd depends on already). I'm using icd_wide_to_long() as well now. Will have to look into your code in more detail to sort out a consistent way to integrate this
does this mean we can close this?
I just started porting the SAS version of the HHS-HCC (2016) model to R. Can anyone help me convert all the SAS code, including the score calculation, to R?
Thanks for your message. Glad to hear you're working on this. @anobel led the HCC code which is already in icd. Perhaps he is able to help you.
Thanks! I've implemented code to assign CC and HCC categories based on the CMS model, using both ICD-9 and ICD-10, for multiple years. However, the next step would be to actually use the HCCs to assign the year-specific CMS-HCC "score". I have not yet tackled this but would love any help. The SAS code is available from CMS (https://www.cms.gov/Medicare/Health-Plans/MedicareAdvtgSpecRateStats/Risk-Adjustors.html). There is also a project here (https://github.com/healthactuary/cmshcc) that I wasn't able to get working but that may have some useful code.
If you have a moment, take a look at what we've already incorporated for HCC assignment in icd, and let's chat about how to extend this to add the scoring component.
Thanks @jackwasey and @anobel for the reply. I have gone through your project and it's really helpful. Once I complete the preliminary stages, I will start converting the HCC score.
Hi, I'm wondering what the status is of converting the HCC scores into risk scores based on the coefficients from CMS's model in the SAS code (https://www.cms.gov/Medicare/Health-Plans/MedicareAdvtgSpecRateStats/Risk-Adjustors.html). I'm struggling to find the relevant coefficients in the SAS code. Has anyone made any progress on this?
I have been working on my own version of this but in t-SQL. Please refer to the below script. https://github.com/arunaryan/health-analytics/blob/master/HCC_Risk_score.sql
After converting ICD codes to CCs and then to HCCs, there is another step: selecting which HCC codes contribute to the score according to the hierarchy. Please refer to page 8 of the attached document, HCC_risk_adjustment_051215.pdf.
The below website is very helpful, as they have converted the SAS coefficient files into .csv for use.
@arunaryan Thanks, I was just about to write to you about where you were storing the coefficients when you run line 1296: CROSS JOIN ref..HCCCoef hcc because I wasn't seeing that table anywhere in your code. So am I correct you just build this table from the pdf somehow? Do you have the coefficients from that PDF in a machine readable format by any chance?
@devonbrackbill The table is created from the coefficient files I found as .csv from http://www.nber.org/data/cms-risk-adjustment.html
Each row has a unique modelid, year and coeff for all possible HCC, interaction and demographic variables as per the HCC risk model. The code is still a work in progress, and I will add comments and the data model for the sproc. Apologies for the mess :)
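As a rough illustration of how such a coefficient table could be used, here is a Python sketch that sums the coefficients of the variables flagged for one person. It assumes the score is additive over the flagged variables, which matches the flag-and-lookup logic described earlier in the thread. The model id, variable names and coefficient values are all invented. They are not CMS numbers and not the contents of the NBER files.

```python
# (modelid, year, variable) -> coeff. Every value here is made up.
COEFFS = {
    ("demo_model", 2016, "F70_74"): 0.40,       # hypothetical demographic cell
    ("demo_model", 2016, "HCC18"): 0.30,
    ("demo_model", 2016, "HCC85"): 0.35,
    ("demo_model", 2016, "HCC85_HCC18"): 0.15,  # hypothetical interaction term
}


def risk_score(flags, modelid, year, coeffs):
    """Sum the coefficients of every variable whose flag is set.

    Raises KeyError for a flagged variable with no coefficient, so that a
    naming mismatch is noticed instead of silently scoring as zero.
    """
    total = 0.0
    for variable, is_set in flags.items():
        if is_set:
            total += coeffs[(modelid, year, variable)]
    return total


# Flags for one made-up beneficiary, after hierarchies and interactions.
flags = {"F70_74": True, "HCC18": True, "HCC85": False, "HCC85_HCC18": False}
score = risk_score(flags, "demo_model", 2016, COEFFS)
```

Keying the lookup on modelid and year keeps several model versions in one table, which is how the rows above are described.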
No, I think the mess is Medicare's fault! The problem with the NBER coefficients in the .csv file is that it's difficult to interpret what the coefficient names mean. Like what does SNPNE_MCAID_ORIGDIS_NEM68 mean? There are a bunch like that that are impossible to comprehend, unless I'm missing something obvious.
Though it looks like you made some headway on it in your SQL code.
Hi Jack, I need some help using your R CMS-HCC model. I'm new to R programming; I know SAS. Can you help me, please?
Thank you, Shailesh Patel
Hi Shailesh, is there a specific issue you're having or need help with? The HCC assignment is implemented in a similar fashion to the Elixhauser and Charlson assignment in the package, and those two are well documented. Thanks, a
Hello everyone,
These are the SAS hierarchies for model v22:

    %*imposing hierarchies;
    /*Neoplasm 1 */ %SET0(CC=8 , HIER=%STR(9 ,10 ,11 ,12 ));
    /*Neoplasm 2 */ %SET0(CC=9 , HIER=%STR(10 ,11 ,12 ));
    /*Neoplasm 3 */ %SET0(CC=10 , HIER=%STR(11 ,12 ));
    /*Neoplasm 4 */ %SET0(CC=11 , HIER=%STR(12 ));
    /*Diabetes 1 */ %SET0(CC=17 , HIER=%STR(18 ,19 ));
    /*Diabetes 2 */ %SET0(CC=18 , HIER=%STR(19 ));
    /*Liver 1 */ %SET0(CC=27 , HIER=%STR(28 ,29 ,80 ));
    /*Liver 2 */ %SET0(CC=28 , HIER=%STR(29 ));
    /*Blood 1 */ %SET0(CC=46 , HIER=%STR(48 ));
    /*SA1 */ %SET0(CC=54 , HIER=%STR(55 ));
    /*Psychiatric 1 */ %SET0(CC=57 , HIER=%STR(58 ));
    /*Spinal 1 */ %SET0(CC=70 , HIER=%STR(71 ,72 ,103 ,104 ,169 ));
    /*Spinal 2 */ %SET0(CC=71 , HIER=%STR(72 ,104 ,169 ));
    /*Spinal 3 */ %SET0(CC=72 , HIER=%STR(169 ));
    /*Arrest 1 */ %SET0(CC=82 , HIER=%STR(83 ,84 ));
    /*Arrest 2 */ %SET0(CC=83 , HIER=%STR(84 ));
    /*Heart 2 */ %SET0(CC=86 , HIER=%STR(87 ,88 ));
    /*Heart 3 */ %SET0(CC=87 , HIER=%STR(88 ));
    /*CVD 1 */ %SET0(CC=99 , HIER=%STR(100 ));
    /*CVD 5 */ %SET0(CC=103 , HIER=%STR(104 ));
    /*Vascular 1 */ %SET0(CC=106 , HIER=%STR(107 ,108 ,161 ,189 ));
    /*Vascular 2 */ %SET0(CC=107 , HIER=%STR(108 ));
    /*Lung 1 */ %SET0(CC=110 , HIER=%STR(111 ,112 ));
    /*Lung 2 */ %SET0(CC=111 , HIER=%STR(112 ));
    /*Lung 5 */ %SET0(CC=114 , HIER=%STR(115 ));
    /*Kidney 3 */ %SET0(CC=134 , HIER=%STR(135 ,136 ,137 ));
    /*Kidney 4 */ %SET0(CC=135 , HIER=%STR(136 ,137 ));
    /*Kidney 5 */ %SET0(CC=136 , HIER=%STR(137 ));
    /*Skin 1 */ %SET0(CC=157 , HIER=%STR(158 ,161 ));
    /*Skin 2 */ %SET0(CC=158 , HIER=%STR(161 ));
    /*Injury 1 */ %SET0(CC=166 , HIER=%STR(80 ,167 ));

How can I convert this to R? I tried, but I got lost. I would greatly appreciate any help.
Thank you, Shailesh Patel
How can I apply the above hierarchy in a table-driven way, using the elm column in my table?
Thank you, Shailesh
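One table-driven approach, sketched in Python rather than R: parse the %SET0 calls into a rule table once, then loop over the table. This assumes %SET0 means "if CC is set, zero every CC listed in HIER", which matches how the hierarchy is described earlier in this thread. The function names are hypothetical, and only three of the v22 rules are included to keep the example short.

```python
import re

# A few of the v22 rules, kept in their SAS form.
SAS_RULES = """
/*Neoplasm 1 */ %SET0(CC=8 , HIER=%STR(9 ,10 ,11 ,12 ));
/*Diabetes 1 */ %SET0(CC=17 , HIER=%STR(18 ,19 ));
/*Diabetes 2 */ %SET0(CC=18 , HIER=%STR(19 ));
"""

# Captures the CC number and the comma-separated HIER list of each call.
RULE_RE = re.compile(r"%SET0\(CC=\s*(\d+)\s*,\s*HIER=%STR\(([^)]*)\)")


def parse_rules(text):
    """Turn %SET0 calls into a table of (cc, [lower-ranked ccs])."""
    table = []
    for cc, hier in RULE_RE.findall(text):
        lower = [int(tok) for tok in hier.split(",") if tok.strip()]
        table.append((int(cc), lower))
    return table


def impose(ccs, table):
    """Drop every CC outranked by a CC that is still present."""
    active = set(ccs)
    for cc, lower in table:  # rules run in file order, like the macro calls
        if cc in active:
            active -= set(lower)
    return active


RULE_TABLE = parse_rules(SAS_RULES)
result = impose({8, 12, 18, 19}, RULE_TABLE)
```

The same two-column table (a CC and the CCs it removes) could be stored in a database table or a data frame, and the loop in impose would carry over directly.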
Hi everyone!
This is really great work. I was wondering if there has been any progress in calculating a risk score from the HCC score in R? Also I know this was already mentioned, but this package (https://github.com/validatehealth/cmshcc) seems like it does all of the steps, according to these slides (http://ase.uva.nl/binaries/content/assets/subsites/amsterdam-school-of-economics/r-in-insurance/webster-risk-adjustment-in-r.pdf?1437549456225). Does anyone know how to use this package?
Thanks! Emma Kortemeier
Emma, and others, I took a brief look at the package 'cmshcc' by @healthactuary and @npritzl. It seems like a compact bit of code and a good fit for including in this package, or at least referencing. Those guys used dplyr, which I have avoided until now to keep the dependencies lower. We could see how we could work together: does the output from 'icd' make good input for 'cmshcc'? Could or should we bring that or similar code into 'icd'. Open to suggestions.
Hi Jack, thanks for your response. We could look at using icd instead of dplyr to do the actual diagnosis grouping. Which function in icd would be used? We actually used to use a function from icd9, but the change to icd for ICD-10 caused us to use dplyr. Would be open to merging into the icd package if there is an equivalent to the dcast function in dplyr to further process the output of icd. Take care. @ekortemeier what do you think? You have used both packages, I believe.
It is great to hear from everyone! I have looked at both packages, but ended up using 'cmshcc' for calculating risk scores (thanks for your help Andrew!). I wasn't able to figure out how to use the icd package to do what I wanted ; i.e., produce a risk score starting from a dataset with diagnosis information (including icd10 codes). That being said, I am not sure if the output from 'icd' would be suitable input for 'cmshcc'. I do think the icd_map_cc_hcc table could be very useful, I just wasn't sure how to utilize it. Hope that helps!
@healthactuary icd_comorbid (and the family of related functions, such as icd9_comorbid_elix) should work.
I'm about to release a big new update which will dramatically improve performance on big data using matrix algebra. It will also use function names like comorbid_elix instead of the previous icd_comorbid_elix, but you can continue to use the previous function names.
And @ekortemeier - keep in touch: if you're working with ICD codes, then the community here can offer advice, and possibly extend icd if you want it to do something useful.
(closing this issue as it was implemented a while ago by @anobel - thanks!)
Hi, all, following up with this. @healthactuary just reread your comment: " Would be open to merging into the icd package if there is an equivalent to the dcast function in dplyr to further process the output of icd."
Would you please be able to give some example code using dplyr? Then I can see how this could work with icd.
Hi @jackwasey. Thanks for your response and sorry for the delay! The "get_hcc_grid" function in the cmshcc package uses dplyr > dcast to essentially just pivot the mapped HCC codes into columnar form. Then the HCC columns are manipulated to apply the hierarchy and interaction factors. Do you think there is a function in the icd package that would take a long-format combination of Medicare beneficiary ID and HCC codes to a unique row with Medicare ID as the primary key and HCC codes as columns? The mapping of ICD-10 diagnoses to the condition categories (CCs) is done using a merge.
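    # distinct() and dcast() are not base R; presumably dplyr and reshape2
    # (or data.table) are attached before this function is called.
    get_hcc_grid <- function(PERSON, DIAG, cmshcc_map) {
      dummy_HCC_DIAG <- data.frame(HICNO = "DUMMY", DX = cmshcc_map$DX, stringsAsFactors = FALSE)
      dummy_PERSON_DIAG <- data.frame(HICNO = PERSON$HICNO, DX = "DUMMY", stringsAsFactors = FALSE)
      # ensures that all HCC columns appear in the grid
      DIAG <- rbind(DIAG, dummy_HCC_DIAG, dummy_PERSON_DIAG)
      # map each diagnosis to its HCC, then drop the diagnosis itself
      merge_df <- merge(DIAG, cmshcc_map, by = "DX")
      merge_df$DX <- NULL
      merge_df <- distinct(merge_df)
      merge_df$indicator <- 1
      # pivot to one row per HICNO with a 0/1 column per HCC
      hcc_grid <- dcast(merge_df, HICNO ~ CMSHCC, value.var = "indicator", fill = 0)
      # remove the dummy row and column added above
      hcc_grid <- subset(hcc_grid, HICNO != "DUMMY")
      hcc_grid$DUMMY <- NULL
      hcc_grid
    }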
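For anyone who does not read R, the pivot can be sketched in plain Python as below. This is an illustration of the long-to-grid idea and not a port of cmshcc. The function name, diagnosis codes and mapping are hypothetical. As far as I know, dcast is provided by reshape2 and data.table rather than dplyr, so the sketch avoids naming any library.

```python
def hcc_grid(person_ids, diag, dx_to_hcc):
    """Build one row per person with a 0/1 indicator per HCC.

    person_ids: list of beneficiary ids (every one gets a row).
    diag:       list of (hicno, dx) pairs in long format.
    dx_to_hcc:  dict mapping a diagnosis code to a list of HCC labels.
    """
    # Every HCC in the mapping becomes a column, even if nobody has it.
    columns = sorted({hcc for hccs in dx_to_hcc.values() for hcc in hccs})
    grid = {pid: dict.fromkeys(columns, 0) for pid in person_ids}
    for hicno, dx in diag:
        # Diagnoses missing from the mapping contribute nothing.
        for hcc in dx_to_hcc.get(dx, []):
            row = grid.setdefault(hicno, dict.fromkeys(columns, 0))
            row[hcc] = 1
    return grid


# Made-up mapping and diagnoses; these are placeholders, not real codes.
mapping = {"DX_A": ["HCC18"], "DX_B": ["HCC19"], "DX_C": ["HCC85"]}
people = ["p1", "p2"]
diagnoses = [("p1", "DX_A"), ("p1", "DX_C"), ("p1", "DX_Z")]
grid = hcc_grid(people, diagnoses, mapping)
```

Pre-filling every row with zeros plays the role of the dummy rows in the R function: all HCC columns and all people appear in the grid even when there is no matching diagnosis.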
I don't use hierarchical condition codes myself. Would you please be able to give an example of the input and expected output data frames?
http://www.cms.gov/Medicare/Health-Plans/MedicareAdvtgSpecRateStats/Risk-Adjustors.html
This assigns ICD-9 codes to HCC codes, but also needs age and gender inputs. It is a many-to-many mapping, but it should still be possible to map from an individual's set of ICD-9 codes to a set of HCC codes (which could be considered comorbidities).