imbs-hl / ranger

A Fast Implementation of Random Forests
http://imbs-hl.github.io/ranger/

Metric of permutation importance with random forests for regression #546

Closed Yu-Liu207 closed 2 years ago

Yu-Liu207 commented 3 years ago

Hello Marvin,

I have a question about permutation importance for regression forests in ranger. Is the performance drop calculated using the mean squared error (MSE) when splitrule = "variance"? And what about when splitrule = "maxstat"?

Thanks,

Yu

mnwright commented 3 years ago

It's the MSE for both splitting rules. The prediction error metric just depends on the outcome:

- Classification: Proportion of misclassifications
- Regression: MSE
- Probability estimation: Brier score
- Survival: One minus Harrell's C-index
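To make the idea concrete, here is a minimal Python sketch of three of these error metrics and of a permutation importance computed as "error after permuting one feature minus baseline error". It is not ranger's code: the function names, the `predict` callable, and the row-as-list data layout are all assumptions made for illustration, the Brier score is shown only in its binary form, and the sketch permutes a feature on one fixed data set rather than reproducing how ranger handles this inside the forest.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error (regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def misclassification_rate(y_true, y_pred):
    """Proportion of misclassified observations (classification)."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def brier_score(y_true, p_pred):
    """Binary Brier score: y_true in {0, 1}, p_pred = predicted P(y = 1)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, p_pred)) / len(y_true)


def permutation_importance(predict, X, y, column, error, seed=0):
    """Increase in `error` after shuffling one feature column.

    `predict` is a hypothetical callable mapping one row (a list) to a
    prediction; `error` is one of the metric functions above.
    """
    rng = random.Random(seed)
    baseline = error(y, [predict(row) for row in X])

    # Shuffle the chosen column while leaving all other columns untouched.
    shuffled = [row[column] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]

    permuted = error(y, [predict(row) for row in X_perm])
    return permuted - baseline
```

The point of the sketch is that the metric passed as `error` is chosen by the outcome type only; nothing in it depends on which split rule grew the trees.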

Yu-Liu207 commented 3 years ago

Thank you for the information!

I have another question regarding the use of maximally selected rank statistics. You mentioned in your 2017 Statistics in Medicine article that "MSR-RF are just one special case of the general approach to maximally selected statistics. It is easily extendable to continuous outcome by setting the scores equal to the ranks."

For a continuous outcome, when using ranger to fit a random forest with maximally selected rank statistics, could you explain in a bit more detail how the ranger package "sets the scores equal to the ranks"?

Thanks,

Yu

mnwright commented 3 years ago

For survival outcomes, we use log-rank scores (a_i = ... at the very top of page 3 of the paper) to account for the censoring. For continuous outcomes, there is no censoring, so we can simply set the a_i to the ranks of the outcome.
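As a rough illustration of "scores equal to ranks", the Python sketch below turns a continuous outcome into rank scores and evaluates a standardized linear rank statistic for one candidate cutpoint. It is not a port of ranger's C++: the function names are hypothetical, the use of average ranks for ties is an assumption, and the statistic uses the textbook mean and variance of a sum of scores drawn without replacement, which may differ in detail from ranger's maxstat().

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks (assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the current run of tied values.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def standardized_rank_statistic(x, scores, cutpoint):
    """|S - E[S]| / sqrt(Var[S]), where S sums the scores with x <= cutpoint."""
    n = len(scores)
    left = [a for xi, a in zip(x, scores) if xi <= cutpoint]
    m = len(left)
    if m == 0 or m == n:
        raise ValueError("cutpoint must leave observations on both sides")

    mean = sum(scores) / n
    expected = m * mean
    variance = m * (n - m) / (n * (n - 1)) * sum((a - mean) ** 2 for a in scores)
    if variance == 0:
        return 0.0  # all scores identical: no cutpoint separates anything
    return abs(sum(left) - expected) / math.sqrt(variance)


# Usage: scores for a continuous outcome are just its ranks.
x = [1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.5, 2.0, 9.0]
stat = standardized_rank_statistic(x, average_ranks(y), cutpoint=2.0)
```

Because only the ranks of the outcome enter the statistic, any strictly increasing transformation of `y` gives the same value.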

In code, the regression trees just use rank():
https://github.com/imbs-hl/ranger/blob/e8b05f47892bb4968c4e6057f68b35bcd0b52225/src/TreeRegression.cpp#L407
https://github.com/imbs-hl/ranger/blob/e8b05f47892bb4968c4e6057f68b35bcd0b52225/src/TreeRegression.cpp#L437

In survival trees we use logrankScores():
https://github.com/imbs-hl/ranger/blob/e8b05f47892bb4968c4e6057f68b35bcd0b52225/src/TreeSurvival.cpp#L198
https://github.com/imbs-hl/ranger/blob/e8b05f47892bb4968c4e6057f68b35bcd0b52225/src/TreeSurvival.cpp#L228

logrankScores() is defined here: https://github.com/imbs-hl/ranger/blob/e8b05f47892bb4968c4e6057f68b35bcd0b52225/src/utility.cpp#L543

maxstat() here: https://github.com/imbs-hl/ranger/blob/e8b05f47892bb4968c4e6057f68b35bcd0b52225/src/utility.cpp#L575