Closed Yu-Liu207 closed 2 years ago
It's the MSE for both splitting rules. The prediction error metric just depends on the outcome:
- Classification: Proportion of misclassifications
- Regression: MSE
- Probability estimation: Brier score
- Survival: One minus Harrell's C-index
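For readers who want to see what these four metrics look like, and how a permutation importance is built from one of them, here is a minimal Python sketch. It is not ranger's code: all function names are made up for illustration, the Brier score is shown for the binary case only, the handling of ties in the C-index is an assumption, and the importance is computed on whatever data is passed in, which is a simplification.

```python
import random


def misclassification_rate(y_true, y_pred):
    """Classification: proportion of misclassified observations."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Regression: mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def brier_score(y_true, p_pred):
    """Probability estimation (binary case, an assumption of this sketch):
    mean squared difference between the 0/1 outcome and the predicted
    probability of class 1."""
    return sum((t - p) ** 2 for t, p in zip(y_true, p_pred)) / len(y_true)


def one_minus_c_index(time, status, risk):
    """Survival: one minus Harrell's C-index.

    A pair (i, j) is comparable if i had an event (status 1) strictly
    before time[j]. It is concordant if i also has the higher predicted
    risk; tied risks count as half (an assumption of this sketch).
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if status[i] == 1 and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return 1.0 - concordant / comparable


def permutation_importance_mse(predict, X, y, column, rng=None):
    """Increase in MSE after shuffling one column of X.

    `predict` maps one row (a list of feature values) to a prediction.
    The split rule used to grow the model does not enter here at all;
    only the outcome and the predictions do.
    """
    rng = rng or random.Random(0)
    baseline = mse(y, [predict(row) for row in X])
    shuffled = [row[column] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:column] + [v] + row[column + 1:]
              for row, v in zip(X, shuffled)]
    return mse(y, [predict(row) for row in X_perm]) - baseline
```

The point of the last function is that the importance is a difference of two prediction errors, so for regression it is an MSE difference whichever split rule grew the trees.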
Thank you for the information!
I have another question regarding the use of maximally selected rank statistics. You mentioned in your 2017 Statistics in Medicine article that "MSR-RF are just one special case of the general approach to maximally selected statistics. It is easily extendable to continuous outcome by setting the scores equal to the ranks."
For a continuous outcome, when using ranger to fit a random forest model with maximally selected rank statistics, can you explain a bit more about how the ranger package "set the scores equal to the ranks"?
Thanks,
Yu
For survival outcomes, we use log-rank scores (a_i = ... at the very top of page 3 of the paper) to account for the censoring. For continuous outcomes, there is no censoring, so we can simply set the a_i to the ranks of the outcome.
In code, the regression trees just use `rank()`:
https://github.com/imbs-hl/ranger/blob/e8b05f47892bb4968c4e6057f68b35bcd0b52225/src/TreeRegression.cpp#L407
https://github.com/imbs-hl/ranger/blob/e8b05f47892bb4968c4e6057f68b35bcd0b52225/src/TreeRegression.cpp#L437
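To make "set the scores equal to the ranks" concrete, here is a small Python sketch of turning a continuous outcome into rank scores. It is an illustration only, not a port of the linked C++ code: the function name `average_ranks` is hypothetical, and giving tied values the average of their ranks is an assumption of this sketch.

```python
def average_ranks(values):
    """Return ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the last position holding the same value as position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j (0-based) are tied: average of ranks i+1..j+1.
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


# Continuous outcome of the observations in a node (made-up numbers).
y = [3.2, 1.5, 9.9, 1.5]

# These ranks play the role of the scores a_i in the split statistic.
scores = average_ranks(y)
```

Because only the ranks enter the statistic, any monotone transformation of the outcome leaves the scores unchanged.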
In survival trees we use `logrankScores()`:
https://github.com/imbs-hl/ranger/blob/e8b05f47892bb4968c4e6057f68b35bcd0b52225/src/TreeSurvival.cpp#L198
https://github.com/imbs-hl/ranger/blob/e8b05f47892bb4968c4e6057f68b35bcd0b52225/src/TreeSurvival.cpp#L228
`logrankScores()` is defined here:
https://github.com/imbs-hl/ranger/blob/e8b05f47892bb4968c4e6057f68b35bcd0b52225/src/utility.cpp#L543
`maxstat()` is defined here:
https://github.com/imbs-hl/ranger/blob/e8b05f47892bb4968c4e6057f68b35bcd0b52225/src/utility.cpp#L575
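As a rough illustration of what a maximally selected rank statistic does with the scores, the Python sketch below scans the cutpoints of one covariate and, for each, standardizes the sum of scores in the left group by its mean and variance under permutation. This is a textbook-style sketch, not a transcription of `maxstat()`: the function name is hypothetical, the `minprop` handling is an assumption (the parameter is merely named after ranger's argument), and the p-value approximation that normally follows is left out.

```python
import math


def max_selected_statistic(x, scores, minprop=0.1):
    """Return (largest standardized statistic, corresponding cutpoint).

    x       covariate values of the observations in the node
    scores  scores a_i, e.g. the ranks of a continuous outcome
    minprop minimal proportion of observations on each side of a cutpoint
    """
    n = len(x)
    mean_a = sum(scores) / n
    ss = sum((a - mean_a) ** 2 for a in scores)
    order = sorted(range(n), key=lambda i: x[i])

    best_stat, best_cut = 0.0, None
    left_sum = 0.0
    for pos in range(n - 1):
        left_sum += scores[order[pos]]
        m = pos + 1  # number of observations in the left group
        lo, hi = x[order[pos]], x[order[pos + 1]]
        if lo == hi:
            continue  # no cutpoint between tied covariate values
        if m < minprop * n or n - m < minprop * n:
            continue  # one group would be too small
        # Mean and variance of a sum of m scores drawn without replacement.
        expected = m * mean_a
        variance = m * (n - m) / (n * (n - 1)) * ss
        if variance <= 0:
            continue
        stat = abs(left_sum - expected) / math.sqrt(variance)
        if stat > best_stat:
            best_stat, best_cut = stat, (lo + hi) / 2
    return best_stat, best_cut


# Made-up example: the scores are the ranks of an outcome that increases with x.
stat, cut = max_selected_statistic([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6])
```

Swapping the rank scores for log-rank scores would give the survival variant; everything else in the scan stays the same.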
Hello Marvin,
I have a question about permutation importance for regression forests in ranger. Is the performance drop calculated using the mean squared error (MSE) when splitrule = "variance"? What about when splitrule = "maxstat"?
Thanks,
Yu