Open JieyingJiao opened 1 year ago
You should turn off all performance calculations during training using the option perf.type="none" and then extract what you want later using the predict function. The key is that predict has the option get.tree, which lets you pull single trees or ensembles of trees, from which you can then extract information, either using built-in values or by applying external functions.
Here's an example that pulls the C-index error rate for the ensemble of the first 10 trees, with performance turned off during training:
library(randomForestSRC)
data(pbc, package = "randomForestSRC")
o <- rfsrc(Surv(days, status) ~ ., pbc, perf.type = "none")
predict(o,get.tree=1:10,block.size=10)$err.rate[10]
[1] 0.1942955
Here, with block.size=1, we get the cumulative error rate after each of the first 10 trees:
predict(o,get.tree=1:10,block.size=1)$err.rate[1:10]
[1] 0.2714395 0.2423318 0.2383721 0.2276816 0.2314720 0.2241849 0.2209717 0.2072547 0.1982001 0.1942955
If you want to use the Brier score, then here's an example where we extract the OOB ensemble made up of the first 10 trees and then apply the pre-built function to it:
p <- get.brier.survival(predict(o,get.tree=1:10))
plot(p$brier.score,type="l")
For your last question (#3), unfortunately the C-index is the only available metric for survival, so all downstream performance values (like VIMP) are based on the C-index.
Thanks a lot for the response. For the last question about VIMP, I think VIMP is also turned off when using perf.type = 'none'. Is there also a way to calculate VIMP after fitting the model with performance turned off, using a self-defined external function for the C-index when calculating VIMP? I guess the functions vimp() and subsample() only work for a model object that has VIMP turned on.
Yes, you can retrieve VIMP using the predict function with the get.tree option. See the help file, which has a number of examples illustrating this.
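For anyone landing here later, a minimal sketch of what that might look like, reusing the pbc forest from the earlier reply. It assumes that passing importance = TRUE to predict together with get.tree returns VIMP for the requested trees even though the forest was grown with perf.type = "none"; the variable name v10 is just for illustration, and the help file for predict.rfsrc is the authority on the exact behavior in your version.

```r
library(randomForestSRC)

# Same setup as earlier in the thread: grow the forest with performance off.
data(pbc, package = "randomForestSRC")
o <- rfsrc(Surv(days, status) ~ ., pbc, perf.type = "none")

# Request VIMP at prediction time for the ensemble of the first 10 trees.
# Assumption: importance = TRUE combined with get.tree works on a forest
# grown with perf.type = "none"; check ?predict.rfsrc for your version.
v10 <- predict(o, get.tree = 1:10, importance = TRUE)$importance
print(v10)
```

As noted in the earlier reply, VIMP obtained this way is still based on the C-index, so this does not swap in a self-defined metric; it only moves the cost from training to a later predict call on the trees you choose.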
Hi,
I'm using rfsrc to fit a survival model, but the C-index calculation is very slow, which I think also makes the performance and VIMP calculations take a very long time.
Thanks a lot for the help.
Best, Jieying