kogalur / randomForestSRC

DOCUMENTATION:
https://www.randomforestsrc.org/
GNU General Public License v3.0

Why do I get a predicted.oob although outcome="test"? #366

Open vfincke opened 1 year ago

vfincke commented 1 year ago

Hello,

I split my data into a training set and a test set (varimp.out is the forest grown on the training data, varimp.test is the prediction on the test data). As far as I understand, I should not get a predicted OOB value when I set outcome="test" for the test data set. Is that true? Can someone help me with this issue?

Best regards!

train <- sample(1:nrow(varimp4), round(nrow(varimp4) * 0.70))

# testing step

varimp.test <- predict(varimp.out, varimp4[-train, ], outcome = "test")
varimp.test

     Sample size of test (predict) data: 24
     Number of grow trees: 5000
     Average no. of grow terminal nodes: 7.3616
     Total no. of grow variables: 42
     Resampling used to grow trees: swor
     Resample size used to grow trees: 35
     Analysis: RSF
     Family: surv
     CRPS: 0.23754244
     Requested performance error: 0.99470899

varimp.test$predicted.oob
 [1] 7.372534 7.309725 7.449110 6.950437 6.541513 7.319113 7.296415
 [8] 7.287099 6.120399 6.370140 6.474566 7.655736 7.669615 7.649079
[15] 6.049144 7.735172 7.796347 7.614550 7.807912 7.770450 7.747543
[22] 7.663421 7.780928 7.787535

varimp.test$predicted
 [1] 7.246682 7.246682 7.246682 7.246682 7.335530 7.322202 7.246682
 [8] 7.246682 7.335530 7.335530 7.335530 7.246682 7.335530 7.246682
[15] 7.335530 7.335530 7.335530 7.246682 7.335530 7.335530 7.335530
[22] 7.335530 7.335530 7.335530

ishwaran commented 1 year ago

From the help file:

     If outcome="test", the predictor is calculated by using
     y-outcomes from the test data (outcome information must be
     present).  Terminal nodes from the trained forest are recalculated
     using y-outcomes from the test set.  This yields a modified
     predictor in which the topology of the forest is based solely on
     the training data, but where predicted values are obtained from
     test data.  Error rates and VIMP are calculated by bootstrapping
     the test data and using out-of-bagging to ensure unbiased
     estimates.

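So the option outcome="test" replaces the training outcomes in the terminal nodes with the outcomes from the test data. Because the test data are then bootstrapped, it becomes possible to obtain an OOB predicted value for the test data, which is why predicted.oob is returned.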
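
To see the difference between the two settings side by side, here is a minimal sketch. It assumes the veteran survival data shipped with the package as a stand-in for the poster's data; the names fit, test.dat, pred.train and pred.test are illustrative only and do not come from the original post.

```r
library(randomForestSRC)

# Stand-in data (assumption): the veteran survival data from the package
data(veteran, package = "randomForestSRC")

set.seed(1)
train <- sample(1:nrow(veteran), round(nrow(veteran) * 0.70))

# Grow the forest on the training rows only
fit <- rfsrc(Surv(time, status) ~ ., data = veteran[train, ], ntree = 500)

test.dat <- veteran[-train, ]

# Default (outcome = "train"): test cases are dropped down the trained
# forest and predicted from the training outcomes in the terminal nodes
pred.train <- predict(fit, test.dat)

# outcome = "test": same tree topology, but terminal node values are
# recalculated from the test outcomes, with the test data bootstrapped
pred.test <- predict(fit, test.dat, outcome = "test")

# Compare the first few predicted values from each call
head(pred.train$predicted)
head(pred.test$predicted)
head(pred.test$predicted.oob)
```

If the goal is a plain held-out prediction from the trained forest, the default call (pred.train above) is probably the one to use; outcome="test" answers a different question, since it re-estimates the terminal node values from the test outcomes.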