Maziar-Kasaei closed this issue 5 years ago
The probability estimates you have in "ranger" are not OOB.
That's not true. The predictions in rf$predictions are always OOB, and for probabilities just use probability = TRUE (as you mentioned).
When we set probability = TRUE, it uses Malley et al.'s method to estimate the probabilities (are those OOB?). What I meant was to use the fraction of OOB votes to calculate the probabilities. Thanks.
Yes, the $predictions with Malley et al.'s method are also OOB. To get the OOB votes of a standard classification forest, you could also use this: https://github.com/imbs-hl/ranger/issues/288#issuecomment-375628972.
Could you please let me know whether the probabilities in the following scenario are based on OOB or not:
rf=ranger(Species ~ . , data = iris, importance='pairwise', num.trees=100, probability=TRUE, write.forest=TRUE)
probabilities=predict(rf, data=iris)$predictions
My next question is about the difference between rf$predictions and predict(rf, data=iris)$predictions. I get two different results, but I have no idea why.
Any help is highly appreciated.
As explained above, rf$predictions is OOB. If you predict on the whole data set, as with predict(rf, data=iris)$predictions, that's not OOB.
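To see the difference concretely, here is a minimal sketch that fits a probability forest on iris and compares the two sets of predictions. The seed and the variable names (oob_pred, insample_pred) are only illustrative assumptions, not part of the original question.

```r
library(ranger)

set.seed(1)  # arbitrary seed, only for reproducibility of the sketch
rf <- ranger(Species ~ ., data = iris, num.trees = 100, probability = TRUE)

# Stored with the fitted forest: each row is averaged only over the trees
# for which that observation was out-of-bag.
oob_pred <- rf$predictions

# Predicting on the training data again: every tree contributes to every
# row, including the trees that saw the observation during training.
insample_pred <- predict(rf, data = iris)$predictions

# Largest absolute difference between the two probability matrices
max_diff <- max(abs(oob_pred - insample_pred), na.rm = TRUE)
print(max_diff)
```

The in-sample probabilities will typically look more confident than the OOB ones, which is why the two results differ.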
Out-of-bag probability estimates are implemented in the "randomForest" package as model$votes. The probability estimates you have in "ranger" are not OOB. I tried both predict.all=TRUE (calculating the probabilities manually) and simply setting probability=TRUE (Malley et al.'s method), and both give non-OOB estimates. In other words, when calculating a class probability estimate for one instance, all trees' votes for that instance are used, regardless of whether the instance was in-bag or out-of-bag for a given tree. To increase accuracy, I would suggest adding the out-of-bag probability estimate to your awesome package, i.e., to calculate the probability estimates, consider only the votes of those trees for which a specific instance is out-of-bag.
I wrote a piece of code (simple and probably inefficient) that calculates the OOB probability estimates for the iris data set.
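The poster's code is not reproduced here. As a rough sketch of the same idea (not the original code), one could combine keep.inbag = TRUE with predict.all = TRUE and average, for each observation, only over the trees where it was out-of-bag. This assumes that predict.all on a probability forest returns a sample x class x tree array, as described in the ranger documentation; the names per_tree, oob and oob_prob are made up for illustration.

```r
library(ranger)

set.seed(42)  # arbitrary seed for the sketch
rf <- ranger(Species ~ ., data = iris, num.trees = 500,
             probability = TRUE, keep.inbag = TRUE)

# Per-tree class probabilities; assumed layout: sample x class x tree
per_tree <- predict(rf, data = iris, predict.all = TRUE)$predictions

# inbag.counts is a list with one count vector per tree;
# bind it into a sample x tree matrix, where 0 means out-of-bag
inbag <- do.call(cbind, rf$inbag.counts)
oob <- inbag == 0

# For each observation, average the class probabilities over its OOB trees only
oob_prob <- t(sapply(seq_len(nrow(iris)), function(i) {
  apply(per_tree[i, , oob[i, ], drop = FALSE], 2, mean)
}))

head(oob_prob)
head(rf$predictions)  # ranger's own OOB estimates, for comparison
```

If the assumptions above hold, oob_prob should agree closely with rf$predictions, which would be consistent with the maintainer's statement that the stored predictions are already OOB. An observation that was never out-of-bag would get NaN here, but that is very unlikely with several hundred trees.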