swager / randomForestCI

This package is DEPRECATED. Please use the packages `grf` or `ranger` instead, which have built-in confidence intervals.
https://github.com/swager/grf
MIT License

Question about Classification #4

Open wjz19920811 opened 8 years ago

wjz19920811 commented 8 years ago

Hi Professor Wager,

Thank you so much for your work. It has helped me a great deal.

I have read in your paper (Wager, Hastie, Efron (2012)) that the infinitesimal jackknife can be applied to classification problems, such as the email spam example. My question is: can 'randomForestCI' be used for this purpose?

Having built a random forest model to predict a categorical variable, I obtain one 'y.hat' and one 'var.hat' from running 'randomForestInfJack'. I expected a separate variance for the estimated probability of belonging to each class, so I wonder if you could shed some light on whether I may use this output in my case.

Thank you for your help.

Sincerely, Jinzhao

swager commented 8 years ago

Hi Jinzhao,

You can use randomForestCI for classification also, but only on the probability scale. In the case of a two-class classification problem Y \in {0, 1}, if you use a classification forest to estimate \hat{y}(x) = P[Y = 1 | X = x], then the variance estimates given by the infinitesimal jackknife are meaningful. My paper with Susan Athey has a slightly more detailed answer; see Remark 2 on page 12.
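For intuition about what a variance estimate on the probability scale looks like, here is a minimal Python sketch of the infinitesimal jackknife for a bagged estimate of P[Y = 1]. It is not the randomForestCI implementation. The function name `inf_jackknife_variance` is hypothetical, and each "tree" is replaced by a toy base learner that simply predicts the mean of its bootstrap sample, an assumption made so the result can be compared with the usual variance of a sample proportion. The estimate is the sum over training points of the squared covariance between in-bag counts and per-tree predictions, minus the Monte Carlo bias correction described by Wager, Hastie, and Efron.

```python
import numpy as np


def inf_jackknife_variance(inbag, tree_preds):
    """Bias-corrected infinitesimal jackknife variance at one test point.

    inbag      : (B, n) array, times each training point appears in each resample
    tree_preds : (B,) array, per-tree predictions on the probability scale
    """
    inbag = np.asarray(inbag, dtype=float)
    tree_preds = np.asarray(tree_preds, dtype=float)
    B, n = inbag.shape

    centered_counts = inbag - inbag.mean(axis=0)
    centered_preds = tree_preds - tree_preds.mean()

    # Covariance between in-bag counts and predictions, one value per training point.
    cov = centered_counts.T @ centered_preds / B
    raw = np.sum(cov ** 2)

    # Monte Carlo bias correction for using finitely many resamples.
    correction = n / B ** 2 * np.sum(centered_preds ** 2)
    return raw - correction


# Toy stand-in for a forest (assumption): each "tree" predicts the mean of the
# 0/1 labels in its bootstrap sample, so the bagged output estimates P[Y = 1].
rng = np.random.default_rng(0)
n, B = 50, 5000
y = rng.integers(0, 2, size=n)
inbag = rng.multinomial(n, np.full(n, 1.0 / n), size=B)  # shape (B, n)
tree_preds = inbag @ y / n

p_hat = tree_preds.mean()
var_hat = inf_jackknife_variance(inbag, tree_preds)
expected = y.mean() * (1 - y.mean()) / n  # variance of a sample proportion
print(p_hat, var_hat, expected)
```

With an actual forest, `tree_preds` would hold each tree's prediction on the probability scale at the test point, and `inbag` the per-tree resampling counts.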

Now, when it comes to multi-class classification, there's nothing in principle preventing one from using a similar infinitesimal jackknife on the probability scale; however, we have not yet implemented it.

Hope this helps, and let me know if you have any more questions!

All the best, Stefan

wjz19920811 commented 8 years ago

Hi Professor Wager,

Thank you so much for replying to me. Your answer helps a great deal, and the paper is a helpful complement to the original (2012) paper on the infinitesimal jackknife.

I am planning to apply your R function to a multi-class classification random forest by converting the multi-class variable into binary ones. Please let me know if this sounds obviously wrong.
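A minimal sketch of that conversion, written in Python with numpy purely for illustration: the helper name `one_vs_rest_targets` and the example labels are hypothetical. Each column is a 0/1 indicator for one class and could be treated as its own two-class problem. One caveat under this approach is that probability estimates from separately fitted binary models need not sum to one across classes.

```python
import numpy as np


def one_vs_rest_targets(labels):
    """Turn a vector of class labels into one 0/1 indicator column per class."""
    labels = np.asarray(labels)
    classes = np.unique(labels)  # sorted distinct labels
    indicators = (labels[:, None] == classes[None, :]).astype(int)
    return classes, indicators


# Hypothetical three-class example.
labels = np.array(["ham", "spam", "promo", "spam", "ham"])
classes, indicators = one_vs_rest_targets(labels)

for k, cls in enumerate(classes):
    # indicators[:, k] is the binary response for the "cls vs. rest" problem.
    print(cls, indicators[:, k])
```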

Your ideas are greatly appreciated.

Warmly, Jinzhao