zmjones / edarf

exploratory data analysis using random forests
MIT License

OOB parameter for variable importance when using party #53

Closed gavin-s-smith closed 8 years ago

gavin-s-smith commented 8 years ago

Just a quick note: when variable importance is computed with the party package (at least the current version, 1.0-25), the OOB parameter has no effect, i.e. you cannot get the code to use out-of-bag samples.

This is because the party "predict" code that you (eventually) call (line 138 of https://github.com/cran/party/blob/R-3.0.3/R/RandomForest.R) forces OOB to false whenever new data is passed, which is always the case here since you permute the data. Modifying RandomForest.R in party to prevent this forced behaviour seems to fix it, but I have not looked into why the behaviour is there in the first place, so I can't say whether the code then logically does the right thing. In any case, since this is not your code, perhaps it's just worth noting in the documentation that the OOB parameter doesn't work in this case?
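
To make the mechanism concrete, here is a minimal Python sketch of a bagged ensemble whose predict method silently drops the OOB request once new data is supplied. It is a toy written for illustration, not party's code: `ToyForest`, `trees_used`, the stump learner, and the data are all assumptions.

```python
import random


def _mean(values, default):
    """Average of values, or default when the list is empty."""
    return sum(values) / len(values) if values else default


class ToyForest:
    """Toy bagged ensemble of one-split stumps (hypothetical, not party)."""

    def __init__(self, x, y, n_trees=25, seed=0):
        rng = random.Random(seed)
        self.x = list(x)
        n = len(self.x)
        self.trees = []
        for _ in range(n_trees):
            # Bootstrap sample: rows drawn with replacement are "in bag".
            bag = [rng.randrange(n) for _ in range(n)]
            cut = sorted(self.x[i] for i in bag)[n // 2]
            bag_mean = _mean([y[i] for i in bag], 0.0)
            low = _mean([y[i] for i in bag if self.x[i] < cut], bag_mean)
            high = _mean([y[i] for i in bag if self.x[i] >= cut], bag_mean)
            self.trees.append((set(bag), cut, low, high))

    def _votes(self, newdata, oob):
        if newdata is not None:
            # The behaviour described above: new data forces OOB off.
            oob = False
            rows = list(newdata)
        else:
            rows = self.x
        votes = []
        for i, xi in enumerate(rows):
            votes.append([
                low if xi < cut else high
                for in_bag, cut, low, high in self.trees
                if not (oob and i in in_bag)  # skip trees that trained on row i
            ])
        return votes

    def predict(self, newdata=None, oob=False):
        return [_mean(v, float("nan")) for v in self._votes(newdata, oob)]

    def trees_used(self, newdata=None, oob=False):
        """Number of trees that contribute to each row's prediction."""
        return [len(v) for v in self._votes(newdata, oob)]


x = [float(i) for i in range(20)]
y = [2.0 * v for v in x]
forest = ToyForest(x, y)

# Permuting the covariate means the result is passed in as new data.
permuted = x[:]
random.Random(1).shuffle(permuted)

print(forest.trees_used(oob=True))                    # a subset of trees per row
print(forest.trees_used(newdata=permuted, oob=True))  # every tree, flag ignored
```

In this sketch, asking for `oob=True` on the permuted data returns exactly what `oob=False` would, which is why the parameter appears to do nothing.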

zmjones commented 8 years ago

Ah, I didn't notice that. This behavior makes sense though (in party). I've just looked and this is also the case with randomForest and randomForestSRC.

What is the modification you are making? Despite my having put the parameter in there, I don't think it makes sense now that I've thought about it. In/out of bag only makes sense for the unpermuted covariates, so I can't see how you could compute OOB predictions.

I am going to go ahead and remove the parameter now though. Thanks for noticing this.
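
For reference, the permutation step discussed in this thread can be sketched as below. This is a rough Python illustration rather than the edarf implementation: the `permutation_importance` helper, the stand-in `predict` function, and the data are all hypothetical. The point is only that the model is handed a permuted copy of the data, i.e. new data, so every tree contributes to the prediction.

```python
import random


def mse(y, pred):
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)


def permutation_importance(predict, rows, y, column, seed=0):
    """Increase in MSE after shuffling one column of the data."""
    rng = random.Random(seed)
    baseline = mse(y, predict(rows))
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    # Copy each row with the shuffled value; the model sees this as new data.
    permuted = [dict(row, **{column: value}) for row, value in zip(rows, shuffled)]
    return mse(y, predict(permuted)) - baseline


def predict(rows):
    """Hypothetical stand-in for a fitted model that only uses column 'a'."""
    return [3.0 * row["a"] for row in rows]


rows = [{"a": float(i), "b": float(i % 3)} for i in range(12)]
y = [3.0 * row["a"] for row in rows]

imp_a = permutation_importance(predict, rows, y, "a")
imp_b = permutation_importance(predict, rows, y, "b")
print(imp_a, imp_b)
```

Shuffling the column the model relies on raises the error, while shuffling the unused column leaves it unchanged.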