prediction and variance for new data

swager / randomForestCI

This package is DEPRECATED. Please use the packages `grf` or `ranger` instead, which have built-in confidence intervals.

https://github.com/swager/grf

MIT License


prediction and variance for new data #8

Open MNRiverEcologyUnit opened 7 years ago

MNRiverEcologyUnit commented 7 years ago

I would like to obtain a variance estimate from `randomForestInfJack` for a new observation, that is, an observation that was not in X when the random forest was created (`randomForest(X, Y, keep.inbag = TRUE)`). It is not clear to me 1) whether this should be done at all, or 2) if it is valid, whether the new observation should be added to the original data set X before running `randomForestInfJack()`. Dan O.

swager commented 7 years ago

Here's an example with confidence intervals on new observations:

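```r
library(randomForest)
library(randomForestCI)

# Make some data...
n = 250
p = 100
X = matrix(rnorm(n * p), n, p)
Y = rnorm(n)

# Run the method: keep.inbag = TRUE stores the in-bag counts
rf = randomForest(X, Y, ntree = 2000, keep.inbag = TRUE)

# Variance estimates at test points that were not used to fit the forest
n.test = 100
X.test = matrix(rnorm(n.test * p), n.test, p)
ij = randomForestInfJack(rf, X.test, calibrate = TRUE)
```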
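The call above returns variance estimates rather than intervals. A minimal sketch of turning them into pointwise intervals follows. It assumes that `ij` is a data frame with columns `y.hat` and `var.hat` (worth confirming with `str(ij)`), and that a Gaussian approximation is acceptable. `ci_from_ij` is a hypothetical helper written for this illustration, not part of randomForestCI, and the toy numbers are made up.

```r
# Hypothetical helper (not part of randomForestCI): normal-approximation
# intervals from a data frame with columns y.hat and var.hat (assumed names).
ci_from_ij <- function(ij, level = 0.95) {
  stopifnot(all(c("y.hat", "var.hat") %in% names(ij)))
  z <- qnorm(1 - (1 - level) / 2)
  # Guard against negative estimates, which would make sqrt() return NaN.
  se <- sqrt(pmax(ij$var.hat, 0))
  data.frame(
    y.hat = ij$y.hat,
    lower = ij$y.hat - z * se,
    upper = ij$y.hat + z * se
  )
}

# Toy input with made-up numbers, standing in for randomForestInfJack() output.
toy <- data.frame(y.hat = c(0.1, -0.3), var.hat = c(0.04, 0.09))
ci <- ci_from_ij(toy)
print(ci)
```

With a real forest, the same helper would be called as `ci_from_ij(ij)`, provided the column names match the assumption above.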

MNRiverEcologyUnit commented 7 years ago

Thanks for the quick response. What if I wanted to predict Y and its variance for one new data point where I have measured the set of X's but not the Y? Since I only have one new observation, and the function can't take just one row of data, would it be best to append (rbind) the new observation to all of the original X?

swager commented 7 years ago

Hmm yeah ideally the code would let you specify just one prediction point; that looks like a bug we should fix.

In the meantime, yes, appending it to the original X should get the job done.
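A sketch of that workaround, under one reading of the suggestion: the forest is still fitted on the original X only, and the new point is appended just to the matrix passed to `randomForestInfJack`. The variable names `x.new`, `X.all`, `new.row`, `ij.all` and `ij.new` are hypothetical, and the final subsetting assumes the result has one row per test point. The package calls are guarded so the snippet also runs where the packages are not installed.

```r
# Same data shapes as the earlier example in this thread.
set.seed(1)
n <- 250
p <- 100
X <- matrix(rnorm(n * p), n, p)
Y <- rnorm(n)

# One new observation with measured X but unknown Y (simulated here).
x.new <- rnorm(p)

# Append the new point as the last row of the original X.
X.all <- rbind(X, x.new)
new.row <- nrow(X.all)

# The package calls only run if both packages are installed.
if (requireNamespace("randomForest", quietly = TRUE) &&
    requireNamespace("randomForestCI", quietly = TRUE)) {
  # The forest is fitted on X only; the new point is not used for training.
  rf <- randomForest::randomForest(X, Y, ntree = 2000, keep.inbag = TRUE)
  ij.all <- randomForestCI::randomForestInfJack(rf, X.all, calibrate = TRUE)
  # Keep only the row that belongs to the new observation.
  ij.new <- ij.all[new.row, , drop = FALSE]
}
```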

alionaBER commented 7 years ago

@dtoalm do you have the latest version of the randomForestCI package? If I understood your question correctly, this has already been fixed. A couple of weeks ago I made an adjustment to allow one-row predictions, and it was merged into the master branch. Changing the example by @swager so that the test data consists of one row works (except for calibration):

```r
# Same example with a single test row; rf and p come from the example above
n.test = 1
X.test = matrix(rnorm(n.test * p), n.test, p)
ij = randomForestInfJack(rf, X.test, calibrate = TRUE)
```
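One practical detail when the single test point is taken from a larger matrix: R's default indexing drops a one-row selection to a plain vector. The thread does not say whether `randomForestInfJack` accepts a bare vector, so keeping the 1 x p matrix shape used in the example above is the conservative choice. A small base-R sketch, with hypothetical variable names:

```r
p <- 100
n.test <- 5
X.test <- matrix(rnorm(n.test * p), n.test, p)

# Plain indexing drops the matrix structure and returns a length-p vector.
x.vec <- X.test[1, ]

# drop = FALSE keeps a 1 x p matrix, the shape used in the one-row example.
x.row <- X.test[1, , drop = FALSE]

# A bare vector can also be reshaped into a single row explicitly.
x.row2 <- matrix(x.vec, nrow = 1)
```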