Open timdisher opened 4 years ago
I'm rewriting that right now - hope to have something by the end of the week.
Great! Will test it in my application as soon as you do.
I checked in something (9fa610e) a few days ago that adds predict for out-of-sample groups. Let me know if you encounter any issues.
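For readers unfamiliar with what predicting for an out-of-sample group usually involves in a random-intercept model, here is a generic base-R sketch. For each posterior draw, it samples a fresh intercept for the unseen group from a normal distribution with that draw's standard deviation. This is the textbook approach and is not a description of what commit 9fa610e actually does; `tau_draws`, `bart_new`, and all the numbers are made up for illustration.

```r
set.seed(7)
n_draws <- 4000

# Hypothetical posterior draws of the random-intercept standard deviation.
tau_draws <- abs(rnorm(n_draws, mean = 0.5, sd = 0.05))

# Hypothetical draws of the tree-ensemble component for one new observation.
bart_new <- rnorm(n_draws, mean = 2.0, sd = 0.1)

# One fresh intercept per posterior draw for the unseen group
# (assumption: intercepts are N(0, tau^2) given tau).
ranef_new <- rnorm(n_draws, mean = 0, sd = tau_draws)

# Prediction for the new group = tree component + sampled intercept.
pred_new <- bart_new + ranef_new

mean(pred_new)  # stays close to mean(bart_new)
sd(pred_new)    # larger than sd(bart_new): between-group variation is added
```

Under these assumptions the new-group prediction keeps roughly the same center as the tree component alone, but its spread widens to reflect uncertainty about the unseen group's intercept.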
Vince, how would this work if, instead of using the `predict` function on a BART model with stored trees, we just set every row of the random-effects group in the test data to the new level (essentially, to marginalize out the random effect in predictions) and wanted to directly access the posteriors for both to see the effect this has?
Right now, after doing what I described above, I can see within the BART object that the new category was added to the `ranef` slot and its value is essentially 0. So far, so good. But if I run `colMeans` on the draws in `yhat.test`, they are the same as the values I get from `yhat.train`, even though I would think they should be different, as all rows in `yhat.test` had the new group level.
Should I instead be looking at `yhat.train.mean` for the marginal mean without random effects and just not bother adding the test data with the new level? Hope this question makes sense.
Thanks for this great feature.
I'm sorry, but I don't quite follow. If you don't use `predict` and saved trees but have new levels in the test data, it should create random effects for them. I think the confusion may be that `yhat.test` is just the BART component, whereas if you want the full predictions for the test data you need to add in the random-effects part too. The easiest thing to do would be to call `fitted(fit, type = 'ev', sample = 'test')`.
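To make the distinction concrete, here is a minimal base-R sketch that does not call dbarts at all. The matrices `bart_test` and `ranef_draws` are made-up stand-ins for posterior draws (rows are draws), and their layout is an assumption for illustration rather than the package's actual storage format.

```r
set.seed(42)
n_draws <- 2000

# Hypothetical BART-only draws for 3 test rows (draws x observations).
bart_test <- cbind(rnorm(n_draws, 1.0, 0.1),
                   rnorm(n_draws, 2.0, 0.1),
                   rnorm(n_draws, 3.0, 0.1))

# Hypothetical random-intercept draws (draws x groups).
# "new" stands for an unseen level, centered at 0 with a wide spread.
ranef_draws <- cbind(a   = rnorm(n_draws,  0.8, 0.1),
                     b   = rnorm(n_draws, -0.5, 0.1),
                     new = rnorm(n_draws,  0.0, 0.6))

# Group membership of each test row.
group_test <- c("new", "a", "b")

# Full expected value = BART component + the intercept of each row's group.
full_test <- bart_test + ranef_draws[, group_test]

bart_means <- colMeans(bart_test)  # ignores the grouping entirely
full_means <- colMeans(full_test)  # shifts rows in groups "a" and "b"
```

In this toy setup the row assigned to the unseen level has nearly the same mean with or without the intercept (its intercept is centered at 0) but a much wider spread, while rows in known groups shift by their group's intercept. That is consistent with column means of the BART-only draws looking unchanged when only the group labels differ.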
I know all of this is confusing, which is why I'm working on a more general framework here. At the moment it can only do continuous outcomes, but it should have a cleaner interface.
This is very helpful, thanks!
I am using dbart_vi to fit a random intercept model to allow for estimation of correlated effects across multiple outcomes. The ultimate goal of this analysis is to inform a microsimulation, and thus it is important that I can predict new observations. Currently it looks as if predict requires the same group levels to be present. Is it possible to allow prediction into new groups (so that, for example, I can predict into a simulated population that is twice as large as the training set)?