vdorie / dbarts

Discrete Bayesian Additive Regression Trees Sampler

Predicting new groups in dbart_vi? #17

Open timdisher opened 4 years ago

timdisher commented 4 years ago

I am using dbart_vi to fit a random intercept model to allow for estimation of correlated effects across multiple outcomes. The ultimate goal of this analysis is to inform a microsimulation, so it is important that I can predict new observations. Currently it looks as if predict requires the same group levels to be present. Is it possible to allow prediction into new groups (so that, for example, I can predict into a simulated population that is twice as large as the training set)?

vdorie commented 4 years ago

I'm rewriting that right now - hope to have something by the end of the week.

timdisher commented 4 years ago

Great! I will test it in my application as soon as you do.

vdorie commented 4 years ago

I checked in something (9fa610e) a few days ago that adds predict for out-of-sample groups. Let me know if you encounter any issues.
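For readers who want to try this, here is a minimal sketch of predicting into an unseen group. It assumes the function discussed in this thread is the one exported by dbarts as `rbart_vi`, and that its `predict` method accepts `newdata`, `group.by`, and `type`; check your installed version's help page before relying on it. The data are simulated, and every variable name (`train`, `new_x`, `new_g`, the level `"z"`) is made up for illustration. How the method treats the unseen level is not spelled out in this thread.

```r
library(dbarts)

set.seed(1)

# Simulated training data: 5 groups, each with its own intercept (illustrative only).
n <- 200
train <- data.frame(x = runif(n),
                    g = factor(sample(letters[1:5], n, replace = TRUE)))
b <- rnorm(5, sd = 0.5)  # simulated group intercepts
train$y <- sin(2 * pi * train$x) + b[as.integer(train$g)] + rnorm(n, sd = 0.2)

# Small sampler settings so the sketch runs quickly; not tuned for real use.
# keepTrees = TRUE is set explicitly because predict needs the stored trees.
fit <- rbart_vi(y ~ x, data = train, group.by = train$g,
                n.samples = 100L, n.burn = 100L, n.chains = 1L,
                n.threads = 1L, n.trees = 25L,
                keepTrees = TRUE, verbose = FALSE)

# Two new rows: one in a group seen during training ("a"), one in an unseen group ("z").
new_x <- data.frame(x = c(0.25, 0.25))
new_g <- factor(c("a", "z"))

# Posterior draws of the expected value, one column per new row.
pred <- predict(fit, newdata = new_x, group.by = new_g, type = "ev")
```

Comparing the spread of the two columns of `pred` is one way to see how much extra uncertainty the unseen group carries.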

bachlaw commented 3 years ago

Vince, how would this work if, instead of calling predict on a BART model with stored trees, we assigned the new level to every row of the random-effects grouping variable in the test data (essentially to marginalize out the random effect in predictions) and wanted to access the posteriors for both directly, to see the effect this has?

Right now, after doing what I described above, I can see within the BART object that the new category was added to the ranef slot and its value is essentially 0. So far, so good. But if I take colMeans of the draws in yhat.test, they are the same as the values I get from yhat.train, even though I would expect them to differ, since every row in yhat.test had the new group level.

Should I instead be looking to yhat.train.mean for the marginal mean without random effects and just not bother adding the testing data with the new level? Hope this question makes sense.

Thanks for this great feature.

vdorie commented 3 years ago

I'm sorry, but I don't directly follow. If you don't use predict and saved trees but have new levels in the test data, it should create random effects for them. I think the confusion is maybe that yhat.test is just the BART component, whereas if you want the full predictions for the test data you need to add in the random effects part too. The easiest thing to do would be to call fitted(fit, type = 'ev', sample = 'test').

I know all of this is confusing, which is why I'm working on a more general framework here. At the moment it can only do continuous outcomes, but it should have a cleaner interface.
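The distinction above, between the BART component alone and the full prediction that includes the random effects, can be sketched as follows. This again assumes the function is dbarts' `rbart_vi`, that it accepts `test` and `group.by.test`, and that `fitted` supports `type = "bart"` alongside the `type = "ev"` call quoted above. The simulated data and all names (`test_x`, `test_g`, the level `"new"`) are hypothetical.

```r
library(dbarts)

set.seed(2)

# Simulated training data with 5 groups (illustrative only).
n <- 200
train <- data.frame(x = runif(n),
                    g = factor(sample(letters[1:5], n, replace = TRUE)))
b <- rnorm(5, sd = 0.5)  # simulated group intercepts
train$y <- sin(2 * pi * train$x) + b[as.integer(train$g)] + rnorm(n, sd = 0.2)

# Test data in which every row carries a level that never occurs in training.
test_x <- data.frame(x = seq(0.1, 0.9, by = 0.2))
test_g <- factor(rep("new", nrow(test_x)))

# Fit with the test set supplied up front; small settings for speed only.
fit <- rbart_vi(y ~ x, data = train, test = test_x,
                group.by = train$g, group.by.test = test_g,
                n.samples = 100L, n.burn = 100L, n.chains = 1L,
                n.threads = 1L, n.trees = 25L, verbose = FALSE)

# BART component only, which is what yhat.test is described as holding.
bart_test <- fitted(fit, type = "bart", sample = "test")

# Full expected value: BART component plus the random-effects part.
ev_test <- fitted(fit, type = "ev", sample = "test")

# Side-by-side view of the two summaries for the test rows.
print(cbind(bart = as.vector(bart_test), ev = as.vector(ev_test)))
```

If the random effect for the new level is centered near 0, as reported earlier in the thread, the two columns should be close on average; any difference between them comes from the random-effects part.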

jlevy44 commented 3 years ago

This is very helpful, thanks!