imputing new data

topepo commented 7 years ago

It would be great to be able to build imputation models on data set X₁ and apply them to other data sets (X₁, X₂, ...). From what I can tell, you can only impute the same data that you start with.

markvanderloo commented 7 years ago

Edit: it is probably better to have an option to pass a model object, as you suggest. I'll do that instead :).

Good point. I'm considering adding a train argument to the impute_* functions so that a different training set can be used. I'll very probably do it too, even though it is considered bad practice in model-based imputation: (1) you introduce the assumption that model coefficients estimated on the training set apply to the set being imputed, and (2) you can't do many of the fancy variance analyses that are possible when training on the same data set. Having said that, I think the practice is not that bad if there are good reasons to assume (1).
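The idea under discussion, estimating an imputation model on one data set and applying its coefficients to another, can be sketched in base R without any package. This is a minimal illustration under stated assumptions, not simputation's implementation; the split of the built-in iris data into a training set and a "new" set, and the choice of predictors, are made up purely for the example.

```r
# Minimal sketch (base R only): estimate a regression imputation model on a
# training set X1 and use its coefficients to fill missing values in X2.
# The iris split and the predictors are illustrative assumptions.

train  <- iris[1:100, ]    # X1: complete data (setosa and versicolor only)
newdat <- iris[101:150, ]  # X2: data to impute (virginica only)
newdat$Sepal.Length[c(1, 5, 9)] <- NA  # knock out a few values

# Fit the imputation model on X1 only
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = train)

# Replace only the missing cells of X2 with predictions from X1's model.
# This is where assumption (1) enters: X1's coefficients are taken to hold for X2.
miss <- is.na(newdat$Sepal.Length)
newdat$Sepal.Length[miss] <- predict(fit, newdata = newdat[miss, ])
```

Because the training rows here come from different species than the rows being imputed, the example also shows how easily assumption (1) can be stretched in practice.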

topepo commented 7 years ago

"I'll very probably do it too"

This needs to be my GitHub signature for issues =]

My interest is more in estimation problems, where I'm less concerned with the impact on the distributional assumptions. The first assumption is clearly still important in either case, though. Thanks!

markvanderloo commented 7 years ago

Max, if you install the drat version (0.2.2), there is now an impute function that works as follows:

library(simputation)
model <- lm(foo ~ bar + baz, data = snafu)  # fit the model on the training data
impute(dat, foo ~ model)                    # fill missing values of foo in dat using model

The code still resides in the modelimpute branch.
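Applied to the original request (train on X₁, impute X₂), the model-object interface might look like the following. This is a sketch assuming the released simputation package accepts impute(dat, formula) with the fitted model named on the right-hand side of the formula, as in the snippet in this comment; the iris split is only an illustrative assumption.

```r
library(simputation)

# X1: training data used to fit the model; X2: new data with gaps.
# The iris split below is an illustrative assumption.
train  <- iris[1:100, ]
newdat <- iris[101:150, ]
newdat$Sepal.Length[c(1, 5, 9)] <- NA

# Fit the imputation model on X1
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = train)

# Impute X2 with the pre-fitted model: the left-hand side names the variable
# to fill, the right-hand side names the model object
imputed <- impute(newdat, Sepal.Length ~ fit)
```

Only the missing cells of Sepal.Length are filled; observed values in X₂ are left untouched.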

markvanderloo commented 7 years ago

Now in the main branch.

topepo commented 7 years ago

Thanks!

markvanderloo / simputation

imputing new data #15