Predictor Placeholder Imputation within PredictiveImputer

PredictiveImputer supports machine learning models for imputation. Currently, it fits a model (linear or logistic regression) on the observed X and y, where X and y depend on the predictors the user specifies when creating an instance of the class. The fit on the observed data yields coefficients, which we then use for imputation during the transform step.

During the transform step, we predict values for Y_mis using the covariates X_mis, that is, the predictor values in the rows where Y is missing. However, nothing guarantees that X_mis is fully observed. If the covariates have missing values, we must impute them with something so that we can generate a prediction for each y in Y_mis. Currently, we impute the predictors in the transform step with the default methods from the SingleImputer.
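To make the flow concrete, here is a minimal NumPy sketch of the idea, not autoimpute's actual code. The helper names (`fit_observed`, `placeholder_fill`, `impute_target`) are hypothetical, and column-mean filling is only an assumed stand-in for whatever defaults the SingleImputer applies.

```python
import numpy as np

def fit_observed(X, y):
    """Least-squares fit using only rows where y and every predictor are observed."""
    obs = ~np.isnan(y) & ~np.isnan(X).any(axis=1)
    A = np.column_stack([np.ones(obs.sum()), X[obs]])
    coefs, *_ = np.linalg.lstsq(A, y[obs], rcond=None)
    return coefs  # [intercept, slope_1, slope_2, ...]

def placeholder_fill(X):
    """Fill missing predictor values with column means (assumed placeholder strategy)."""
    filled = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    filled[rows, cols] = col_means[cols]
    return filled

def impute_target(X, y, coefs):
    """Predict y where it is missing, after placeholder-filling X_mis."""
    mis = np.isnan(y)
    X_mis = placeholder_fill(X)[mis]
    A = np.column_stack([np.ones(mis.sum()), X_mis])
    y_imp = y.copy()
    y_imp[mis] = A @ coefs
    return y_imp

# Toy data generated from y = 1 + 2*x1 + 3*x2; the last two rows have y missing,
# and the first of those also has a missing predictor.
X = np.array([[1, 1], [2, 1], [1, 2], [3, 2], [2, 3], [4, np.nan], [0, 3]], dtype=float)
y = np.array([6, 8, 9, 13, 14, np.nan, np.nan])

coefs = fit_observed(X, y)
y_imp = impute_target(X, y, coefs)
```

In this sketch, the row with the missing predictor receives a prediction built from a column mean rather than an observed value, which is exactly the placeholder behavior the issue is about.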

This raises two questions we may want to address: 1) Should the SingleImputer used for the predictors be customizable as well? 2) Once a column is imputed, should its imputed values be used in subsequent predictions?

The first question is just a matter of writing extra code and validation. The second is more theoretical. It matters most for multiple imputation, where columns are imputed in some order, such as a visit sequence or a random selection process.
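The second question can be illustrated with a rough sketch, under the assumption that columns are imputed one at a time by linear regression on the other columns. The function `sequential_impute` and its `reuse_imputed` flag are hypothetical names for illustration only; they are not part of autoimpute.

```python
import numpy as np

def mean_fill(data):
    """Column-mean placeholders for every missing cell."""
    filled = data.copy()
    means = np.nanmean(data, axis=0)
    rows, cols = np.where(np.isnan(data))
    filled[rows, cols] = means[cols]
    return filled

def sequential_impute(data, visit_sequence, reuse_imputed=True):
    """Impute columns in the given order with least-squares regression.

    reuse_imputed=True: later columns see the values imputed for earlier columns.
    reuse_imputed=False: every column is predicted from placeholder-filled predictors.
    """
    missing = np.isnan(data)
    placeholders = mean_fill(data)
    working = placeholders.copy()
    for j in visit_sequence:
        mis = missing[:, j]
        if not mis.any():
            continue
        source = working if reuse_imputed else placeholders
        others = [k for k in range(data.shape[1]) if k != j]
        A = np.column_stack([np.ones(len(data)), source[:, others]])
        # Fit on rows where column j is observed, then predict the missing rows.
        coefs, *_ = np.linalg.lstsq(A[~mis], data[~mis, j], rcond=None)
        working[mis, j] = A[mis] @ coefs
    return working

data = np.array([
    [1.0, 2.0, 3.0],
    [2.0, np.nan, 5.0],
    [3.0, 6.0, np.nan],
    [4.0, 8.0, 12.0],
    [5.0, 10.0, 14.0],
])

with_reuse = sequential_impute(data, visit_sequence=[1, 2], reuse_imputed=True)
without_reuse = sequential_impute(data, visit_sequence=[1, 2], reuse_imputed=False)
```

With `reuse_imputed=True`, the model for the second visited column is fit and applied with the first column's imputed values in place of its mean placeholders, so the visit order can change the result. With `reuse_imputed=False`, the order has no effect.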
