kearnz / autoimpute

Python package for Imputation Methods
MIT License

Option to use Initialized Imputed DataFrame (not just observed data) for Imputation #44

Closed. kearnz closed this issue 4 years ago

kearnz commented 4 years ago

Right now, the MultipleImputer does not use the initialized imputed DataFrame; it fits each imputation model on complete rows only. This happens because each SingleImputer within a MultipleImputer calls the _get_observed method. Using only observed data is common practice when strict MCAR or MAR conditions are met, but it can lead to selection bias when these conditions break down.

@gjvd provided this note (pg 10). In our implementation, the user should have the option to use either the observed data only or the initialized DataFrame. This can be exposed as a formal argument with a default value.
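
To make the two options concrete, here is a minimal pandas sketch of the row selection being discussed. It is not autoimpute's code: the helper `select_training_rows`, the flag name `use_initialized`, and the choice of column means as the initialization are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def select_training_rows(df, target, predictors, use_initialized=False):
    """Return the (X, y) rows used to fit an imputation model for `target`.

    Hypothetical helper for illustration only; not part of autoimpute.
    """
    y_observed = df[target].notna()
    if use_initialized:
        # Assumed initialization: fill missing predictor values with column
        # means, so every row with an observed target can be used.
        X = df[predictors].fillna(df[predictors].mean())
        mask = y_observed
    else:
        # Observed only: keep rows where the target and all predictors
        # are present (complete rows).
        X = df[predictors]
        mask = y_observed & X.notna().all(axis=1)
    return X[mask], df.loc[mask, target]


df = pd.DataFrame({
    "x1": [1.0, 2.0, np.nan, 4.0, 5.0],
    "x2": [2.0, np.nan, 6.0, 8.0, 10.0],
    "y": [1.5, 2.5, 3.5, np.nan, 5.5],
})

X_obs, y_obs = select_training_rows(df, "y", ["x1", "x2"])
X_init, y_init = select_training_rows(df, "y", ["x1", "x2"], use_initialized=True)

# Number of training rows under each option.
print(len(X_obs), len(X_init))  # prints "2 4"
```

With observed data only, two of the four rows that have an observed `y` are dropped because a predictor is missing. If that missingness depends on the data, the rows that remain may not be representative, which is the selection bias concern raised above.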

kearnz commented 4 years ago

Implemented in v0.12.0.