Right now, the `MultipleImputer` does not use the initialized imputed DataFrame, but rather selects only complete rows for the imputation. This happens when each `SingleImputer` within a `MultipleImputer` calls the `_get_observed` method. Using observed rows only is common practice when strict MCAR or MAR conditions are met, but it can lead to selection bias when these conditions break down.
@gjvd provided this note (pg 10). In our implementation, the user should have the option to use observed rows only or to use the initialized DataFrame. This can be implemented as a formal argument with a default value.
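
To make the proposal concrete, here is a rough sketch of what such a switch could look like. Everything in it is an assumption for illustration: the helper `select_training_rows`, the argument name `use_observed`, its default, and the mean-based initialization are hypothetical and not taken from the current code. It also assumes that "use initialized DataFrame" means keeping every row where the target is observed and taking predictor values from the initialized frame.

```python
import pandas as pd


def select_training_rows(original, initialized, target, predictors, use_observed=True):
    """Return the rows used to fit the imputation model for `target`.

    Hypothetical helper for discussion; not part of the library's API.
    """
    columns = [target, *predictors]
    if use_observed:
        # Behavior described in this issue: keep only rows where the
        # target and all predictors are observed in the original data.
        complete = original[columns].notna().all(axis=1)
        return original.loc[complete, columns]
    # Proposed alternative (assumed semantics): the target must still be
    # observed, but predictor values come from the initialized frame, so
    # rows with missing predictors are no longer dropped.
    target_observed = original[target].notna()
    return initialized.loc[target_observed, columns]


original = pd.DataFrame({"x": [1.0, None, 3.0, 4.0], "y": [2.0, 4.0, None, 8.0]})
# Column-mean initialization, chosen here only to keep the example small.
initialized = original.fillna(original.mean())

observed_only = select_training_rows(original, initialized, "y", ["x"], use_observed=True)
with_initialized = select_training_rows(original, initialized, "y", ["x"], use_observed=False)
```

In this toy data, `observed_only` keeps two rows, while `with_initialized` also keeps the row whose `x` was missing and filled during initialization. Defaulting `use_observed` to `True` would preserve today's behavior for existing users.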