Right now, the MultipleImputer creates multiple samples of the same dataset and imputes each one independently. This covers the "repeat n times" logic. However, the MultipleImputer does not iteratively refine each of the imputed datasets (i.e. each imputer runs only once on each column of each sample). The ChainedEquationsImputer (implementation TBD) would handle iterative improvements to each imputation. The pseudo-code is as follows (provided by @gjdv):
repeat n times:
    identify missingness in dataframe
    initialize an imputed dataframe by inserting e.g., mean values per column where data is missing
    while not stable (or for set k number of iterations):
        for each column with missingness:
            create a single imputer using the current column as output and the other columns as input to the model
            update the imputed dataframe with imputed values where originally data was missing
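For illustration, the inner loop of the pseudo-code above (everything inside "repeat n times") could be sketched in plain Python as follows. This is only a rough sketch under several assumptions, not the planned implementation: the names `solve`, `fit_linear`, and `chained_impute` are hypothetical and not part of the library, the data is a list of numeric rows with `None` marking missing values rather than a DataFrame, and the per-column model is an ordinary least-squares fit chosen purely for simplicity.

```python
from statistics import mean

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_linear(X, y, ridge=1e-8):
    """Least-squares coefficients (intercept first) via the normal equations.
    A tiny ridge term keeps the system solvable when columns are collinear."""
    Xb = [[1.0] + list(row) for row in X]
    p = len(Xb[0])
    XtX = [[sum(r[i] * r[j] for r in Xb) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(p)]
    return solve(XtX, Xty)

def chained_impute(rows, max_iter=10, tol=1e-6):
    """Hypothetical sketch: iteratively re-impute each column from the others."""
    n_cols = len(rows[0])
    # identify missingness
    missing = [[v is None for v in row] for row in rows]
    # initialize the imputed data with per-column means of the observed values
    col_means = [mean(r[j] for r in rows if r[j] is not None)
                 for j in range(n_cols)]
    data = [[col_means[j] if row[j] is None else float(row[j])
             for j in range(n_cols)] for row in rows]
    # iterate until stable, or for at most max_iter passes
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(n_cols):
            if not any(m[j] for m in missing):
                continue  # only columns with missingness get a model
            others = [k for k in range(n_cols) if k != j]
            # fit on rows where column j was originally observed
            train = [i for i in range(len(data)) if not missing[i][j]]
            coef = fit_linear([[data[i][k] for k in others] for i in train],
                              [data[i][j] for i in train])
            # update only the cells that were originally missing
            for i in range(len(data)):
                if missing[i][j]:
                    pred = coef[0] + sum(c * data[i][k]
                                         for c, k in zip(coef[1:], others))
                    max_change = max(max_change, abs(pred - data[i][j]))
                    data[i][j] = pred
        if max_change < tol:
            break
    return data

rows = [[1, 2], [2, 4], [3, None], [4, 8]]
result = chained_impute(rows)
```

Because both the mean initialization and the least-squares fit are deterministic here, repeating this n times would yield identical datasets. The outer "repeat n times" loop only becomes meaningful once each repetition works on a different sample or uses a model with a random component.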
The initial plan is to implement this as a new SeriesImputer. The MultipleImputer may need some changes as well, although that is TBD.