kearnz / autoimpute

Python package for Imputation Methods
MIT License

MICE Imputer does not implement the MICE algorithm #51

Closed: MHStadler closed this issue 4 years ago

MHStadler commented 4 years ago

The series imputers used by the MICE Imputer still call _get_observed (https://github.com/kearnz/autoimpute/blob/d1a4c3966ea4138cd52111d51ef22d2fb43648e2/autoimpute/imputations/dataframe/single_imputer.py#L185).

However, the point of the MICE algorithm is not to fit on the complete cases only, but to use the previous round's imputations as placeholders in the variables that are not currently being imputed. For example, if I am imputing column x, I want to use the imputed placeholders in columns y and z before doing the predictive imputation of column x.
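A minimal sketch of the single update described above, assuming numeric columns and ordinary least squares as the per-column model (autoimpute's series imputers use their own models). The helper name impute_column_from_placeholders is hypothetical. Rows whose predictors were originally missing still take part in the fit, because those cells already hold placeholders; only the originally missing cells of the target column are overwritten.

```python
import numpy as np


def impute_column_from_placeholders(X_filled, missing, target):
    """One chained-equations update for column `target` (hypothetical helper).

    X_filled: 2-D float array where every originally missing cell already
              holds a placeholder (a first-round mean or a previous imputation).
    missing:  boolean array of the same shape, True where the ORIGINAL data
              was missing.
    """
    X_new = X_filled.copy()
    miss = missing[:, target]
    if not miss.any():
        return X_new
    # Predictors are all other columns, placeholders included, so a row is
    # not discarded just because one of its predictors was originally missing.
    others = [j for j in range(X_filled.shape[1]) if j != target]
    A = np.column_stack([np.ones(X_filled.shape[0]), X_filled[:, others]])
    # Fit ordinary least squares on the rows where the target itself is observed.
    coef, *_ = np.linalg.lstsq(A[~miss], X_filled[~miss, target], rcond=None)
    # Overwrite only the originally missing cells of the target column.
    X_new[miss, target] = A[miss] @ coef
    return X_new


# Columns: x (index 0, target), y, z, generated as x = 1 + 2*y + 3*z.
# x is missing in row 2; y was missing in row 4 and already holds a placeholder.
X_filled = np.array([[4.0, 0.0, 1.0],
                     [3.0, 1.0, 0.0],
                     [0.0, 2.0, 1.0],   # 0.0 is a dummy placeholder for x
                     [7.0, 3.0, 0.0],
                     [12.0, 4.0, 1.0]])
missing = np.zeros_like(X_filled, dtype=bool)
missing[2, 0] = True
missing[4, 1] = True

out = impute_column_from_placeholders(X_filled, missing, target=0)
print(round(out[2, 0], 6))  # 8.0
```

Row 4 contributes to the fit even though its y value was originally missing; a complete-case fit would have dropped it.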

In the first round, missing values are replaced by simple placeholders (mean/mode imputation). Additionally, the order in which the columns are imputed in each round should be customizable.
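Putting the pieces together, a chained-equations loop might look like the sketch below. Assumptions: all columns are numeric, first-round placeholders are column means (mode initialization for categorical columns is omitted), each column is modeled with ordinary least squares, and imputed values are deterministic predictions. Full MICE as described by Azur et al. draws imputations from a predictive distribution and repeats the process to produce several datasets, which this sketch leaves out. mice_sketch and visit_order are hypothetical names, not autoimpute's API.

```python
import numpy as np


def mice_sketch(X, n_iter=10, visit_order=None):
    """Deterministic chained-equations loop (hypothetical, not autoimpute's API).

    X:           2-D array with np.nan marking missing cells (numeric columns only).
    n_iter:      number of full rounds over the columns.
    visit_order: column indices in the order they are imputed in each round;
                 defaults to left to right.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    n_rows, n_cols = X.shape
    order = list(range(n_cols)) if visit_order is None else list(visit_order)

    # First-round placeholders: column means of the observed values.
    X_filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(n_iter):
        for target in order:
            miss = missing[:, target]
            if not miss.any():
                continue
            others = [j for j in range(n_cols) if j != target]
            # Predictors hold the CURRENT placeholders, so each update uses the
            # imputations made earlier in this round and in previous rounds.
            A = np.column_stack([np.ones(n_rows), X_filled[:, others]])
            coef, *_ = np.linalg.lstsq(A[~miss], X_filled[~miss, target], rcond=None)
            X_filled[miss, target] = A[miss] @ coef
    return X_filled


# Two columns with x0 = 2 * x1; one cell missing in each column.
X = np.array([[2.0, 1.0],
              [np.nan, 2.0],
              [6.0, 3.0],
              [8.0, np.nan],
              [10.0, 5.0]])
out = mice_sketch(X, n_iter=50)
out_reversed = mice_sketch(X, n_iter=50, visit_order=[1, 0])
print(out[1, 0], out[3, 1])
```

In this toy example both visit orders settle near 4.0 for the two missing cells, the values implied by x0 = 2 * x1. With real data, and especially with stochastic draws, the visit order can change the result, which is one reason to expose it as a parameter.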

In the current implementation, the algorithm simply runs the multiple imputer K times, and the result is equal to the (K-1)th run, regardless of the results of the previous runs.

See for reference: Azur, Melissa J., et al. "Multiple imputation by chained equations: what is it and how does it work?" International Journal of Methods in Psychiatric Research 20.1 (2011): 40-49.

MHStadler commented 4 years ago

Please correct me if I'm wrong: I don't see how to set the default values or where the placeholders are replaced with previous imputations.

MHStadler commented 4 years ago

Never mind, I found it now. Sorry about that!

kearnz commented 4 years ago

no worries!