kearnz / autoimpute

Python package for Imputation Methods
MIT License

Integration with Sklearn Pipeline #52

Closed MO105 closed 4 years ago

MO105 commented 4 years ago

I wanted to integrate autoimpute with an sklearn Pipeline, as shown here:

https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html

However, I get the error `AttributeError: 'generator' object has no attribute 'size'` when I try to substitute `SimpleImputer(strategy='median')` with `MultipleImputer()`.

Edit: from what I understand, sklearn's `SimpleImputer.fit_transform(X)` returns an array, whereas autoimpute returns a generator object, which can then be fed into a DataFrame.
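The mismatch can be reproduced without autoimpute or sklearn. The sketch below uses a hypothetical stand-in, `fake_multiple_transform`, that lazily yields one imputed dataset per iteration, as `MultipleImputer` does by default. The `(index, data)` pair shape and the fill value of `0.0` are assumptions made only for illustration. A generator has no array attributes such as `.size`, so code that expects an array fails. Materializing the generator with `list()` gives a concrete collection of imputations.

```python
from typing import Iterator, List, Optional, Tuple

Row = List[Optional[float]]

def fake_multiple_transform(rows: List[Row], n: int = 3) -> Iterator[Tuple[int, List[Row]]]:
    """Hypothetical stand-in for a multiple imputer's transform.

    Lazily yields n (index, imputed_rows) pairs; missing values (None)
    are filled with 0.0 purely for illustration.
    """
    for i in range(1, n + 1):
        yield i, [[0.0 if v is None else v for v in row] for row in rows]

data = [[1.0, None], [None, 4.0]]
result = fake_multiple_transform(data)

# Accessing an array attribute on a generator raises the reported error.
try:
    _ = result.size
except AttributeError as err:
    message = str(err)
print(message)  # 'generator' object has no attribute 'size'

# Materializing the generator produces a concrete list of imputations.
imputations = list(result)
first_index, first_data = imputations[0]
print(len(imputations), first_index, first_data)
```

A generator can be consumed only once, so materialize it a single time and reuse the list.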

Thanks

kearnz commented 4 years ago

@MO105 A couple of notes:

  1. By default, MultipleImputer returns a generator. You can get a list instead by setting the return_list argument: `return_list=True`. There's an example in the README.
  2. If you're using the median, median imputation is idempotent: every imputation produces the same values. There's no need for MultipleImputer, which would do more work than SingleImputer for the same result.
  3. Integration with sklearn pipelines can be tricky. It works if you're imputing with univariate methods, but it may not work with something like MICE. In that case we'd use the target (y) in the imputation process, and sklearn pipelines separate the target from the features.
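Points 2 and 3 can be illustrated with a minimal pure-Python sketch of a univariate median imputer that follows sklearn's `fit`/`transform` convention. `MedianImputerStep` is hypothetical and is not part of autoimpute or sklearn. Lists with `None` stand in for arrays with NaN, and each column is assumed to have at least one observed value. Each column's median depends only on that column, so `y` is accepted but ignored. `transform` returns one concrete result rather than a generator, and repeating the imputation yields identical copies.

```python
from statistics import median
from typing import List, Optional

Row = List[Optional[float]]

class MedianImputerStep:
    """Hypothetical univariate median imputer using sklearn's fit/transform
    convention. Not part of autoimpute or sklearn."""

    def fit(self, X: List[Row], y=None) -> "MedianImputerStep":
        # Univariate: each column's median depends only on that column,
        # so y is accepted for API compatibility but never used.
        # Assumes every column has at least one non-missing value.
        self.medians_ = [
            median([row[j] for row in X if row[j] is not None])
            for j in range(len(X[0]))
        ]
        return self

    def transform(self, X: List[Row]) -> List[List[float]]:
        # Returns one concrete result (not a generator), filling each
        # missing entry with the median learned during fit.
        return [
            [self.medians_[j] if v is None else v for j, v in enumerate(row)]
            for row in X
        ]

    def fit_transform(self, X: List[Row], y=None) -> List[List[float]]:
        return self.fit(X, y).transform(X)

X = [[1.0, 10.0], [None, 20.0], [3.0, None], [5.0, 40.0]]
imputer = MedianImputerStep()
filled = imputer.fit_transform(X)
print(filled)

# Re-applying the imputer changes nothing, and "multiple" imputations
# are identical copies, which is why SingleImputer is enough for median.
reapplied = imputer.transform(filled)
copies = [imputer.transform(X) for _ in range(3)]
print(reapplied == filled, all(c == filled for c in copies))
```

A predictive method such as MICE would need more than a per-column statistic. If it also relied on `y`, a pipeline's `transform` step would not have the target available for new data.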

Closing this for now, as I believe this answers your questions. Let me know if you have any others!