iskandr / fancyimpute

Multivariate imputation and matrix completion algorithms implemented in Python
Apache License 2.0
1.25k stars 178 forks

RandomForest with MICE? #22

Closed abhivr closed 7 years ago

abhivr commented 7 years ago

Is it possible to use a Random Forest model with MICE? I tried passing scikit-learn's RandomForestRegressor as the model argument to MICE, as shown below, but got an error.

import fancyimpute as fi
from sklearn.ensemble import RandomForestRegressor
# X_incomplete is the data matrix whose missing values (NaN) are to be imputed.
rf = RandomForestRegressor(n_estimators = 100, oob_score = True, random_state = 42)
X_filled_MICE = fi.MICE(model=rf).complete(X_incomplete)

I get the following error:

[MICE] Completing matrix with shape (1309, 1865)
[MICE] Starting imputation round 1/110, elapsed time 0.052
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-49-1f0dcbdff7c9> in <module>()
      1 # X is the incomplete data matrix that has some values as NaN that has to be imputed.
      2 rf = RandomForestRegressor(n_estimators = 100, oob_score = True, random_state = 42)
----> 3 X_filled_MICE = fi.MICE(model=rf).complete(X_incomplete)

E:\Anaconda3\lib\site-packages\fancyimpute\mice.py in complete(self, X)
    364             print("[MICE] Completing matrix with shape %s" % (X.shape,))
    365         X_completed = X.copy()
--> 366         imputed_arrays, missing_mask = self.multiple_imputations(X)
    367         # average the imputed values for each feature
    368         average_imputated_values = imputed_arrays.mean(axis=0)

E:\Anaconda3\lib\site-packages\fancyimpute\mice.py in multiple_imputations(self, X)
    352                 missing_mask=missing_mask,
    353                 observed_mask=observed_mask,
--> 354                 visit_indices=visit_indices)
    355             if m >= self.n_burn_in:
    356                 results_list.append(X_filled[missing_mask])

E:\Anaconda3\lib\site-packages\fancyimpute\mice.py in perform_imputation_round(self, X_filled, missing_mask, observed_mask, visit_indices)
    220                     X_other_cols_observed,
    221                     column_values_observed,
--> 222                     inverse_covariance=None)
    223 
    224                 # Now we choose the row method (PMM) or the column method.

TypeError: fit() got an unexpected keyword argument 'inverse_covariance'

I was not able to find documentation on this. How can I use a random forest with MICE?

sergeyf commented 7 years ago

We do not currently support models other than the provided Bayesian Ridge Regression. Sorry!
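
For anyone reading the traceback: MICE calls the model's fit() with an extra inverse_covariance keyword (line 222 of mice.py in the traceback), and RandomForestRegressor's fit() does not accept that keyword, hence the TypeError. The snippet below is only a rough sketch of that mismatch using the standard library. RidgeLikeModel, ForestLikeModel and fit_accepts are hypothetical names made up for illustration. They are not part of fancyimpute or scikit-learn, and the stand-in classes do no actual fitting.

```python
import inspect


class RidgeLikeModel:
    """Hypothetical stand-in: fit() accepts an inverse_covariance keyword."""

    def fit(self, X, y, inverse_covariance=None):
        return self


class ForestLikeModel:
    """Hypothetical stand-in: fit() takes only X and y."""

    def fit(self, X, y):
        return self


def fit_accepts(model, keyword):
    """Return True if model.fit can be called with `keyword` as a keyword argument."""
    params = inspect.signature(model.fit).parameters
    param = params.get(keyword)
    named_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if param is not None and param.kind in named_kinds:
        return True
    # A **kwargs parameter would also accept the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Tiny placeholder data; the stand-in models ignore it.
X, y = [[0.0], [1.0]], [0.0, 1.0]

print(fit_accepts(RidgeLikeModel(), "inverse_covariance"))
print(fit_accepts(ForestLikeModel(), "inverse_covariance"))

# Calling fit() the way the traceback shows fails for the forest-like model.
error_message = None
try:
    ForestLikeModel().fit(X, y, inverse_covariance=None)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The same fit_accepts check could in principle be pointed at a real estimator instance before handing it to MICE, but passing the check would not by itself make an arbitrary model work with MICE.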