iskandr / fancyimpute

Multivariate imputation and matrix completion algorithms implemented in Python
Apache License 2.0

LinAlgError: Singular matrix in MICE #18

Closed drorhilman closed 7 years ago

drorhilman commented 7 years ago

I am getting a LinAlgError: Singular matrix error when trying to impute with MICE:

from fancyimpute import MICE
vals = df[cols].copy()
vals = (vals / vals.max()).values  # important: scale so the max value is 1.0
imps = MICE(n_imputations=50, impute_type="col", verbose=1, min_value=0.0, max_value=1.0).complete(vals)

> /usr/local/lib/python2.7/dist-packages/fancyimpute/mice.pyc in complete(self, X)
>     364             print("[MICE] Completing matrix with shape %s" % (X.shape,))
>     365         X_completed = X.copy()
> --> 366         imputed_arrays, missing_mask = self.multiple_imputations(X)
>     367         # average the imputed values for each feature
>     368         average_imputated_values = imputed_arrays.mean(axis=0)
> 
> /usr/local/lib/python2.7/dist-packages/fancyimpute/mice.pyc in multiple_imputations(self, X)
>     352                 missing_mask=missing_mask,
>     353                 observed_mask=observed_mask,
> --> 354                 visit_indices=visit_indices)
>     355             if m >= self.n_burn_in:
>     356                 results_list.append(X_filled[missing_mask])
> 
> /usr/local/lib/python2.7/dist-packages/fancyimpute/mice.pyc in perform_imputation_round(self, X_filled, missing_mask, observed_mask, visit_indices)
>     220                     X_other_cols_observed,
>     221                     column_values_observed,
> --> 222                     inverse_covariance=None)
>     223 
>     224                 # Now we choose the row method (PMM) or the column method.
> 
> /usr/local/lib/python2.7/dist-packages/fancyimpute/bayesian_ridge_regression.pyc in fit(self, X, y, inverse_covariance)
>      66                 # interpreter with a savings of allocated arrays.
>      67                 outer_product[i, i] += lambda_reg
> ---> 68             self.inverse_covariance = inv(outer_product)
>      69         else:
>      70             self.inverse_covariance = inverse_covariance
> 
> /usr/local/lib/python2.7/dist-packages/numpy/linalg/linalg.pyc in inv(a)
>     524     signature = 'D->D' if isComplexType(t) else 'd->d'
>     525     extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
> --> 526     ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
>     527     return wrap(ainv.astype(result_t, copy=False))
>     528 
> 
> /usr/local/lib/python2.7/dist-packages/numpy/linalg/linalg.pyc in _raise_linalgerror_singular(err, flag)
>      88 
>      89 def _raise_linalgerror_singular(err, flag):
> ---> 90     raise LinAlgError("Singular matrix")
>      91 
>      92 def _raise_linalgerror_nonposdef(err, flag):
> 
> LinAlgError: Singular matrix
> 

I don't run into this problem with imputation methods other than MICE on the same data.

sergeyf commented 7 years ago

Hmm. Have you tried converting vals to a np.array first?

drorhilman commented 7 years ago

Yes. I tried: vals = np.array(vals.values).
Still, the problem persists.

sergeyf commented 7 years ago

Odd. Are you able to post the data that causes this issue? Or does it occur with any data you send in there?

sergeyf commented 7 years ago

@drorhilman do let me know if you can provide the data or more info. At some point in the near future, I plan on rewriting MICE to use sklearn's built-in Bayesian ridge regression, but that will have to wait for a future release, as sklearn can't currently return uncertainties.

iskandr commented 7 years ago

@drorhilman Hey Dror, let us know if you can add a test case for this issue (or give us data so we can poke at it). Closing for now.

arinbjornk commented 7 years ago

This error occurs when a column is completely missing, i.e. contains only NaNs. The determinant of a zero matrix is 0, so the matrix is singular and not invertible.

It needs to be invertible: see line 68 in bayesian_ridge_regression.py, self.inverse_covariance = inv(outer_product).
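
A possible workaround, sketched under the assumption that the failure really does come from all-NaN columns as described above: check for such columns and drop them before calling the imputer. The helper name drop_all_nan_columns and the sample array are hypothetical and not part of fancyimpute; the snippet uses only NumPy and also shows that inverting a zero matrix raises the same error.

```python
import numpy as np

def drop_all_nan_columns(X):
    """Hypothetical helper: return X without columns that contain only NaNs,
    plus a boolean mask marking which original columns were kept."""
    X = np.asarray(X, dtype=float)
    all_nan = np.isnan(X).all(axis=0)
    return X[:, ~all_nan], ~all_nan

# Inverting an all-zero matrix raises the error seen in the traceback.
try:
    np.linalg.inv(np.zeros((2, 2)))
    singular_raised = False
except np.linalg.LinAlgError:
    singular_raised = True

# Illustrative data: the middle column has no observed values at all.
X = np.array([[1.0, np.nan, 0.5],
              [np.nan, np.nan, 0.2],
              [0.3, np.nan, np.nan]])
X_kept, kept_mask = drop_all_nan_columns(X)
```

X_kept can then be passed to the imputer in place of the original array, and kept_mask records which columns survived so the result can be mapped back to the original layout.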