iskandr / fancyimpute

Multivariate imputation and matrix completion algorithms implemented in Python
Apache License 2.0

ValueError in MICE #10

Closed zeeshansayyed closed 8 years ago

zeeshansayyed commented 8 years ago

I tried the following:

from fancyimpute import MICE
imputer = MICE()
imputed = imputer.complete(dummied)

The code crashed with the following stack trace:

imputed = imputer.complete(dummied)
[MICE] Completing matrix with shape (244796, 723)
Traceback (most recent call last):
  File "<ipython-input-50-db9415235e63>", line 1, in <module>
    imputed = imputer.complete(dummied)
  File "build/bdist.macosx-10.5-x86_64/egg/fancyimpute/mice.py", line 364, in complete
    imputed_arrays, missing_mask = self.multiple_imputations(X)
  File "build/bdist.macosx-10.5-x86_64/egg/fancyimpute/mice.py", line 314, in multiple_imputations
    self._check_missing_value_mask(missing_mask)
  File "build/bdist.macosx-10.5-x86_64/egg/fancyimpute/solver.py", line 54, in _check_missing_value_mask
    if not missing.any():
  File "/Users/zeeshan.sayyed/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 887, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

When I apply SoftImpute to the same matrix, it seems to work.

Thanks

sergeyf commented 8 years ago

Is dummied a pandas dataframe? It should be a numpy array. I'll add a conversion.
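For readers wondering why a DataFrame trips the check in the traceback, here is a minimal sketch of the `if not missing.any():` pattern on both input types. The variable names and the toy data are assumptions made for illustration, not code from fancyimpute.

```python
import numpy as np
import pandas as pd

# Toy data, assumed for illustration only.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})

# On a DataFrame, .any() reduces column by column and returns a Series.
missing_df = df.isnull()
per_column = missing_df.any()

# Using that Series in a boolean context raises the ValueError from the report.
try:
    if not per_column:
        pass
    ambiguous = False
except ValueError:
    ambiguous = True

# On a NumPy array, .any() reduces over all elements to a single boolean.
missing_np = np.isnan(np.asarray(df))
has_missing = missing_np.any()
```

With a DataFrame the check never gets a single True/False value to test, which matches the `__nonzero__` frame at the bottom of the stack trace.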

zeeshansayyed commented 8 years ago

Aah. My bad. Yes. It's a dataframe. I'll convert it. Thanks.

sergeyf commented 8 years ago

OK I added a commit that will do np.asarray to the input.
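A rough sketch of what coercing the input with `np.asarray` could look like. The helper name `prepare_input` is hypothetical and this is not the actual commit; it only shows that the missing mask becomes a plain boolean ndarray once the input is converted.

```python
import numpy as np
import pandas as pd

def prepare_input(X):
    """Hypothetical helper: coerce X to a float ndarray and build the missing mask."""
    X = np.asarray(X, dtype=float)
    missing_mask = np.isnan(X)
    return X, missing_mask

# Toy DataFrame with int and float columns, assumed for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [np.nan, 4.0]})
X, missing_mask = prepare_input(df)
```

Note that the conversion drops the DataFrame's index and column labels, so any code after this point sees only the raw values.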

zeeshansayyed commented 8 years ago

Thanks.

velika12 commented 7 years ago

Hello, @sergeyf. I tried to pass a pandas dataframe as input, the same as @zeeshansayyed did, but got an error. My dataframe consists of int and float values. Here's the error:

ValueError                                Traceback (most recent call last)
<ipython-input-55-aeccbccb11e0> in <module>()
     14 
     15 mice = MICE()
---> 16 full_df_filled = MICE.complete(mice, mice_input)
     17 
     18 

/home/velika12/anaconda2/envs/main/lib/python2.7/site-packages/fancyimpute/mice.pyc in complete(self, X)
    335         # average the imputed values for each feature
    336         average_imputated_values = imputed_arrays.mean(axis=0)
--> 337         X_completed[missing_mask] = average_imputated_values
    338         return X_completed

/home/velika12/anaconda2/envs/main/lib/python2.7/site-packages/pandas/core/frame.pyc in __setitem__(self, key, value)
   2324 
   2325         if isinstance(key, (Series, np.ndarray, list, Index)):
-> 2326             self._setitem_array(key, value)
   2327         elif isinstance(key, DataFrame):
   2328             self._setitem_frame(key, value)

/home/velika12/anaconda2/envs/main/lib/python2.7/site-packages/pandas/core/frame.pyc in _setitem_array(self, key, value)
   2344             indexer = key.nonzero()[0]
   2345             self._check_setitem_copy()
-> 2346             self.loc._setitem_with_indexer(indexer, value)
   2347         else:
   2348             if isinstance(value, DataFrame):

/home/velika12/anaconda2/envs/main/lib/python2.7/site-packages/pandas/core/indexing.pyc in _setitem_with_indexer(self, indexer, value)
    577 
    578                     if len(labels) != len(value):
--> 579                         raise ValueError('Must have equal len keys and value '
    580                                          'when setting with an iterable')
    581 

ValueError: Must have equal len keys and value when setting with an iterable

If I convert the dataframe to a numpy array, everything works. So it seems like converting input only in your multiple_imputations function is not enough.
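The workaround I mean looks roughly like the sketch below: convert to an array, impute, then put the labels back. `impute_dataframe` and `column_mean_fill` are hypothetical names, and the column-mean fill is only a stand-in so the example runs without fancyimpute; it is not MICE.

```python
import numpy as np
import pandas as pd

def impute_dataframe(df, complete):
    """Run an array-only imputer on a DataFrame and restore its labels.

    `complete` is any callable that maps a 2-D float ndarray containing NaNs
    to a filled ndarray of the same shape.
    """
    values = np.asarray(df, dtype=float)
    filled = complete(values)
    return pd.DataFrame(filled, index=df.index, columns=df.columns)

def column_mean_fill(X):
    """Stand-in imputer (NOT MICE): replace each NaN with its column mean."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

# Toy data, assumed for illustration only.
df = pd.DataFrame(
    {"a": [1, 3, np.nan], "b": [2.0, np.nan, 6.0]},
    index=["x", "y", "z"],
)
filled_df = impute_dataframe(df, column_mean_fill)
```

In my case the callable would be the bound `complete` method of a `MICE()` instance in place of `column_mean_fill`.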

sergeyf commented 7 years ago

Have you tried passing in a numpy array?

velika12 commented 7 years ago

Yes, passing in a numpy array works. But I thought you had added support for pandas dataframes as input before closing this issue, so I decided to report the bug here.