viisar / brew

⛔️ DEPRECATED brew: Python Ensemble Learning API
MIT License

Predict : IndexError - Ensemble Classifier #35

Closed va26 closed 6 years ago

va26 commented 6 years ago

I am using the ensemble classifiers from this package and am trying to build a dynamic selection classifier based on the following example. My code snippet looks like this:

# Initializing ensemble of different models

model1 = RandomForestClassifier(n_estimators=200, criterion='gini', max_depth=None, min_samples_split=2,
                            min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='sqrt',
                            max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None,
                            bootstrap=True, oob_score=False, n_jobs=-2, random_state=12, verbose=0,
                            warm_start=False)

clf = model1
bag = Bagging(base_classifier=clf, n_classifiers=20)
# Changing indices to 0...n instead of random distribution from train_test_split
X_train = X_train.reset_index(drop=True)
X_test = X_test.reset_index(drop=True)
X_val = X_val.reset_index(drop=True)
Y_val = Y_val.reset_index(drop=True)
Y_train = Y_train.reset_index(drop=True)
Y_test = Y_test.reset_index(drop=True)

bag.fit(X_np, Y_np)
ensemble = bag.ensemble

clf1 = sklearn.clone(clf).fit(X_train, Y_train)
clf2 = EnsembleClassifier(ensemble=ensemble, combiner=Combiner('majority_vote'))
clf3 = EnsembleClassifier(ensemble=ensemble, selector=OLA(X_train, Y_train), combiner=Combiner('majority_vote'))
clf4 = EnsembleClassifier(ensemble=ensemble, selector=LCA(X_train, Y_train), combiner=Combiner('majority_vote'))
clf5 = EnsembleClassifier(ensemble=ensemble, selector=APriori(X_train, Y_train), combiner=Combiner('majority_vote'))
clf6 = EnsembleClassifier(ensemble=ensemble, selector=APosteriori(X_train, Y_train), combiner=Combiner('majority_vote'))
clf7 = EnsembleClassifier(ensemble=ensemble, selector=KNORA_ELIMINATE(X_train, Y_train), combiner=Combiner('majority_vote'))
clf8 = EnsembleClassifier(ensemble=ensemble, selector=KNORA_UNION(X_train, Y_train), combiner=Combiner('majority_vote'))

The error I get after running this is:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-166-c02f6a396002> in <module>()
      3 models = [clf1, clf2, clf3, clf4, clf5, clf6, clf7, clf8]
      4 for i in models:
----> 5     y_pred = clf3.predict(X_test_np)
      6     print "Accuracy : ", acc_score(Y_test, y_pred)
      7     print "AUC Score : ", auc_score(Y_test, y_pred)

/home/vat26/.local/lib/python2.7/site-packages/brew/base.pyc in predict(self, X)
    260             for i in range(X.shape[0]):
    261                 ensemble, weights = self.selector.select(
--> 262                     self.ensemble, X[i, :][np.newaxis, :])
    263 
    264                 if weights is not None:  # use the ensemble with weights

/home/vat26/.local/lib/python2.7/site-packages/brew/selection/dynamic/ola.pyc in select(self, ensemble, x)
    114         classifiers = ensemble.classifiers
    115         [idx] = self.knn.kneighbors(x, return_distance=False)
--> 116         X, y = self.Xval[idx], self.yval[idx]
    117 
    118         scores = np.asarray([clf.score(X, y) for clf in classifiers])

/home/vat26/.local/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
   2131         if isinstance(key, (Series, np.ndarray, Index, list)):
   2132             # either boolean or fancy integer index
-> 2133             return self._getitem_array(key)
   2134         elif isinstance(key, DataFrame):
   2135             return self._getitem_frame(key)

/home/vat26/.local/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_array(self, key)
   2175             return self._take(indexer, axis=0, convert=False)
   2176         else:
-> 2177             indexer = self.loc._convert_to_indexer(key, axis=1)
   2178             return self._take(indexer, axis=1, convert=True)
   2179 

/home/vat26/.local/lib/python2.7/site-packages/pandas/core/indexing.pyc in _convert_to_indexer(self, obj, axis, is_setter)
   1267                 if mask.any():
   1268                     raise KeyError('{mask} not in index'
-> 1269                                    .format(mask=objarr[mask]))
   1270 
   1271                 return _values_from_object(indexer)

KeyError: '[290 109 240  11 524] not in index'

Can anyone help me with this or tell me where I am going wrong? I don't understand why I am getting this error.

EDIT:

It works with clf1 and clf2; from clf3 onward it gives me the error.

GillesVandewiele commented 6 years ago

Hello,

I just had the same issue. It appears to be caused by passing pandas objects into brew. Converting everything to NumPy arrays fixes it.

For example,

EnsembleClassifier(ensemble=ensemble, selector=OLA(X_train, Y_train), combiner=Combiner('majority_vote'))

becomes

EnsembleClassifier(ensemble=ensemble, selector=OLA(X_train.values, Y_train.values), combiner=Combiner('majority_vote'))
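Here is a minimal sketch of why this probably happens, using only pandas and NumPy rather than brew itself. In the traceback, the selector evaluates self.Xval[idx], where idx holds the row positions returned by kneighbors. On a DataFrame, square-bracket indexing with an integer array is a column-label lookup, so resetting the row index does not help. The frame, column names, and idx values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the validation data handed to a selector.
X_val = pd.DataFrame({"f0": [0.1, 0.2, 0.3, 0.4],
                      "f1": [1.0, 2.0, 3.0, 4.0]})

# Row positions, as a nearest-neighbour query would return them.
idx = np.array([2, 0])

# DataFrame[...] with an integer array is treated as a column-label lookup,
# so pandas raises KeyError: there are no columns named 2 or 0.
try:
    X_val[idx]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# The underlying NumPy array indexes rows by position, which is what the
# Xval[idx] expression in the traceback relies on.
rows = X_val.values[idx]

print(raised_key_error)
print(rows)
```

On the pandas side, X_val.iloc[idx] would select the same rows by position, but since the indexing happens inside brew, passing .values when constructing the selector is the practical workaround.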

va26 commented 6 years ago

Thanks @GillesVandewiele, it worked.