scikit-learn-contrib / DESlib

A Python library for dynamic classifier and ensemble selection
BSD 3-Clause "New" or "Revised" License

index is out of bounds #176

Closed jayahm closed 4 years ago

jayahm commented 4 years ago

I was trying to run StaticSelection on my dataset using Jupyter Notebook.

I used a loop to run it on 40 different datasets (one dataset per subject).

The other classifiers (SVM, KNN, etc.) ran without any problem.

But for StaticSelection, I got the error below.

It seems to have an issue with the test set and its labels.

Why does this happen only with StaticSelection (and also SingleBest)?

IndexError                                Traceback (most recent call last)
<ipython-input-82-a2f7f60d152c> in <module>
     76 
     77 result_stacked_user = model_stacked.score(X_test, y_test)
---> 78 result_static_selection_user = model_static_selection.score(X_test, y_test)
     79 #result_single_best_user = model_single_best.score(X_test, y_test)

~\Anaconda3\lib\site-packages\sklearn\base.py in score(self, X, y, sample_weight)
    355         """
    356         from .metrics import accuracy_score
--> 357         return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
    358 
    359 

~\Anaconda3\lib\site-packages\deslib\static\static_selection.py in predict(self, X)
    113         predicted_labels = majority_voting(self.ensemble_, X).astype(int)
    114 
--> 115         return self.classes_.take(predicted_labels)
    116 
    117     def _check_is_fitted(self):

IndexError: index 11 is out of bounds for size 2

jayahm commented 4 years ago

UPDATE:

If I use a BaggingClassifier as the pool of classifiers, StaticSelection works.

But with a heterogeneous pool, the issue occurs.

jayahm commented 4 years ago

UPDATE:

I added StaticSelection to your original example_heterogeneous example and ran it in Jupyter.

I got this error: '<' not supported between instances of 'str' and 'int'

You may see the file: https://www.dropbox.com/s/z4he7u1vmxppfy5/example_heterogeneous_static.ipynb?dl=0

Menelau commented 4 years ago

@jayahm Hello,

Sorry for the late response. I believe this is a bug in the way we handle label encoding in this estimator. I will investigate the issue right now and get back to you ASAP.
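
For readers following along, the failure in the traceback can be sketched in isolation. This is a minimal sketch, not DESlib's code: it assumes the base classifiers of a heterogeneous pool were fitted on the raw labels and therefore vote with raw label values, while `classes_.take(...)` expects positions into `classes_`. The label values (3 and 11) and all variable names are made up for illustration.

```python
import numpy as np

# Hypothetical raw labels: two classes whose values are not 0 and 1.
y = np.array([3, 11, 11, 3, 11])

# Sorted unique labels, analogous to an estimator's classes_ attribute.
classes_ = np.unique(y)  # array([ 3, 11])

# Assumption: the pool was fitted on raw labels, so its votes are raw labels.
raw_votes = np.array([11, 3, 11])

try:
    # take() treats 11 as a position in a size-2 array and fails.
    classes_.take(raw_votes)
    error_message = None
except IndexError as exc:
    error_message = str(exc)

# Mapping raw labels to positions first makes take() valid.
encoded_votes = np.searchsorted(classes_, raw_votes)  # array([1, 0, 1])
decoded = classes_.take(encoded_votes)                # array([11,  3, 11])
```

If this assumption holds, it would also be consistent with the BaggingClassifier pool working: as far as I know, BaggingClassifier encodes the labels internally before fitting its base estimators, so those estimators already predict positions rather than raw labels.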

jayahm commented 4 years ago

Hi, is there any update on this?

PedroSilvaAlves commented 4 years ago

Hey, I'm also waiting for an update on this.

Menelau commented 4 years ago

@jayahm , @PedroSilvaAlves Hello,

Sorry for the very late response; I just came back from vacation. I took a look at the issue, and it is indeed related to how label encoding is performed inside the static selection method. I started working on it in a new branch, and the problem seems to be solved. I just need to run more tests to make sure everything is OK, and I will probably send a pull request with a fix tomorrow.

Meanwhile, if you want, you can check the branch where I'm fixing this issue: https://github.com/scikit-learn-contrib/DESlib/tree/fix_label_encoder

jayahm commented 4 years ago

@Menelau, thank you. Your work has helped me a lot. Has the problem been solved?

Menelau commented 4 years ago

@jayahm Hello,

The patch with the fix is almost done. The code example you provided, which was raising an error before, now works fine, as do the other test cases I created to ensure the problem is solved.

Now I just need to make sure that these changes did not break the scikit-learn check_estimator tests before merging the patch. I'm looking at it right now.

Menelau commented 4 years ago

@jayahm,

I merged the fix into the master branch. Note that to get the update you need to reinstall the library from the latest source: pip install git+https://github.com/scikit-learn-contrib/DESlib

Please let me know if everything works for you.

jayahm commented 4 years ago

@Menelau , great, let me try

jayahm commented 4 years ago

@Menelau StaticSelection works now.

But now SingleBest is not working.

Using the same file, you can check: https://www.dropbox.com/s/z4he7u1vmxppfy5/example_heterogeneous_static.ipynb?dl=0

Menelau commented 4 years ago

@jayahm Oh, I forgot to apply the last change to SingleBest after changing the label encoder. I just merged a fix, so you will need to update the library again. Sorry about that!

jayahm commented 4 years ago

@Menelau thanks again.

After trying on my data, I got this error:

                                                landscape_horizontal_session1_training_label,
     12                                                         landscape_horizontal_session1_validation_label,
---> 13                                                         landscape_horizontal_session1_test_label)
     14 
     15 average_result_landscape_horizontal_session1 = np.mean(result_landscape_horizontal_session1,axis=1)

<ipython-input-20-3ef7ee825dad> in train_test_model(training_set, validation_set, test_set, training_label, validation_label, test_label)
     84         single_best = SingleBest(pool_classifiers)
     85 
---> 86         model_stacked = stacked.fit(X_dsel, y_dsel)
     87         model_static_selection = static_selection.fit(X_dsel, y_dsel)
     88         model_single_best = single_best.fit(X_dsel, y_dsel)

~\Anaconda3\lib\site-packages\deslib\static\stacked.py in fit(self, X, y)
     63         super(StackedClassifier, self).fit(X, y)
     64 
---> 65         base_preds = self._predict_proba_base(X)
     66 
     67         # Prepare the meta-classifier

~\Anaconda3\lib\site-packages\deslib\static\stacked.py in _predict_proba_base(self, X)
    147 
    148         for index, clf in enumerate(self.pool_classifiers_):
--> 149             probabilities[:, index] = clf.predict_proba(X)
    150         return probabilities.reshape(X.shape[0],
    151                                      self.n_classifiers_ * self.n_classes_)

ValueError: could not broadcast input array from shape (68,2) into shape (68,1)

Menelau commented 4 years ago

@jayahm Hello,

Well, that is a different problem, and it is in a different model (StackedClassifier). That one was not changed in the last patch.

Which classifiers are you using in the pool? From the error message, it seems that one of the base models in the pool is not providing probabilities in the correct format (it seems to be outputting a 1D array instead of a 2D one).
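
The shapes in the error message can be reproduced with plain NumPy. The sketch below assumes the output buffer is allocated as (n_samples, n_classifiers, n_classes), which the reshape in the traceback suggests but which is not confirmed here; the number of classifiers and the probability values are illustrative.

```python
import numpy as np

n_samples, n_classifiers = 68, 3

# Assumption: only one class was counted in the data passed to fit,
# so the buffer holds a single probability column per base classifier.
n_classes_seen = 1
probabilities = np.zeros((n_samples, n_classifiers, n_classes_seen))

# A base classifier trained on two classes returns two columns.
base_proba = np.full((n_samples, 2), 0.5)

try:
    # The target slot has shape (68, 1); the input has shape (68, 2).
    probabilities[:, 0] = base_proba
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# With a matching class count the same assignment succeeds.
probabilities_ok = np.zeros((n_samples, n_classifiers, 2))
probabilities_ok[:, 0] = base_proba
```

Under that assumption, a (68,1) target means the class count seen in fit differs from the number of columns the base classifier returns, so comparing the classes in each split with the classes the pool was trained on is a reasonable first check.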

jayahm commented 4 years ago

@Menelau I see. I thought it was coming from the stacking classifier.

But I have other datasets that use the same pool, and those work.

Menelau commented 4 years ago

Can you provide a code example in which this error happens? I have been trying to reproduce it, but so far I have not been able to.

Also, is this a very small dataset? If so, one of the data splits might not contain training examples of all classes.
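
A quick way to test this hypothesis is to compare the classes in each split against the classes in the full label set. The helper below is a hypothetical illustration (its name and the toy labels are not from the thread), written with the standard library only.

```python
from collections import Counter


def missing_classes(all_labels, split_labels):
    """Return the classes present in all_labels but absent from split_labels."""
    return sorted(set(all_labels) - set(split_labels))


# Hypothetical labels for one small subject dataset.
y_all = [0, 0, 0, 0, 1, 1]
y_train = [0, 0, 1, 1]
y_dsel = [0, 0]  # this split happens to contain a single class

print(Counter(y_dsel))                  # Counter({0: 2})
print(missing_classes(y_all, y_train))  # []
print(missing_classes(y_all, y_dsel))   # [1]
```

If a split turns out to be missing a class, passing `stratify=y` to scikit-learn's `train_test_split` keeps the class proportions in each split, provided every class has enough samples.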

Menelau commented 4 years ago

By the way, since it is a different error, can you close this issue and open a new one? Having it in a separate issue makes it easier for me to keep track of which problems and bugs have been solved and which are yet to be fixed. Thanks!

jayahm commented 4 years ago

@Menelau sure, I have created a new one.