Closed jayahm closed 4 years ago
UPDATE:
If I used BaggingClassifier as the pool of classifiers, StaticSelection worked.
But with a heterogeneous pool, the issue occurred.
UPDATE:
I included StaticSelection in your original example_heterogeneous example using Jupyter.
I got this error:
'<' not supported between instances of 'str' and 'int'
You may see the file: https://www.dropbox.com/s/z4he7u1vmxppfy5/example_heterogeneous_static.ipynb?dl=0
@jayahm Hello,
Sorry for the late response, I believe that is a bug in the way we are handling label encoding in this estimator. I will investigate this issue right now and get back to you asap.
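For anyone hitting the same message in the meantime, here is a minimal sketch of one way it can arise. It assumes the original labels are strings while some internal step works with integer codes; the data and the `encode_labels` helper are hypothetical and are not DESlib code. Ordering the mixed values fails, while encoding the labels once and comparing in a single label space works.

```python
# Hypothetical data: string class labels, while predictions arrive as integer codes.
y_true = ["cat", "dog", "cat", "dog"]
y_pred_encoded = [0, 1, 0, 0]

# Ordering a mix of str and int values raises the TypeError from the report.
try:
    sorted(set(y_true) | set(y_pred_encoded))
    error_message = None
except TypeError as exc:
    error_message = str(exc)


def encode_labels(labels):
    """Map each distinct label to an integer code, in sorted label order."""
    classes = sorted(set(labels))
    to_code = {label: code for code, label in enumerate(classes)}
    return [to_code[label] for label in labels], classes


# Encode the true labels once, then compare everything as integer codes.
y_true_encoded, classes = encode_labels(y_true)
accuracy = sum(t == p for t, p in zip(y_true_encoded, y_pred_encoded)) / len(y_true)

print(error_message)
print(classes, y_true_encoded, accuracy)
```

If your labels are strings, encoding them yourself before fitting may serve as a temporary workaround, though I have not confirmed that against this exact notebook.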
Hi, is there any update on this?
Hey, I'm also waiting for an update on this.
@jayahm , @PedroSilvaAlves Hello,
Sorry for the very late response, I just came back from vacation. I took a look at the issue, and it is indeed caused by how label encoding is performed inside the static selection method. I started working on it on a new branch and it seems the problem is solved. I just need to run more tests to confirm that everything is ok, and I will probably send a pull request with a fix tomorrow.
Meanwhile, if you want, you can check the branch where I'm fixing this issue: https://github.com/scikit-learn-contrib/DESlib/tree/fix_label_encoder
@Menelau, thank you. Your work has helped me a lot. Has the problem been solved?
@jayahm Hello,
The patch with the fix is almost done. The code example you provided that was giving an error before is working fine now as well as other test cases I created to ensure the problem is solved.
Now I just need to make sure that these changes did not break the scikit-learn check_estimators test in order to merge the patch. I'm looking at it right now.
@jayahm,
I merged the fix to the master branch. Note that in order to get the update you need to reinstall the library using the latest version: pip install git+https://github.com/scikit-learn-contrib/DESlib
Please let me know if everything works for you.
@Menelau , great, let me try
@Menelau StaticSelection works now.
But now SingleBest is not working.
Using the same file, you can check: https://www.dropbox.com/s/z4he7u1vmxppfy5/example_heterogeneous_static.ipynb?dl=0
@jayahm Oh, I forgot to apply the last change to SingleBest after changing the label encoder. I just merged a fix, so you will need to update the library again. Sorry for that!
@Menelau thanks again.
After trying on my data, I got this error:
landscape_horizontal_session1_training_label,
12 landscape_horizontal_session1_validation_label,
---> 13 landscape_horizontal_session1_test_label)
14
15 average_result_landscape_horizontal_session1 = np.mean(result_landscape_horizontal_session1,axis=1)
<ipython-input-20-3ef7ee825dad> in train_test_model(training_set, validation_set, test_set, training_label, validation_label, test_label)
84 single_best = SingleBest(pool_classifiers)
85
---> 86 model_stacked = stacked.fit(X_dsel, y_dsel)
87 model_static_selection = static_selection.fit(X_dsel, y_dsel)
88 model_single_best = single_best.fit(X_dsel, y_dsel)
~\Anaconda3\lib\site-packages\deslib\static\stacked.py in fit(self, X, y)
63 super(StackedClassifier, self).fit(X, y)
64
---> 65 base_preds = self._predict_proba_base(X)
66
67 # Prepare the meta-classifier
~\Anaconda3\lib\site-packages\deslib\static\stacked.py in _predict_proba_base(self, X)
147
148 for index, clf in enumerate(self.pool_classifiers_):
--> 149 probabilities[:, index] = clf.predict_proba(X)
150 return probabilities.reshape(X.shape[0],
151 self.n_classifiers_ * self.n_classes_)
ValueError: could not broadcast input array from shape (68,2) into shape (68,1)
@jayahm Hello,
Well, that is a different problem and also in a different model (StackedClassifier). This one was not changed in the last patch.
Which classifiers are you using in the pool? From the error message, it seems that one of the base models in the pool is not providing probabilities in the correct format (it seems to be outputting a 1D array instead of a 2D one).
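To narrow this down, it may help to look at the shape of each base model's predict_proba output. The sketch below is only an illustration under stated assumptions: the 3D array layout is inferred from the reshape call in the traceback, the arrays are stand-ins for real classifier outputs, and `find_shape_mismatches` is a hypothetical helper, not part of DESlib.

```python
import numpy as np

n_samples, n_classifiers = 68, 3
n_classes_expected = 1  # assumption: only one class was seen when fitting

# Assumed layout: one (n_samples, n_classes) slice per base classifier.
probabilities = np.zeros((n_samples, n_classifiers, n_classes_expected))

# Stand-in for clf.predict_proba(X) from a base model trained on two classes.
base_output = np.full((n_samples, 2), 0.5)

# Assigning a (68, 2) output into a (68, 1) slot fails to broadcast.
try:
    probabilities[:, 0] = base_output
    error_message = None
except ValueError as exc:
    error_message = str(exc)


def find_shape_mismatches(outputs, n_samples, n_classes):
    """Return (index, shape) for every output that is not (n_samples, n_classes)."""
    expected = (n_samples, n_classes)
    return [
        (index, np.asarray(output).shape)
        for index, output in enumerate(outputs)
        if np.asarray(output).shape != expected
    ]


# Stand-in outputs: the second model returns a 1D array instead of a 2D one.
outputs = [np.zeros((68, 2)), np.zeros(68), np.zeros((68, 2))]
mismatches = find_shape_mismatches(outputs, n_samples=68, n_classes=2)

print(error_message)
print(mismatches)
```

With a real pool, the list of outputs would come from calling predict_proba on each fitted classifier with the same X.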
@Menelau I see. I thought it was coming from the stacking classifier.
But I have other datasets that use the same pool, and those work.
Can you provide a code example that reproduces this error? I'm trying to simulate it, but so far I have not been able to.
Also, is this a very small dataset? If so, one of the data splits may not contain examples of every class.
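A quick way to check the split hypothesis is to compare the classes present in each split against the classes in the full label set. This is a minimal sketch with made-up labels; the split names and the `missing_classes` helper are hypothetical.

```python
def missing_classes(all_labels, split_labels):
    """Return the classes found in all_labels that never occur in split_labels."""
    return sorted(set(all_labels) - set(split_labels))


# Hypothetical labels for one small subject, already divided into three splits.
splits = {
    "train": [0, 0, 1],
    "dsel": [0, 0],
    "test": [0, 1],
}
y_all = [label for labels in splits.values() for label in labels]

# An empty list means the split contains every class.
report = {name: missing_classes(y_all, labels) for name, labels in splits.items()}
print(report)
```

If a split turns out to be missing a class, a stratified split may help, provided every class has enough samples.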
By the way, as it is a different error, can you close this issue and open a new one? Having it in a separate issue makes it easier for me to keep track of which problems and bugs have been solved and which are yet to be fixed. Thanks!
@Menelau sure, I have created a new one.
I was trying to run StaticSelection on my dataset using Jupyter Notebook.
I used a loop to run on 40 different datasets (1 dataset = 1 subject).
For other classifiers (SVM, KNN, etc), nothing was wrong.
But for StaticSelection, I got the error below.
It seems to have an issue with the test set and its labels.
Why did this happen only to StaticSelection (and also SingleBest)?