scikit-learn-contrib / DESlib

A Python library for dynamic classifier and ensemble selection
BSD 3-Clause "New" or "Revised" License
479 stars · 106 forks

The shapes of test samples and neighbours are not the same #207

Closed jayahm closed 4 years ago

jayahm commented 4 years ago

Hi,

Upon checking the shape of y_test in the examples/example_heterogeneous.py, it is (228,).

But when I checked the shape of neighbors (in ola.py), it is (66, 7). Since 7 is the number of neighbours, I believe 66 is the number of test samples.

I wonder why this could be different? (228 vs 66)

Menelau commented 4 years ago

Because the DS methods are only active when there is a disagreement between the base classifiers in the pool. If all classifiers predict the same label for a test sample, there is no reason to use DS, since the prediction will always be the same.

In this case, 66 is the number of test samples for which the base models disagree on the label; only those samples are passed down to the DS method, and the neighborhood is computed only for these 66 examples.
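The filtering described above can be sketched with NumPy. This is a minimal illustration, not DESlib's internal code: the `predictions` matrix is hypothetical, with one row per base classifier and one column per test sample. A sample is routed to the DS method only if at least one classifier disagrees with the others.

```python
import numpy as np

# Hypothetical predictions from 3 base classifiers on 5 test samples
# (rows: classifiers, columns: samples)
predictions = np.array([
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 1, 1, 1, 1],
])

# A sample needs dynamic selection only if the base classifiers disagree:
# compare every row against the first one, column by column
agreement = np.all(predictions == predictions[0, :], axis=0)
ds_indices = np.where(~agreement)[0]

print(agreement)   # [ True  True False False  True]
print(ds_indices)  # [2 3] -> only these samples reach the DS method
```

With 228 test samples and 66 disagreements, `ds_indices` would have length 66, which is why the neighbors array has shape (66, 7).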


jayahm commented 4 years ago

But, in the end, the DS method will return the labels of all samples, including those for which the base classifiers agree, am I right?

Menelau commented 4 years ago

Yes, it returns the labels of all samples.
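How the two groups of samples come back together can be sketched as follows. This is an illustrative stand-in, not DESlib's implementation: it assumes binary labels and uses a plain majority vote as a placeholder for the DS method's per-sample decision.

```python
import numpy as np

# Hypothetical predictions from 3 base classifiers on 5 test samples
predictions = np.array([
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 1, 1, 1, 1],
])
n_clf, n_samples = predictions.shape

agreement = np.all(predictions == predictions[0, :], axis=0)

y_pred = np.empty(n_samples, dtype=int)
# Samples with full agreement keep the unanimous label directly
y_pred[agreement] = predictions[0, agreement]
# Disagreed samples would be decided by the DS method; a majority
# vote over the binary labels stands in for that decision here
disagree = ~agreement
y_pred[disagree] = (predictions[:, disagree].sum(axis=0) * 2 > n_clf).astype(int)

print(y_pred)  # [0 1 1 0 1] -> one label per test sample, DS or not
```

Either way, the output has one label per test sample, so its shape matches `y_test`.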

jayahm commented 4 years ago

May I know what criteria a particular method uses for the "agreement"?

I believe it is based on the same label returned by each classifier.

In that case, there is a default classification decision threshold for the classifiers.

What if the threshold is varied in some classification problems? (That is my case.)

Menelau commented 4 years ago

Just checking the predictions of the base models. If all models predict the same label, there is no need for further processing with DS, as any selection would give the same output.

If at least one model disagrees with the other base classifiers, dynamic selection is used to select the most competent one(s) for prediction.

In the current implementation, there is no threshold for the degree of disagreement: our goal was only to reduce the computational cost without changing the definitions of any of the dynamic selection methods at all. If you want something that changes according to the level of disagreement in the predicted labels (e.g., if 30% of the classifiers disagree with the rest), you would need to modify the code yourself to allow such functionality.
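Such a degree-of-disagreement threshold could be sketched like this. This is a hypothetical modification, not part of DESlib: for each test sample it measures the fraction of base classifiers that deviate from the majority label, and routes a sample to DS only when that fraction exceeds 30%.

```python
import numpy as np

# Hypothetical predictions from 5 base classifiers on 5 test samples
predictions = np.array([
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 1, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
])

# Majority label per sample, then the fraction of classifiers
# that deviate from it
majority = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, predictions)
disagreement = (predictions != majority).mean(axis=0)

# Route to DS only the samples where more than 30% of the
# classifiers disagree with the majority
ds_mask = disagreement > 0.30
print(disagreement)  # [0.  0.  0.4 0.2 0.2]
print(ds_mask)       # only the third sample would use DS
```

Under the library's current behavior, all three samples with any disagreement (fractions 0.4, 0.2, 0.2) would be passed to DS; the threshold keeps only the strongest disagreement.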