Inconsistent performance of DS methods

scikit-learn-contrib / DESlib

A Python library for dynamic classifier and ensemble selection

BSD 3-Clause "New" or "Revised" License

Inconsistent performance of DS methods #225

Closed jayahm closed 3 years ago

jayahm commented 3 years ago

Hi,

I ran some experiments on multiple datasets using several DS methods.

I am confused about why the performance of each DS method is not consistent across datasets.

For example, the same method sometimes ranks X, sometimes Y, and sometimes Z; even the best method is not the best on all datasets.

This makes it hard for me to draw a conclusion.

Is this normal?

Menelau commented 3 years ago

Yes, it is normal. This is the no free lunch theorem: the best model depends on the dataset.

That's why an appropriate experimental setup is very important: use cross-validation to estimate average performance, and use proper statistical tests to compare multiple machine learning models.
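As a rough illustration of that idea (a sketch only, not part of DESlib), the snippet below averages per-fold cross-validation scores, ranks the methods on each dataset, and computes the average ranks and the Friedman statistic in the form described in the Demšar paper. The method names `X`, `Y`, `Z`, the dataset names, and all accuracy values are hypothetical placeholders, not real results.

```python
from statistics import mean

# Hypothetical per-fold accuracies (made up for illustration only).
# fold_scores[dataset][method] is the list of scores over the CV folds.
fold_scores = {
    "dataset_a": {"X": [0.80, 0.82, 0.84], "Y": [0.78, 0.80, 0.79], "Z": [0.75, 0.77, 0.76]},
    "dataset_b": {"X": [0.70, 0.72, 0.71], "Y": [0.74, 0.76, 0.75], "Z": [0.69, 0.68, 0.70]},
    "dataset_c": {"X": [0.90, 0.91, 0.92], "Y": [0.88, 0.87, 0.89], "Z": [0.93, 0.94, 0.92]},
    "dataset_d": {"X": [0.66, 0.65, 0.67], "Y": [0.64, 0.63, 0.62], "Z": [0.60, 0.61, 0.62]},
}


def rank_methods(scores):
    """Rank methods on one dataset: 1 = highest score; ties share the average rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over any methods tied with the one at position i.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for name in ordered[i:j + 1]:
            ranks[name] = shared_rank
        i = j + 1
    return ranks


# Step 1: average the fold scores, giving one estimate per (dataset, method).
mean_scores = {
    dataset: {method: mean(folds) for method, folds in methods.items()}
    for dataset, methods in fold_scores.items()
}

# Step 2: rank the methods separately on each dataset.
ranks_per_dataset = {d: rank_methods(s) for d, s in mean_scores.items()}

# Step 3: average each method's rank over all datasets.
method_names = sorted(next(iter(fold_scores.values())))
n_datasets = len(fold_scores)
n_methods = len(method_names)
avg_ranks = {
    m: sum(ranks_per_dataset[d][m] for d in ranks_per_dataset) / n_datasets
    for m in method_names
}

# Step 4: Friedman statistic, to be compared with a chi-square distribution
# with (n_methods - 1) degrees of freedom.
friedman_chi2 = (12 * n_datasets / (n_methods * (n_methods + 1))) * (
    sum(r ** 2 for r in avg_ranks.values()) - n_methods * (n_methods + 1) ** 2 / 4
)

print("Average ranks:", avg_ranks)
print("Friedman chi-square:", friedman_chi2)
```

With these placeholder numbers, each method wins on some datasets and loses on others, yet the average ranks still give a single summary per method, and the Friedman statistic indicates whether the rank differences are larger than chance would suggest. With only four datasets and three methods the chi-square approximation is not reliable, so this only shows the mechanics; in practice a library routine such as `scipy.stats.friedmanchisquare` would typically be used.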

jayahm commented 3 years ago

Can you explain more about the cross-validation and statistical test parts?

I mean, not how to do them, but how these two can help the analysis when the performance is not consistent.

Menelau commented 3 years ago

Unfortunately, I can't: it is a very long subject with plenty of nuances to cover, and this is not the place for it (especially since it is completely outside the scope of this project). I can, however, suggest some readings:

Raschka, Sebastian. Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning. https://arxiv.org/pdf/1811.12808
Demšar, Janez. Statistical comparisons of classifiers over multiple data sets. http://www.jmlr.org/papers/volume7/demsar06a/demsar06a.pdf
Japkowicz, Nathalie, and Mohak Shah. Evaluating learning algorithms: a classification perspective. Cambridge University Press, 2011.

jayahm commented 3 years ago

Thanks! I appreciate that.