scikit-learn-contrib / DESlib

A Python library for dynamic classifier and ensemble selection
BSD 3-Clause "New" or "Revised" License

Performance of DS methods is not the same when the pool changes #226

Closed · jayahm closed this issue 3 years ago

jayahm commented 3 years ago

Hi,

I used the same DS method, but with different pools.

What happened was that the DS methods didn't show the same performance in terms of ranking.

For example, DS method X was the best on Pool A but not on Pool B.

Menelau commented 3 years ago

That is normal: performance always depends on the pool, and changing the pool will always change the performance. That's why it is important to estimate an average performance by running multiple simulations, such as cross-validation or multiple hold-out splits, and to measure whether the difference in performance is statistically significant using the proper statistical tools.
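
A minimal sketch of the kind of repeated evaluation described here, assuming a synthetic dataset from `make_classification` and two DESlib methods (KNORA-U and META-DES) chosen purely for illustration; the split sizes and number of repetitions are arbitrary:

```python
# Sketch: estimate average DS performance over repeated hold-out splits
# instead of trusting the ranking from a single train/test split.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from deslib.des.knora_u import KNORAU
from deslib.des.meta_des import METADES

X, y = make_classification(n_samples=1000, random_state=0)

scores = {"KNORAU": [], "METADES": []}
for seed in range(30):  # 30 independent hold-out splits
    # Split into pool-training, DSEL (dynamic selection set), and test data
    X_train, X_temp, y_train, y_temp = train_test_split(
        X, y, test_size=0.5, random_state=seed)
    X_dsel, X_test, y_dsel, y_test = train_test_split(
        X_temp, y_temp, test_size=0.5, random_state=seed)

    # Train the pool of classifiers on the training partition
    pool = BaggingClassifier(n_estimators=10, random_state=seed)
    pool.fit(X_train, y_train)

    # Fit each DS method on DSEL and evaluate on the test partition
    for name, method in [("KNORAU", KNORAU), ("METADES", METADES)]:
        ds = method(pool).fit(X_dsel, y_dsel)
        scores[name].append(ds.score(X_test, y_test))

for name, s in scores.items():
    print(f"{name}: mean={np.mean(s):.4f}, std={np.std(s):.4f}")
```

The per-split scores collected in `scores` are what a paired statistical test would then compare, rather than a single accuracy number per method.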

jayahm commented 3 years ago

OK. But how do we reach a conclusion about which method is the best?

I ran both heterogeneous and homogeneous pools of classifiers, but I cannot conclude which is better.

Also, I saw many papers proposing new DS methods in which the method outperforms existing ones even though the datasets used were diverse in nature. How could a particular method show superiority in this case?

Any idea which statistical test can be used in this case?

Menelau commented 3 years ago

You should check the machine learning literature on model selection and on how to compare learning algorithms. Some suggested readings are:

Since this space is for reporting bugs and discussing changes/new features in the library, and not for comparing models or discussing how to properly use statistical tests in machine learning, I'm closing this issue.
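
For the two-method case, one common choice from that literature is a paired non-parametric test on the scores gathered over repeated splits. A minimal sketch, continuing the hypothetical example above and reusing its `scores` dictionary:

```python
# Sketch: paired, non-parametric test on the per-split accuracies
# collected in `scores` from the earlier example.
from scipy.stats import wilcoxon

stat, p = wilcoxon(scores["KNORAU"], scores["METADES"])
print(f"Wilcoxon signed-rank: statistic={stat:.3f}, p={p:.4f}")
# A small p-value (e.g. < 0.05) suggests the accuracy difference is
# unlikely to be chance. When comparing more than two methods across
# multiple datasets, scipy.stats.friedmanchisquare is the usual starting
# point, followed by a post-hoc analysis.
```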

jayahm commented 3 years ago

Thanks! I appreciate that.