scikit-learn-contrib / DESlib

A Python library for dynamic classifier and ensemble selection
BSD 3-Clause "New" or "Revised" License

Kfold - TimeSeriesSplit? #222

Closed by jmrichardson 3 years ago

jmrichardson commented 3 years ago

Hi, thank you for the great package. I have temporal data and would like to use TimeSeriesSplit cross-validation, or perhaps KFold without shuffling. Is this possible?

Menelau commented 3 years ago

Hello,

Yes, it is possible. In this case, you can use TimeSeriesSplit from sklearn to create your training and test splits (and possibly a validation split too), and then use these sets manually to fit the base models and the DS methods.
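A minimal sketch of the manual approach, assuming a NumPy feature matrix `X` and label vector `y` that are already sorted in time order. The helper names `chronological_splits` and `evaluate_fold`, the 30% DSEL fraction, the `BaggingClassifier` pool and KNORA-E as the DS method are illustrative choices, not something prescribed in this thread. Each training window produced by `TimeSeriesSplit` is cut once more, so the oldest part trains the pool and the most recent part serves as the validation (DSEL) set for the DS method.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import TimeSeriesSplit


def chronological_splits(n_samples, n_splits=5, dsel_fraction=0.3):
    """Yield (train_idx, dsel_idx, test_idx) index arrays in time order.

    Hypothetical helper: TimeSeriesSplit gives an expanding training window
    followed by a later test window; the training window is split again so
    that its newest slice becomes the DSEL set for the DS method.
    """
    tscv = TimeSeriesSplit(n_splits=n_splits)
    for fit_idx, test_idx in tscv.split(np.zeros((n_samples, 1))):
        n_dsel = round(len(fit_idx) * dsel_fraction)
        cut = len(fit_idx) - n_dsel
        yield fit_idx[:cut], fit_idx[cut:], test_idx


def evaluate_fold(X, y, train_idx, dsel_idx, test_idx):
    """Fit the pool, then the DS method, and return accuracy on the test window."""
    # Imported here so the splitting helper above can be used without DESlib.
    from deslib.des.knora_e import KNORAE

    # 1) Train the pool of base classifiers on the oldest data only.
    pool = BaggingClassifier(n_estimators=10, random_state=0)
    pool.fit(X[train_idx], y[train_idx])

    # 2) Fit the DS method on the DSEL slice, which is newer than the pool data.
    ds = KNORAE(pool_classifiers=pool)
    ds.fit(X[dsel_idx], y[dsel_idx])

    # 3) Score on the test window, which always lies later in time.
    return ds.score(X[test_idx], y[test_idx])
```

Calling `evaluate_fold(X, y, *split)` for each split yielded by `chronological_splits(len(X))` and averaging the scores gives a time-aware estimate. The DSEL slice should hold more samples than the DS method's neighbourhood size `k`, so the earliest folds may need a larger `dsel_fraction` or fewer splits.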

Another alternative is to pass the DS method as input to scikit-learn's cross_val_score function, which automatically computes the result over multiple folds. However, this approach has a limitation: the pool of classifiers must be generated inside the DS method, rather than being a pool you have already trained. This is a limitation of the scikit-learn cloning process, which cannot clone already trained models (see issue #89). The scikit-learn developers plan to solve this in future updates.
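A sketch of the `cross_val_score` route, assuming the same time-ordered `X` and `y`. `time_series_cv_scores` is a hypothetical helper name and KNORA-E is only one example DS method; it is created with `pool_classifiers=None` so that every clone made by `cross_val_score` generates its own pool inside the DS method, as the limitation above requires.

```python
from sklearn.model_selection import TimeSeriesSplit, cross_val_score


def time_series_cv_scores(X, y, estimator=None, n_splits=5):
    """Score an estimator over chronological folds with cross_val_score.

    cross_val_score clones the estimator for each fold, so a DESlib method
    passed here should not carry a pre-trained pool.
    """
    if estimator is None:
        # Imported lazily so the helper also works with plain sklearn estimators.
        from deslib.des.knora_e import KNORAE

        # No pool given: the DS method builds its own pool on each training fold.
        estimator = KNORAE(pool_classifiers=None, random_state=0)
    # TimeSeriesSplit never shuffles: each test fold comes after its training fold.
    cv = TimeSeriesSplit(n_splits=n_splits)
    return cross_val_score(estimator, X, y, cv=cv)
```

Passing `KFold(n_splits=5, shuffle=False)` as `cv` instead would give the unshuffled k-fold variant from the question, although the later folds would then be trained partly on data that comes after the test period.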