Closed AssiaBou closed 5 years ago
@AssiaBou Hello, I believe that in this case, wrapping an ensemble method (BaggingClassifier) inside a pipeline makes it be viewed as a single estimator instead of a list of classifiers, so it is not iterable.
In this case, DESlib expects a list of classifiers as the pool of classifiers. So instead of passing the whole pipeline as input, you can retrieve the BaggingClassifier that was trained together with SMOTE inside the pipeline:
fitted_pool_classifiers = pipe['baggingclassifier']
and then pass it as input to the dynamic ensemble selection model:
knu = KNORAU(fitted_pool_classifiers)
Hello sir, I have tried the solution you proposed and the error is gone, but the results before applying sampling methods with the DES techniques are better than after applying them.
If you have any idea about this, can you help me please?
Regards, Assia
Well, in this case, there are two things to take into consideration:
1) The performance metric used. We found that the use of oversampling significantly improves the results with respect to the F-measure and G-mean. We did not obtain good results with AUC.
2) We use a different methodology to combine oversampling methods with dynamic selection. Instead of applying the oversampling method to the dataset and then training the base classifiers using Bagging, we first create N bootstraps of the training data. Then, we apply the oversampling method to each bootstrap separately and use the resulting data to train a base classifier. That is a very important step: due to the randomness in the bootstrapping and oversampling processes, the final pool of classifiers is more diverse.
Unfortunately, I don't think you can replicate that methodology using the sklearn pipelines and the BaggingClassifier estimator. In our case, we always created the bootstraps manually (for that, you can use the sklearn.utils.resample function). Then, for each bootstrap, apply SMOTE and train a base model.
Note that we also recommend applying the oversampling method to the dynamic selection dataset (the data used by the DS algorithm for estimating the competence level of the base classifiers dynamically). The whole methodology we recommend is presented in the paper: Roy, Anandarup, Rafael MO Cruz, Robert Sabourin, and George DC Cavalcanti. "A study on combining dynamic selection and data preprocessing for imbalance learning." Neurocomputing 286 (2018): 179-192.
In particular, Figure 1 in this paper presents the methodology we recommend.
Moreover, I also recommend checking the paper Díez-Pastor, José F., Juan J. Rodríguez, César I. García-Osorio, and Ludmila I. Kuncheva. "Diversity techniques improve the performance of the best imbalance learning ensembles." Information Sciences 325 (2015): 98-117, which motivated us to follow this methodology for training the base models.
When I used a pipeline as the pool of classifiers with dynamic selection techniques (KNORAU, KNORAE and META-DES), I got the error "'Pipeline' object is not iterable". What does it mean? This is my code:
pool_PMC_classifiers = BaggingClassifier(model_perceptron1, n_estimators=10, random_state=rng)
smote = SMOTE(random_state=rng)
pipe = make_pipeline(smote, pool_PMC_classifiers)
pipe.fit(X_train, y_train)
knu = KNORAU(pipe)
knu.fit(X_dsel, y_dsel)