scikit-learn-contrib / DESlib

A Python library for dynamic classifier and ensemble selection
BSD 3-Clause "New" or "Revised" License

Ensembling models that use different sklearn preprocessing pipelines #191

Closed AlessandroCortex closed 4 years ago

AlessandroCortex commented 4 years ago

My understanding is that DESlib works with several classifiers but one data preprocessing pipeline. Is there a way to use unique pipelines for each classifier?

Menelau commented 4 years ago

@AlessandroCortex Hello,

Sorry for the late response. I spent some time away but now I'm back and have more time to dedicate to the library.

Each base model in the pool can have its own pipeline with different data pre-processing steps (as long as it is a sklearn pipeline). For example:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier  # missing from the original snippet

from deslib.dcs import OLA
from deslib.des import KNORAE

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each base model carries its own pre-processing steps (or none at all).
pipe_svc = Pipeline([('scaler', StandardScaler()),
                     ('svc', SVC())])
dt = DecisionTreeClassifier(random_state=0)
pipe_knn = Pipeline([('scaler', StandardScaler()),
                     ('nca', NeighborhoodComponentsAnalysis(random_state=0)),
                     ('knn', KNeighborsClassifier(n_neighbors=3))])

# Fit every base model, then collect them into the pool.
pipe_svc.fit(X_train, y_train)
dt.fit(X_train, y_train)
pipe_knn.fit(X_train, y_train)
pool_classifiers = [pipe_svc, dt, pipe_knn]

# Fit the dynamic selection methods on top of the pre-trained pool.
ola = OLA(pool_classifiers).fit(X_train, y_train)
knorae = KNORAE(pool_classifiers).fit(X_train, y_train)

print(f"OLA score: {ola.score(X_test, y_test)}")
print(f"KNORAE score: {knorae.score(X_test, y_test)}")
print(f"SVM score: {pipe_svc.score(X_test, y_test)}")
```

This code works because the dynamic selection techniques treat each pipeline as a single base model that implements the sklearn estimator interface.
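A minimal sketch of what "one base model" means in practice, using only scikit-learn. The pool members and the `MinMaxScaler` choice here are illustrative assumptions, not part of the example above. Every member is fitted on, and predicts from, the same raw feature matrix; each pipeline applies its own pre-processing internally:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Illustrative pool: two pipelines with different scalers and one bare model.
pool = [
    Pipeline([("scaler", StandardScaler()), ("knn", KNeighborsClassifier())]),
    Pipeline([("scaler", MinMaxScaler()), ("knn", KNeighborsClassifier())]),
    DecisionTreeClassifier(random_state=0),
]

for model in pool:
    model.fit(X_train, y_train)

# All members accept the same unscaled X_test; scaling happens inside
# each pipeline, so a caller cannot tell a pipeline from a bare classifier.
predictions = [model.predict(X_test) for model in pool]

# All members expose the usual estimator methods.
has_interface = [
    hasattr(model, "predict") and hasattr(model, "predict_proba")
    for model in pool
]
print(has_interface)
```

Because the pre-processing lives inside each pool member, a selection method only ever needs to call `predict` (or `predict_proba`) on the raw inputs.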

I need to emphasize this point in the documentation as well as in the examples, and I will add it there.