scikit-learn-contrib / imbalanced-learn

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
https://imbalanced-learn.org
MIT License

[ENH] Pipeline partial transform #976

Closed: elliot-hallmark closed this issue 1 year ago

elliot-hallmark commented 1 year ago

It's helpful to evaluate a pipeline up to a certain point and then inspect the output. In Caffe, this is done with the end keyword argument. Under my proposal, if a pipeline did augmentation, feature interaction, scaling, ANOVA feature selection, and then classification, you could run pipeline.transform(X, end='anova') and get the transformed data just before classification.

This is helpful for investigating the state of the data deeper in the pipeline, which in turn helps you think about how to improve the pipeline.

glemaitre commented 1 year ago

Our Pipeline inherits from scikit-learn's, so this feature should be proposed upstream. Be aware that you can already do what you want with indexing:

```python
# call transform on all steps but the last one
pipeline[:-1].transform(X)
```
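Pipeline slicing takes integer positions, so the requested end='anova' behaviour can be emulated by looking up the position of the named step and slicing up to and including it. The sketch below assumes a fitted pipeline; transform_until is a hypothetical helper, not part of scikit-learn or imbalanced-learn. It uses scikit-learn's Pipeline so it only needs scikit-learn, and imbalanced-learn's Pipeline inherits the same indexing. The augmentation step is left out; an imbalanced-learn sampler is only applied during fit, so transform would skip it anyway.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def transform_until(pipeline, X, end):
    """Hypothetical helper: apply `transform` up to and including the step named `end`."""
    names = [name for name, _ in pipeline.steps]
    stop = names.index(end) + 1  # raises ValueError if `end` is not a step name
    # Slicing returns a new Pipeline that reuses the already-fitted step objects.
    return pipeline[:stop].transform(X)


X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("interaction", PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)),
    ("scaling", StandardScaler()),
    ("anova", SelectKBest(f_classif, k=3)),
    ("classifier", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)

# Data as it looks right after ANOVA feature selection, before classification.
X_anova = transform_until(pipeline, X, "anova")
print(X_anova.shape)  # (150, 3): 4 inputs -> 10 interaction features -> 3 selected
```

Looking the position up by name keeps the call site readable and means the slice does not have to change when earlier steps are added or removed.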