scikit-learn / enhancement_proposals

Enhancement proposals for scikit-learn: structured discussions and rationale for large additions and modifications
https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

WIP Dynamic pipelines #4

Closed: code-of-kpp closed this 5 years ago

amueller commented 8 years ago

btw, maybe you're interested in this gist: https://gist.github.com/amueller/643f812a275a9e0c75048aab6988a92c

code-of-kpp commented 8 years ago

Personally, I don't see anything good in binary ops for estimators. And duck-typing will not work.

amueller commented 8 years ago

> Personally, I don't see anything good in binary ops for estimators. And duck-typing will not work.

Can you elaborate on both points? The gist actually implements a way to incrementally build a pipeline, which I think is what you're trying to do here.

code-of-kpp commented 8 years ago

The problem is that a user-provided estimator doesn't have to be a subclass of BaseEstimator.

Even if being a BaseEstimator somehow becomes a requirement for Pipeline steps, an explicit API for this should exist too.
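
To make the objection concrete, here is a minimal pure-Python sketch. It is not the code from the gist and not scikit-learn's API: `Composable`, `Chain`, `Scaler`, `DuckScaler`, and `chain` are all hypothetical names invented for illustration. It assumes the binary operator is provided by a shared base class, and shows that a duck-typed step with the same `fit`/`transform` interface cannot take part in the operator on its own, while an explicit function works for any object.

```python
class Composable:
    """Hypothetical mixin: gives a step a `|` operator that builds a chain."""

    def __or__(self, other):
        return Chain([self, other])


class Chain(Composable):
    """Hypothetical container that holds steps in order."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __or__(self, other):
        # Appending to an existing chain keeps the step list flat.
        return Chain(self.steps + [other])


class Scaler(Composable):
    """Step that opts in to the operator by inheriting from the mixin."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class DuckScaler:
    """Same fit/transform interface, but no shared base class."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


def chain(*steps):
    """Explicit API: accepts any objects, whatever their base class."""
    return Chain(steps)


mixed = Scaler() | DuckScaler()               # works: left operand defines __or__
explicit = chain(DuckScaler(), DuckScaler())  # works: no operator involved

try:
    DuckScaler() | DuckScaler()               # neither operand defines __or__
except TypeError as exc:
    operator_error = exc
```

In this sketch the operator form depends on which operand happens to inherit from the mixin, whereas `chain(...)` behaves the same for every step.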

amueller commented 8 years ago

Can you give an example of a use-case that this solves that can't be solved right now? For example, when do you need the label processing within a pipeline?

code-of-kpp commented 8 years ago

One may want to do label encoding at the beginning of the pipeline. Anywhere in the pipeline, it may be necessary to apply some function to the labels (to better suit the following steps). Clustering and classification can also be chained into a pipeline (although at the prediction stage the first step, clustering, should be ignored).

Also, some examples are provided at the end of the document.
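
As a rough illustration of the label-encoding use case, the following pure-Python sketch shows a pipeline whose intermediate steps may rewrite `y` during `fit` and are skipped at prediction time. It is an assumption-laden toy, not the API in the proposal and not scikit-learn's `Pipeline`: `XYPipeline`, `LabelEncoderStep`, `MajorityClassifier`, and the `fit_transform_xy` method are hypothetical names.

```python
class LabelEncoderStep:
    """Hypothetical step that rewrites y during fit and leaves X alone."""

    def fit_transform_xy(self, X, y):
        self.classes_ = sorted(set(y))
        index = {label: i for i, label in enumerate(self.classes_)}
        return X, [index[label] for label in y]


class MajorityClassifier:
    """Toy final estimator: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.majority_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


class XYPipeline:
    """Hypothetical pipeline in which steps may transform both X and y."""

    def __init__(self, steps):
        self.steps = list(steps)

    def fit(self, X, y):
        for step in self.steps[:-1]:
            if hasattr(step, "fit_transform_xy"):
                # Label-aware step: may return a modified y as well as X.
                X, y = step.fit_transform_xy(X, y)
            else:
                # Ordinary transformer: only X changes.
                X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps[:-1]:
            # Steps without `transform` (label-only steps) are skipped here.
            if hasattr(step, "transform"):
                X = step.transform(X)
        return self.steps[-1].predict(X)


pipe = XYPipeline([LabelEncoderStep(), MajorityClassifier()])
pipe.fit([[0], [1], [2]], ["spam", "ham", "spam"])
predictions = pipe.predict([[5], [6]])
```

Here the classifier is trained on the encoded labels, so `predictions` contains integer codes; mapping them back through `classes_` would be a further design decision that this sketch leaves open.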