scikit-learn-contrib / skdag

A more flexible alternative to scikit-learn Pipelines
MIT License
30 stars, 8 forks

Build graph from expressive operators #31

Open big-o opened 10 months ago

Rather than a factory method, allow users to construct a graph by applying operators to estimators, for example:

dag1 = (
    NamedStep(est1, "step1")
    | NamedStep(est2, "step2")
    | (
        NamedStep(est3, "step3")
        & NamedStep(est4, "step4")
    )
    | NamedStep(est5, "step5")
)

This would create a pipeline running from est1 to est5 in which step2 feeds both step3 and step4, and step5 receives the concatenated outputs of step3 and step4.
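To make the intended semantics concrete, here is a minimal sketch of how the two operators might be wired up. Everything in it is an assumption for illustration: `Fragment`, its fields, and this `NamedStep` helper are hypothetical stand-ins rather than existing skdag API, and strings are used in place of real estimators. It only tracks the edges of the graph and leaves out how step 5 would concatenate its inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical piece of a graph under construction (not skdag API)."""
    steps: dict        # step name -> estimator
    edges: frozenset   # (upstream name, downstream name) pairs
    heads: tuple       # steps that accept input from whatever precedes the fragment
    tails: tuple       # steps whose output feeds whatever follows the fragment

    def __or__(self, other):
        # a | b: every tail of a feeds every head of b.
        joins = {(t, h) for t in self.tails for h in other.heads}
        return Fragment(
            {**self.steps, **other.steps},
            self.edges | other.edges | joins,
            self.heads,
            other.tails,
        )

    def __and__(self, other):
        # a & b: place the fragments side by side without connecting them.
        return Fragment(
            {**self.steps, **other.steps},
            self.edges | other.edges,
            self.heads + other.heads,
            self.tails + other.tails,
        )


def NamedStep(est, name):
    """Hypothetical helper mirroring the call shape used in this issue."""
    return Fragment({name: est}, frozenset(), (name,), (name,))


# Placeholder strings stand in for real estimators.
dag1 = (
    NamedStep("est1", "step1")
    | NamedStep("est2", "step2")
    | (NamedStep("est3", "step3") & NamedStep("est4", "step4"))
    | NamedStep("est5", "step5")
)

for upstream, downstream in sorted(dag1.edges):
    print(upstream, "->", downstream)
```

In Python `&` binds more tightly than `|`, so the inner parentheses around the parallel steps are optional, though keeping them probably makes the branching easier to read.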

Complex DAG construction can then be broken down into multiple statements too:

dag2 = (
    dag1.get_step("step2")
    | NamedStep(est6, "step6")
    | NamedStep(est7, "step7")
)

This would effectively create a new DAG: the original one with an extra branch (step6 -> step7) added after step2.
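One possible way to get this branching behaviour is for `get_step` to return a view of the same graph whose chaining point is moved to the named step. The sketch below assumes that design. `Fragment`, `step`, and this `get_step` are hypothetical names, not existing skdag API, and a shorter three-step chain stands in for the full `dag1` above.

```python
class Fragment:
    """Hypothetical immutable graph fragment (not skdag API)."""

    def __init__(self, edges, heads, tails):
        self.edges = frozenset(edges)   # (upstream, downstream) name pairs
        self.heads = tuple(heads)       # entry steps
        self.tails = tuple(tails)       # steps that the next `|` attaches to

    def __or__(self, other):
        # Connect every current tail to every head of the right-hand side.
        joins = {(t, h) for t in self.tails for h in other.heads}
        return Fragment(self.edges | other.edges | joins, self.heads, other.tails)

    def names(self):
        # All step names known to this fragment.
        found = set(self.heads) | set(self.tails)
        for upstream, downstream in self.edges:
            found.update((upstream, downstream))
        return found

    def get_step(self, name):
        # Same graph, but further chaining continues from `name`.
        if name not in self.names():
            raise KeyError(name)
        return Fragment(self.edges, self.heads, (name,))


def step(name):
    """Hypothetical single-step fragment; estimators are omitted for brevity."""
    return Fragment((), (name,), (name,))


dag1 = step("step1") | step("step2") | step("step3")
dag2 = dag1.get_step("step2") | step("step6") | step("step7")

for upstream, downstream in sorted(dag2.edges):
    print(upstream, "->", downstream)
```

Because each operator returns a new fragment instead of mutating its operands, `dag1` is left unchanged and `dag2` is a separate graph that shares its steps.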