kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.47k stars 875 forks

Flexible node selection in `kedro run` command line syntax #3949

Open ianwhale opened 2 weeks ago

ianwhale commented 2 weeks ago

Description

I would like to be able to run nodes across namespaces in a more flexible way.

Context

Let's say I have three identical nodes across three namespaces: red.node, blue.node, yellow.node.

If I want to run them all, my only option is: `kedro run --nodes red.node,blue.node,yellow.node`

As the number of namespaces increases, this gets unwieldy.

Possible Implementation

Add bash-style or dataset factory style wildcarding to the run command:
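As a rough illustration of what bash-style wildcarding over node names could look like, here is a minimal sketch using Python's standard `fnmatch` module. The `select_nodes` helper and the extra `red.other` node are hypothetical and are not part of Kedro; the sketch only shows the matching logic, not how it would be wired into the CLI.

```python
from fnmatch import fnmatchcase


def select_nodes(node_names, pattern):
    """Return the node names that match a shell-style wildcard pattern.

    Hypothetical helper for illustration only; it is not a Kedro API.
    """
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.fnmatch.
    return [name for name in node_names if fnmatchcase(name, pattern)]


# Node names from the example above, plus one hypothetical extra node.
all_nodes = ["red.node", "blue.node", "yellow.node", "red.other"]

# "*.node" matches the node named "node" in every namespace.
selected = select_nodes(all_nodes, "*.node")
print(",".join(selected))  # red.node,blue.node,yellow.node
```

Note that `*` in `fnmatch` also matches dots, so `*.node` would match a node in a nested namespace such as `a.b.node` too. Whether that is desirable is a design choice for the feature.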

Possible Alternatives

Suggestion from @datajoely:

dbt-style syntax for inclusion / exclusion / etc: https://docs.getdbt.com/reference/node-selection/syntax
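To make the inclusion / exclusion idea concrete, here is a minimal sketch loosely modeled on the select-then-exclude approach. It does not implement dbt's actual selector grammar; the `resolve_selection` function and its `select` / `exclude` parameters are assumptions made for illustration and do not exist in Kedro.

```python
from fnmatch import fnmatchcase


def resolve_selection(node_names, select=("*",), exclude=()):
    """Keep names matching any `select` pattern and no `exclude` pattern.

    Hypothetical helper for illustration only; it is not a Kedro or dbt API.
    """

    def matches_any(name, patterns):
        # True if the name matches at least one wildcard pattern.
        return any(fnmatchcase(name, pattern) for pattern in patterns)

    return [
        name
        for name in node_names
        if matches_any(name, select) and not matches_any(name, exclude)
    ]


all_nodes = ["red.node", "blue.node", "yellow.node"]

# Select every namespace's node, then drop the yellow namespace.
print(resolve_selection(all_nodes, select=["*.node"], exclude=["yellow.*"]))
```

In this sketch exclusion always wins over inclusion, which keeps the result independent of the order in which patterns are given.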

datajoely commented 2 weeks ago

Related - https://github.com/kedro-org/kedro/issues/2552

astrojuanlu commented 2 weeks ago

Is this related too? #3679