kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

`kedro pipeline create` pipeline.py with missing import #3557

Open noklam opened 9 months ago

noklam commented 9 months ago

Description

A user was facing this problem:

Brandon Meek 1 hour ago Hey everyone, can someone help me understand why node was removed from the boilerplate imports in pipeline.py? It seems like it's necessary? (edited)

Brandon Meek 26 minutes ago Sorry, I meant from the boilerplate created after running kedro pipeline create ...

Context

"""
This is a boilerplate pipeline 'nok'
generated using Kedro 0.19.2
"""

from kedro.pipeline import Pipeline, pipeline

The current CLI generates this file. It used to import node as well, but that import was removed when we introduced ruff.

The problem is that node is not used anywhere in the template, so the linter flags the import as unused. We should either add a # noqa comment or have the linter skip the template file.
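To illustrate why the linter flags the import, here is a minimal sketch of an unused-import check built on the standard-library `ast` module. It is not ruff's implementation; the `unused_imports` helper, the `# noqa` handling, and the body of `create_pipeline` in the sample template are assumptions made for the example.

```python
import ast

# Assumed shape of the generated file if `node` were still imported.
# The function body is illustrative, not copied from the real template.
TEMPLATE = '''\
from kedro.pipeline import Pipeline, node, pipeline


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline([])
'''


def unused_imports(source: str) -> list[str]:
    """Return imported names that are never referenced in `source`.

    Import statements whose first line carries a `# noqa` comment are skipped,
    loosely mimicking how a linter suppression works.
    """
    tree = ast.parse(source)  # parses only; nothing is imported or executed
    lines = source.splitlines()
    imported = set()
    for stmt in ast.walk(tree):
        if isinstance(stmt, (ast.Import, ast.ImportFrom)):
            if "# noqa" in lines[stmt.lineno - 1]:
                continue
            for alias in stmt.names:
                imported.add(alias.asname or alias.name.split(".")[0])
    # Every bare identifier that appears in the module, including annotations.
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted(imported - used)


print(unused_imports(TEMPLATE))

# The first proposed fix: keep the import and suppress the warning.
suppressed = TEMPLATE.replace("node, pipeline", "node, pipeline  # noqa: F401")
print(unused_imports(suppressed))
```

In this sketch only `node` is reported for the plain template, and nothing is reported once the import line carries `# noqa: F401`.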

Expected Result

kedro pipeline create should generate a runnable pipeline with the correct imports.

Actual Result

The node import is missing.

Your Environment

astrojuanlu commented 9 months ago

+1000 this is a very common annoyance.

Aside from # noqa, another idea is to supply a dummy pipeline? For example:

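from kedro.pipeline import Pipeline, node, pipeline


def create_pipeline(**kwargs) -> Pipeline:
    # Placeholder node: the ... values are meant to be filled in by the user.
    return pipeline([
        node(
            func=...,
            inputs=...,
            outputs=...,
        )
    ])

noklam commented 9 months ago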

@astrojuanlu It sounded good to me at first, but I think it may cause issues because of pipeline autodiscovery. As soon as someone runs kedro pipeline create, their __default__ pipeline will get a random dummy node.
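A toy model of that concern, assuming the default project registry builds `__default__` by summing every discovered pipeline. Pipelines are modelled as plain lists of node names here, and `find_pipelines_stub` and all node names are hypothetical stand-ins rather than Kedro's API.

```python
# Toy model: each pipeline is a list of node names.
# Real Kedro pipelines are Pipeline objects; this only mirrors the summing step.


def find_pipelines_stub() -> dict[str, list[str]]:
    """Hypothetical stand-in for pipeline autodiscovery."""
    return {
        "data_processing": ["preprocess_companies", "preprocess_shuttles"],
        # Freshly created from a template that ships a placeholder node.
        "nok": ["dummy_node"],
    }


def register_pipelines() -> dict[str, list[str]]:
    """Combine all discovered pipelines into `__default__`."""
    pipelines = find_pipelines_stub()
    pipelines["__default__"] = sum(pipelines.values(), [])
    return pipelines


registry = register_pipelines()
print(registry["__default__"])
```

Under these assumptions the placeholder node ends up in `__default__` as soon as the new pipeline exists, before the user has edited anything.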

astrojuanlu commented 9 months ago

For context, this comes from

https://github.com/kedro-org/kedro/blob/e58878124804fa93d394a895fdee11c3c5b9cc5e/kedro/templates/pipeline/%7B%7B%20cookiecutter.pipeline_name%20%7D%7D/pipeline.py#L1-L6

Let's do a quick exploration of what the UX is like with a dummy node. If it's too bad or confusing, let's settle on # noqa, although @deepyaman pointed out that the comment will remain in users' code.

astrojuanlu commented 9 months ago

Another option is to have

from kedro.pipeline import Pipeline, pipeline  # , node

astrojuanlu commented 3 months ago

Another user complained about this today