kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
10.04k stars 908 forks

`pipeline.pipeline` confuses IDE autocompletion #2805

Open noklam opened 1 year ago

noklam commented 1 year ago

Description

We import pipeline in our starters, and this confuses IDEs like VSCode: they treat pipeline as a module instead of the pipeline function.

from kedro.pipeline import Pipeline, node, pipeline

Changing it to the following avoids the issue.

from kedro.pipeline.modular_pipeline import pipeline

The problem is that the IDE confuses the pipeline function, which lives in kedro.pipeline.modular_pipeline, with the kedro.pipeline.pipeline module.
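To illustrate the name clash without depending on Kedro itself, here is a minimal sketch using a throwaway package. The package name demo_pkg and its contents are hypothetical; the layout (a pipeline.py submodule next to a pipeline function re-exported from modular_pipeline.py) is only an assumption modelled on the import paths mentioned in this issue. At runtime the function wins, because the later import in `__init__.py` rebinds the name, but a static analyzer sees two candidates for the same name.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Write a hypothetical package whose submodule and function share a name."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    # Submodule named `pipeline` (stands in for kedro.pipeline.pipeline).
    (pkg / "pipeline.py").write_text("class Pipeline:\n    pass\n")
    # Function named `pipeline` defined in a different submodule.
    (pkg / "modular_pipeline.py").write_text(
        "def pipeline(nodes):\n    return list(nodes)\n"
    )
    # The second import rebinds `demo_pkg.pipeline` from module to function.
    (pkg / "__init__.py").write_text(
        "from .pipeline import Pipeline\n"
        "from .modular_pipeline import pipeline\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        attr = demo_pkg.pipeline                      # what `from demo_pkg import pipeline` yields
        submodule = sys.modules["demo_pkg.pipeline"]  # the module with the same dotted name
    finally:
        sys.path.remove(tmp)

attr_is_function = callable(attr) and not isinstance(attr, types.ModuleType)
print(type(attr).__name__)       # function
print(type(submodule).__name__)  # module
```

Both objects answer to the dotted name demo_pkg.pipeline, which is the kind of ambiguity an IDE has to guess its way through. Importing the function from the module that defines it removes the guesswork.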

Context

Solution 1:

Solution 2:

Solution 3:

Expected Result

Actual Result

Your Environment

astrojuanlu commented 1 year ago

or

from kedro import Pipeline

Pipeline.make(...)

😄 #712

(see also #2723 about "modular pipelines" vs just "pipelines")

noklam commented 1 year ago

Added the 0.19 tag to prompt discussion in case we need to move stuff around. (It's annoying that GitHub project changes don't allow comments 🥲)

noklam commented 1 year ago

Discussed in backlog grooming. We will keep this as a low priority for the 0.19 release and see if we have time to fit it in.

If we end up not wanting to break anything in 0.19, we should still update the starters to avoid these syntax highlighting problems by importing from kedro.pipeline.modular_pipeline directly.

astrojuanlu commented 1 year ago

There's more confusion: https://github.com/kedro-org/kedro-viz/issues/1522#issue-1885805761

Some users are creating the pipelines with the Pipeline class.

I think we should tackle this and #2723 at once.

astrojuanlu commented 1 year ago

I just saw this code snippet in someone else's code:

my_pipeline = Pipeline([
    Node(filter_func, "df_input", "df_filtered"),
    Node(actual_func, "df_filtered", "df_output"),
])

Is there a reason why we are using the pipeline and node helpers instead of the Pipeline and Node class initializers directly?