kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0

Spin off pipeline inspection to separate package #2163

Open astrojuanlu opened 2 days ago

astrojuanlu commented 2 days ago

Description

Kedro-Viz has done a lot of work to statically derive the structure of the pipeline, and it can now do so even when not all the imports are in place (see discussion in #1742, #1966).

The idea here is to split that functionality out into a separate Python package that the Kedro-Viz backend would depend on.

Context

There's growing evidence that this functionality could be useful for other use cases, for example to facilitate translating Kedro pipelines into other formats. Plugin authors probably know better, but I'm sure every translator plugin (think kedro-vertexai, kedro-mlrun, kedro-databricks) needs some form of pipeline inspection.[^1]
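To make "pipeline inspection" a bit more concrete, here is a rough sketch of the kind of information I imagine a translator needs: which node depends on which. `NodeSpec` and `node_dependencies` are hypothetical names, not part of Kedro or of any plugin; the only assumption is that each inspected node exposes a name, its inputs, and its outputs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical stand-in for an inspected Kedro node (not Kedro's API)."""

    name: str
    inputs: tuple[str, ...] = ()
    outputs: tuple[str, ...] = ()


def node_dependencies(nodes: list[NodeSpec]) -> dict[str, set[str]]:
    """Map each node name to the names of the nodes that produce its inputs."""
    # Index every dataset by the node that outputs it.
    producer = {out: n.name for n in nodes for out in n.outputs}
    # Inputs with no producer (raw data, parameters) are simply skipped.
    return {
        n.name: {producer[i] for i in n.inputs if i in producer}
        for n in nodes
    }


nodes = [
    NodeSpec("preprocess", ("raw_data",), ("clean_data",)),
    NodeSpec("train", ("clean_data", "params:model"), ("model",)),
    NodeSpec("evaluate", ("model", "clean_data"), ("metrics",)),
]
deps = node_dependencies(nodes)
```

A translator plugin could walk `deps` to emit tasks in the target platform's format without importing any of the node functions themselves.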

There's a proof of concept of what that could look like in https://github.com/AlpAribal/kedro-inspect, which was created as part of this research: https://github.com/kedro-org/kedro/wiki/Synthesis-of-research-related-to-deployment-of-Kedro-to-modern-MLOps-platforms

Possible Implementation

There's a strong indication that this process could use OpenLineage, specifically the concept of Static Lineage (admittedly not very well documented). See https://github.com/kedro-org/kedro/discussions/4054 for some earlier thoughts.

Possible Alternatives

Use a more ad-hoc format, more similar to whatever Kedro Viz is currently using, maybe even the output of --save-file (although at the moment it's not very clear what's expected there, see #1681).

Since there has been reluctance in the past towards spinning off packages, another solution could be that the functionality stays in Kedro Viz, and plugins depend on it. Given the number of dependencies Kedro Viz has, I really hope this isn't the preferred solution.

Another solution is to have that in pypi.org/p/kedro. I don't even think it would be too bad, since in the end we're talking about exporting or serializing kedro.pipeline.pipeline.Pipeline objects.
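As a rough illustration of what "exporting or serializing" could mean, here is a minimal sketch that round-trips a pipeline description through JSON. `InspectedNode`, `dump_pipeline`, and `load_pipeline` are hypothetical names and the JSON layout is made up for this example; it is neither the output of --save-file nor an OpenLineage document, and it deliberately uses stand-in classes rather than Kedro's own.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class InspectedNode:
    """Hypothetical serializable view of a pipeline node (not Kedro's API)."""

    name: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)


def dump_pipeline(nodes: list[InspectedNode]) -> str:
    """Serialize the node list to a JSON string with a made-up layout."""
    payload = {"nodes": [asdict(n) for n in nodes]}
    return json.dumps(payload, indent=2, sort_keys=True)


def load_pipeline(text: str) -> list[InspectedNode]:
    """Rebuild the node list from the JSON produced by dump_pipeline."""
    return [InspectedNode(**raw) for raw in json.loads(text)["nodes"]]


original = [
    InspectedNode("preprocess", ["raw_data"], ["clean_data"]),
    InspectedNode("train", ["clean_data"], ["model"], ["ml"]),
]
restored = load_pipeline(dump_pipeline(original))
```

The point is only that the exported structure carries names, datasets, and tags, so a consumer never needs the node functions to be importable.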

Very likely there are other possible solutions here, ideas welcome.

Since this is in the Kedro-Viz tracker, cc @merelcht for visibility.

[^1]: This is a mere hypothesis. I think it would be good to sweep the different translator plugins and see if there is overlapping code among them. cc @DimedS

astrojuanlu commented 2 days ago

(It seems like I'm solutionising here, please don't take it as such - it's more of a brain dump after an exceedingly long yak shaving session)