kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0

Investigate the minimum requirements for Kedro-Viz to run #1783

Open NeroOkwa opened 4 months ago

NeroOkwa commented 4 months ago

Description

The goal of this ticket is to investigate the minimum requirements for Kedro-Viz to run and display the DAG. Kedro-Viz currently needs to load the pipeline and node objects, with the project's dependencies installed, before the project code is executed and the DAG is generated.

Context

Investigating and minimising the requirements to run Kedro-Viz is important because it reduces the number of steps a user must take to install and use Kedro-Viz. Having to work through several steps and error messages before Kedro-Viz loads (especially the first time) could lead to attrition and hence low adoption.

Once this is improved, users can use Kedro-Viz from the early stages of pipeline development to see what their Kedro project structure looks like, and iterate from there.

Evidence Markers

Copying over the comments from #1742:

Another example: @inigohidalgo says "due to the heavy deps from viz i usually have my dev venv but I create another one just for viz where i just install viz over whatever project I have installed, overriding the project's dependencies with viz's" and asks "do you know if anybody has tested using kedro viz as an "app", so installing it through pipx or smth similar? is that even possible with how viz works?". Source: https://linen-slack.kedro.org/t/16380121/question-regarding-kedro-viz-why-is-there-a-restriction-on-p#38213e99-ba9d-4b60-9001-c0add0e2555b

The acceptance criterion for this is simple: as a user, I shouldn't need a full Spark installation to view Kedro-Viz for a project which uses Spark to process data.

A user also mentioned a similar issue in #1159 when using the kedro-mlflow plugin. Copying over the comments:

So according to https://github.com/kedro-org/kedro-viz/issues/1125, kedro-viz requires all plugins to be working (incl. the mlflow connection to a remote tracking server, if leveraged) and all Python packages to be installed. For me it just doesn't make sense. Why would I need to have the Python env ready to visualize the project flow? The original issue seems dead, but kedro-viz is still in some way a core kedro feature, so..

In most cases plugins aren't relevant, but as @deepyaman mentioned, a hook can actually change the DAG, so viz needs to execute it. Otherwise, we could skip all the hooks, which would avoid the mlflow connection problem.

Well, it still looks like a fixable problem on the kedro-viz side. One solution would be to add a parameter to kedro viz that would skip all import errors and after_catalog_created hooks and render the "raw" DAG.
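To make the "skip all import errors" idea concrete, here is a minimal, generic Python sketch. It is not Kedro-Viz code and not an existing Kedro-Viz option: `tolerate_missing_imports` is a hypothetical helper, and it assumes that replacing a missing package with a `unittest.mock.MagicMock` stand-in is enough to import the pipeline definitions, since drawing the DAG only needs node names, inputs and outputs, not working node functions. It does not cover the hook-skipping half of the proposal.

```python
import builtins
import sys
from contextlib import contextmanager
from unittest.mock import MagicMock


@contextmanager
def tolerate_missing_imports():
    """Hypothetical helper: stub out packages that are not installed.

    While the context is active, an `import` statement that fails with
    ModuleNotFoundError receives a MagicMock stand-in instead of raising.
    Yields the list of module names that were stubbed.
    """
    real_import = builtins.__import__
    stubbed = []

    def lenient_import(name, globals=None, locals=None, fromlist=(), level=0):
        try:
            return real_import(name, globals, locals, fromlist, level)
        except ModuleNotFoundError:
            if level != 0:
                # Relative imports point inside the project itself,
                # so a failure there is a real error and is re-raised.
                raise
            stub = MagicMock(name=name)
            sys.modules[name] = stub  # later imports reuse the same stub
            stubbed.append(name)
            return stub

    builtins.__import__ = lenient_import
    try:
        yield stubbed
    finally:
        # Restore normal import behaviour and remove the stand-ins.
        builtins.__import__ = real_import
        for name in stubbed:
            sys.modules.pop(name, None)


if __name__ == "__main__":
    with tolerate_missing_imports() as stubbed:
        # Stand-in for a heavy dependency that is not installed.
        import heavy_dependency_that_is_not_installed as heavy

        session = heavy.Session.builder.getOrCreate()  # returns another mock
    print("stubbed modules:", stubbed)
```

A stand-in like this only helps when the missing package is used inside node functions or at import time in ways a mock can absorb. If a hook or plugin changes the DAG itself, as noted above, mocking it out would render a different graph from the one that actually runs, which is why such behaviour would have to be opt-in.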