Kedro-viz needs too much to run properly

foxale commented 1 year ago

Description

According to https://github.com/kedro-org/kedro-viz/issues/1125, kedro-viz requires:

all plugins to be working (incl. mlflow connection to remote tracking server if leveraged)
all python packages to be installed

For me it just doesn't make sense. Why would I need to have the python env ready to visualize project flow? The original issue seems dead, but kedro-viz is still in some way a core kedro feature, so..

deepyaman commented 1 year ago

+1 that this is annoying. I remember in the past, when I had nodes with hard-to-install dependencies (e.g. a node that called R code using rpy2, because epidemiologists don't write their libraries in Python), I would comment out imports in order to get kedro-viz working locally.

What I recall:

kedro-viz is going to trawl through your pipeline/node definitions, so if you have imports at the top of files, they need to be resolvable. You could move the imports into nodes, but that's ugly.
Re kedro-mlflow, I wonder when (in the project lifecycle) it makes the call to set up the connection. Maybe it can be delayed as much as possible. I don't think kedro-viz can entirely ignore hooks, because some could modify the DAG, but at the same time just loading the kedro-mlflow plugin maybe shouldn't create a connection. Would need to look further into how this works.
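To illustrate the "move the imports into nodes" workaround mentioned above, here is a minimal sketch. The package name `hard_to_install_pkg` and both function names are made up for the example (it stands in for something like rpy2); the point is only that a function-level import is resolved when the node runs, not when the file is imported.

```python
def fit_epi_model(data):
    """Node whose heavy dependency is imported only when the node runs."""
    # Hypothetical package name, standing in for a hard-to-install dependency.
    # Importing the file that defines this function does not trigger this line.
    import hard_to_install_pkg

    return hard_to_install_pkg.fit(data)


def mean_value(values):
    """Same pattern with a module that is installed, to show normal behaviour."""
    import statistics  # deferred import, resolved at call time

    return statistics.mean(values)
```

With this layout, a tool that only needs to import the module and inspect the functions never touches `hard_to_install_pkg`; the `ModuleNotFoundError` appears only if `fit_epi_model` is actually executed. The cost is the one already noted: imports scattered inside function bodies are harder to read and to lint.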

merelcht commented 1 year ago

Transferred this issue to the Kedro-Viz repo, because it's not framework related.

Galileo-Galilei commented 1 year ago

Hi @foxale, unfortunately I don't have a better solution to your problem than "turn off the mlflow hook".

I feel a bit responsible for this issue, both because I am the maintainer of the kedro-mlflow plugin, and because the root cause is a bunch of issues and suggestions I made to the core framework. I will try to explain what's going on here and the rationale behind this behaviour.

In these issues: https://github.com/kedro-org/kedro/issues/506 and https://github.com/kedro-org/kedro/issues/1431, there was a huge discussion about enabling hooks to access the config_loader. The pros and cons are discussed in the issues and I won't sum everything up here, but the key idea is that many plugins and hooks need to access proprietary config files or credentials.

After these discussions we decided to introduce an after_context_created hook in https://github.com/kedro-org/kedro/pull/1434, and this is described in https://github.com/kedro-org/kedro/issues/1458. This hook exposes the entire context to address a lot of complaints from hook and plugin authors (see the different discussions to understand the inner details). One of the main arguments was the need to be able to instantiate an external connection "once and for all" at the beginning of the pipeline (e.g. for spark or mlflow).

The main advantage of creating the connection after context creation is that when you are using the session interactively (in a script or a Jupyter notebook), this is done automatically: you don't have to set up all connections manually.

On the other hand, the biggest drawback is that this connection is always instantiated, including when calling kedro viz (which loads a session). This was identified and discussed in this comment: https://github.com/kedro-org/kedro/issues/506#issuecomment-1100099112. @AntonyMilneQB and I concluded that this is a very uncommon scenario: it means that you run kedro-mlflow on a machine which cannot instantiate the mlflow connection, and by definition this can't be the machine where you launch kedro run, because if you installed kedro-mlflow it was precisely because you wanted to have an mlflow connection. Since kedro-viz is more likely to be used in a development rather than a production environment, and considering the advantages, we did not investigate further.
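The trade-off described here (connect eagerly when the context is created, versus delay the connection until it is needed) can be sketched in plain Python. This is not kedro-mlflow's actual code: the class `TrackingHooks`, the `connect` callable and the `tracking_uri` key are all assumptions made for the illustration, and the hook methods are ordinary methods rather than real Kedro hook implementations.

```python
from functools import cached_property


class TrackingHooks:
    """Hypothetical hook class: reads settings eagerly, connects lazily."""

    def __init__(self, connect):
        # `connect` is any callable that opens the external connection.
        self._connect = connect
        self.uri = None

    def after_context_created(self, context):
        # Only read configuration here; do not open the connection yet.
        self.uri = context["tracking_uri"]

    @cached_property
    def client(self):
        # The first access opens the connection; later accesses reuse it.
        return self._connect(self.uri)

    def before_pipeline_run(self):
        # A real run needs the connection, so it is created at this point.
        return self.client
```

Under this design, a command that only creates the context (as a visualiser does) never calls `connect`, while an actual run still gets a single connection. Whether this fits a given plugin depends on when it really needs the connection, which is the open question in this thread.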

Maybe @noklam or @AntonyMilneQB of the core team can add further help, and possibly suggest a solution?

noklam commented 1 year ago

all plugins to be working (incl. mlflow connection to remote tracking server if leveraged) all python packages to be installed

@foxale Thank you for your question. The answer to that is pretty much covered. tl;dr

Viz needs to load up the pipeline/nodes object, so the files would be loaded and the dependencies need to be installed. I don't see any way to get around it since you need to execute the code to get the Python representation of the pipeline.
In most cases, plugins aren't relevant, but as @deepyaman mentioned, a hook can actually change the DAG, so viz needs to execute it. Otherwise, we could skip all the hooks and that would avoid the mlflow connection problem.

foxale commented 1 year ago

Well, it still looks like a fixable problem on the kedro-viz side. One solution would be to add a parameter to kedro viz that would skip all import errors and after_catalog_created hooks and render the "raw" DAG.
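A rough sketch of what "skip all import errors" could look like, using only the standard library. The helper name `import_with_stubs` is made up for this example and is not an existing kedro-viz function; it simply replaces each missing module with a `MagicMock` and retries the import, so that the module defining the pipeline can still be loaded.

```python
import importlib
import sys
from unittest.mock import MagicMock


def import_with_stubs(module_name, max_stubs=50):
    """Import `module_name`, stubbing any dependency that is not installed.

    Returns the imported module and the list of module names that were stubbed.
    """
    stubbed = []
    for _ in range(max_stubs + 1):
        try:
            return importlib.import_module(module_name), stubbed
        except ModuleNotFoundError as exc:
            missing = exc.name
            # Give up if the target itself is missing or stubbing cannot help.
            if missing is None or missing == module_name or missing in sys.modules:
                raise
            # Register a placeholder so the next attempt gets past this import.
            sys.modules[missing] = MagicMock(name=missing)
            stubbed.append(missing)
    raise ImportError(f"more than {max_stubs} missing modules for {module_name}")
```

The stubs are enough to build and inspect objects, but any node that calls into a stubbed package would get `MagicMock` results back, so this is only suitable for rendering the graph, not for running it. The placeholders also stay in `sys.modules` for the rest of the process.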

tynandebold commented 1 year ago

@foxale would you feel comfortable opening a PR that tries to solve this problem? We'd certainly appreciate it, as we welcome any and all contributions from the community!

noklam commented 1 year ago

This is certainly doable if the team thinks we should do this. It's not the most elegant solution you can find, but I think we could simply do something like:

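from kedro.framework.hooks.manager import _NullPluginManager

if SOME_FLAG:
    # Swap in a no-op hook manager so that no plugin hooks are called
    session._hook_manager = _NullPluginManager()
# The swap has to happen before the context is loaded
context = session.load_context()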

This will remove all the hooks and should avoid this problem.

https://github.com/kedro-org/kedro-viz/blob/cc11edbec0777329eb66c875529ed35cfc5d7256/package/kedro_viz/integrations/kedro/data_loader.py#L72-L91
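To make the idea of a "null" hook manager concrete, here is a self-contained sketch that does not depend on Kedro. `NullHookManager`, `FailingHookManager` and `load_context` are illustrative names invented for this example, and the `ignore_plugins` argument plays the role of `SOME_FLAG` above; the real classes in Kedro may be implemented differently.

```python
class NullHookManager:
    """Illustrative stand-in for a hook manager that ignores every hook call."""

    def __getattr__(self, name):
        # Any attribute (e.g. `hook`, `after_context_created`) resolves to self,
        # so chained lookups such as `manager.hook.some_hook` keep working.
        return self

    def __call__(self, *args, **kwargs):
        # Calling any hook does nothing and returns no results.
        return []


class FailingHookManager:
    """Simulates a plugin whose hook needs an unreachable remote service."""

    class _Hooks:
        def after_context_created(self, context):
            raise ConnectionError("tracking server unreachable")

    hook = _Hooks()


def load_context(hook_manager, ignore_plugins=False):
    """Build a toy context, firing the hook unless plugins are ignored."""
    if ignore_plugins:
        hook_manager = NullHookManager()
    context = {"project": "demo"}
    hook_manager.hook.after_context_created(context=context)
    return context
```

Calling `load_context(FailingHookManager(), ignore_plugins=True)` returns the context, whereas the same call without the flag raises `ConnectionError`. As noted earlier in the thread, the price is that hooks which would have modified the DAG are skipped as well.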

astrojuanlu commented 10 months ago

Potential solution for this: https://github.com/kedro-org/kedro-viz/issues/1459#issuecomment-1662165936

rashidakanchwala commented 4 months ago

Reopening this issue, as there are some issues around working with Spark that --ignore-plugins does not resolve.

kedro-org / kedro-viz

Kedro-viz needs too much to run properly #1159
