Open foxale opened 1 year ago
+1 that this is annoying. I remember in the past, when I had nodes with hard-to-install dependencies (e.g. a node that called R code using rpy2, because epidemiologists don't write their libraries in Python), I would comment out imports in order to get kedro-viz working locally.
What I recall:
kedro-viz is going to trawl through your pipeline/node definitions, so if you have imports at the top of files, they need to be resolvable. You could move the imports into nodes, but that's ugly.
kedro-mlflow: I wonder when (in the project lifecycle) it makes the call to set up the connection? Maybe that can be delayed as much as possible. I don't think kedro-viz can entirely ignore hooks, because some could modify the DAG, but at the same time just loading the kedro-mlflow plugin maybe shouldn't create a connection. Would need to look further into how this works.
Transferred this issue to the Kedro-Viz repo, because it's not framework related.
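For reference, the "move the imports into nodes" workaround mentioned above might look like the sketch below. The node function `summarise_with_r` is a hypothetical example, not taken from any project in this thread, and it assumes rpy2 is installed only on the machines that actually run the pipeline. Because the import sits inside the function body, importing the module (which is what kedro-viz needs in order to read the pipeline definition) does not require rpy2.

```python
def summarise_with_r(values):
    """Hypothetical node: compute a mean by calling into R."""
    # Deferred import: only executed when the node runs, so tools that merely
    # import this module to inspect the pipeline do not need rpy2 installed.
    import rpy2.robjects as ro

    r_mean = ro.r["mean"]  # look up R's built-in mean()
    return float(r_mean(ro.FloatVector(values))[0])
```

The cost is the one already noted: a missing dependency now surfaces when the node runs rather than at import time, and the import is hidden from anyone scanning the top of the file.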
Hi @foxale, unfortunately I don't have a better solution to your problem than "turn off the mlflow hook".
I feel a bit responsible for this issue, both because I am the maintainer of the kedro-mlflow plugin and because the root cause is a bunch of issues and suggestions I made to the core framework. I will try to explain what's going on here and the rationale behind this behaviour.
In these issues, https://github.com/kedro-org/kedro/issues/506 and https://github.com/kedro-org/kedro/issues/1431, there was a huge discussion about enabling hooks to access the config_loader. The pros and cons are discussed in the issues and I won't sum everything up here, but the key idea is that many plugins and hooks need to access proprietary config files or credentials.
After these discussions we decided to introduce an after_context_created hook in https://github.com/kedro-org/kedro/pull/1434, and this is described in https://github.com/kedro-org/kedro/issues/1458. This hook exposes the entire context, to address a lot of complaints from hook and plugin authors (see the different discussions to understand the inner details). One of the main arguments was the need to be able to instantiate an external connection once and for all at the beginning of the pipeline (e.g. for spark or mlflow).
The main advantage of creating the connection after context creation is that when you are using the session interactively (in a script or a Jupyter notebook), this is done automatically: you don't have to set up all the connections manually.
On the other hand, the biggest drawback is that this connection is always instantiated, including when calling kedro viz (which loads a session). This was identified and discussed in this comment: https://github.com/kedro-org/kedro/issues/506#issuecomment-1100099112. @AntonyMilneQB and I concluded that this is a very uncommon scenario: it means that you run kedro-mlflow on a machine which cannot instantiate the mlflow connection, and by definition this can't be the machine where you launch kedro run, because if you installed kedro-mlflow it was precisely because you wanted an mlflow connection. Since kedro-viz is more likely to be used in a development rather than a production environment, and considering the advantages, we did not investigate further.
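To make the trade-off concrete, here is a dependency-free toy model of the mechanism described above. Nothing in it is Kedro or kedro-mlflow code: `TrackingHook`, `load_context` and the URI are hypothetical stand-ins, and in a real project the hook method would be registered through Kedro's hook mechanism. The point it illustrates is that the connection is opened as a side effect of creating the context, whichever command triggered that.

```python
from types import SimpleNamespace


class TrackingHook:
    """Hypothetical plugin hook that connects as soon as the context exists."""

    def __init__(self):
        self.connections = []

    def after_context_created(self, context):
        # Read the plugin's settings from the context, then connect eagerly.
        uri = context.config["tracking_uri"]
        self.connections.append(uri)  # stands in for opening a real connection


def load_context(hooks):
    """Toy context loading: build the context, then fire the hook on every plugin."""
    context = SimpleNamespace(config={"tracking_uri": "http://localhost:5000"})
    for hook in hooks:
        hook.after_context_created(context)
    return context


hook = TrackingHook()
for command in ("run", "viz"):
    # Both commands load a context, so both trigger the connection,
    # even though only "run" actually needs it.
    load_context([hook])

print(len(hook.connections))  # prints 2
```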
Maybe @noklam or @AntonyMilneQB of the core team can add further help, and possibly suggest a solution?
all plugins to be working (incl. mlflow connection to remote tracking server if leveraged)
all python packages to be installed
@foxale Thank you for your question. The answer to that is pretty much covered. tl;dr
Well, it still looks like a fixable problem on the kedro-viz side. One solution would be to add a parameter to kedro viz that would skip all import errors and after_catalog_created hooks and render the "raw" DAG.
@foxale would you feel comfortable opening a PR that tries to solve this problem? We'd certainly appreciate it, as we welcome any and all contributions from the community!
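This is certainly doable if the team thinks we should do this. It's not the most elegant solution, but I think we could simply do something like:
    from kedro.framework.hooks.manager import _NullPluginManager

    if SOME_FLAG:  # placeholder for whatever option enables this behaviour
        # Swap in a no-op hook manager so that no hooks fire
        session._hook_manager = _NullPluginManager()
    # Before this line
    context = session.load_context()
This will remove all the hooks and should avoid this problem.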
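Why swapping the hook manager works can be shown with a small null-object sketch. This is an illustration only: `NullHookManager`, `SimpleHookManager`, `ConnectingPlugin` and `load_context` are hypothetical stand-ins written for this example, not Kedro's actual classes. The idea is a manager that accepts any hook call and does nothing.

```python
class NullHookManager:
    """Accepts any attribute access and any call, and does nothing."""

    def __getattr__(self, name):
        return self

    def __call__(self, *args, **kwargs):
        return None


class SimpleHookManager:
    """Toy manager that forwards a hook call to every registered plugin."""

    def __init__(self, *plugins):
        self._plugins = plugins
        self.hook = self  # mimic the `manager.hook.<hook_name>(...)` call style

    def after_context_created(self, context):
        for plugin in self._plugins:
            plugin.after_context_created(context=context)


class ConnectingPlugin:
    """Toy plugin that would open a connection when the context is created."""

    def __init__(self):
        self.connected = False

    def after_context_created(self, context):
        self.connected = True


def load_context(hook_manager):
    """Toy context loading that fires the hook through whichever manager it is given."""
    context = {"project": "demo"}
    hook_manager.hook.after_context_created(context=context)
    return context


plugin = ConnectingPlugin()
load_context(NullHookManager())          # hook call is swallowed
skipped = plugin.connected               # still False
load_context(SimpleHookManager(plugin))  # hook call reaches the plugin
fired = plugin.connected                 # now True
```

As an earlier comment points out, this also drops hooks that legitimately modify the DAG, so the rendered graph may differ from what a real run would execute.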
Potential solution for this: https://github.com/kedro-org/kedro-viz/issues/1459#issuecomment-1662165936
Reopening this issue, as there are some issues around working with Spark that --ignore-plugins does not resolve.
Description
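So according to this https://github.com/kedro-org/kedro-viz/issues/1125 kedro-viz requires:
all plugins to be working (incl. mlflow connection to remote tracking server if leveraged)
all python packages to be installed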
For me it just doesn't make sense. Why would I need to have the python env ready to visualize project flow? The original issue seems dead, but kedro-viz is still in some way a core kedro feature, so..