Closed Galileo-Galilei closed 6 months ago
Thanks for this very clear write up! I'm wondering how this would fit into current ideas around improving the jupyter/ipython workflow (as per 0.17.5 release notes, we have removed .ipython/profile_default/startup/
from the Kedro project template in favour of .ipython/profile_default/ipython_config.py
and kedro.extras.extensions.ipython
). @idanov
add documentation on how to create a custom line magic in your plugin, because this feature is undocumented yet. I had to read the code to discover it and I struggled a little to make it work². I can make a PR, maybe by adding a sub-section in the plugin section?
This is an excellent idea I think. As far as I can see the kedro.line_magic
entry point is indeed completely undocumented and would fit in well on that page. I've actually been meaning to make that page a bit clearer for a while, but if you did want to add a subsection for how to write custom line magic then that would be awesome.
add the possibility to execute some functions / push global variables to the notebook session when it is openened via a Kedro CLI command
Just to understand, this would effectively give users the ability to add their own custom code that would be executed upon running %reload_kedro
?
Yes, I've seen how you refactored the ipython part for the next release and it seems much simpler like this.
Regarding the documentation, I agree that apart from the line_magic
, the plugin section is a bit sparse and may need more details for beginners (I often find myself digging in the code to see how things are implement under the hood, or look at kedro-viz
as an example). I'd try to suggest some improvements when I have time but this is not a top priority for me to be honest.
Just to understand, this would effectively give users the ability to add their own custom code that would be executed upon running %reload_kedro?
Yes. The current workflow for a user is:
kedro jupyter notebook
%reload_kedro_mlflow
inside a cell%reload_kedro
%reload_kedro_mlflow
I wish it were:
kedro jupyter notebook
%reload_kedro
inside a cellIt is honestly not a big deal and the first workflow is quite satisfying, I just think that it may be more user friendly to hide the "kedro-mlflow" part from the user.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Somehow it's nearly a year now since this was posted, but I have some fresh thoughts on it now! For context, we've been working on the kedro-viz %run_viz
line magic recently (https://github.com/kedro-org/kedro-viz/pull/1012). Now that I understand more about how these line magics work I've realised that our current setup is not great.
tl;dr: let's go for the proposal under Possible Alternatives above and remove the kedro.line_magic
entrypoint altogether. It's very weak and doesn't add anything useful since we can achieve everything we want just through %load_ext
already.
%load_ext kedro.extras.extensions.ipython
(soon to be aliased to just %load_ext kedro
: see https://github.com/kedro-org/kedro/issues/1763)kedro jupyter/ipython
but can also be run on any managed instance to "kedro-ify" your jupyter/ipython session. The behaviour is identical; all kedro jupyter/ipython
does is load the extension automaticallykedro.line_magic
, which according to a github search is used in publicly available repos precisely two times: kedro-mlflow and kedro-viz. I can see one old usage of it in an internal package too, and @Galileo-Galilei suggests he uses it internally too. But it's certainly hardly used at allneeds_local_scope
so that all line magics must have the variable local_ns
in their signature, even if they don't want it. It should be the responsibility of the package defining the line magic to specify decorators like needs_local_scope
; it shouldn't be forced on them by kedro%run_viz
and as such was never documented. It's not powerful enough to be used for general cases. It's probably only @Galileo-Galilei who realised that it existed and used itIf we can agree a good plan with @Galileo-Galilei then I would even be happy to do this as a non-breaking release, after doing a user survey to double check that no one else is using it.
Here's the requirements:
%load_ext kedro
is run (or maybe when reload_kedro
is run, not sure)As far as I can tell, satisfying requirement 1 is trivial because this is exactly what's already provided by IPython extensions! If you write an IPython extension then it can then do whatever you like - no need to be constrained like you currently are to just writing line magics with local_ns
in their signature.
The user flow in a managed jupyter instance would then be:
%load_ext kedro
only needed if you're not in the project directory or want to change env etc.: %reload_kedro <path>
%load_ext kedro_mlflow
%load_ext kedro_viz
This is basically the Possible Alternatives proposal from @Galileo-Galilei above. The crucial thing is that I believe this statement is not correct:
However, we cannot access the objects created by reload_kedro (especially the session or the catalog if we want to modify /use them). We need to duplicate almost all the code to make it works, and it prevents combining several plugins.
So long as %reload_kedro
has run (which is called by %load_ext kedro
) then these variables are exactly what's available inside local_ns
. i.e. you can do local_ns["catalog"]
inside your kedro-mlflow line magic. Probably you don't even need local_ns
and could get these out of get_ipython()
somewhere.
If we wanted to, this could be achieved by introducing a new, more general IPython entrypoint that runs when %load_ext
or %reload_kedro
is run. Or probably better, as in the original proposal, a hook. Technically this doesn't need to take any arguments since you can always do get_ipython
, but probably for convenience we would expose catalog
etc.
However, introducing this hook feels like a big addition for what would be a small increase in convenience for a small and infrequently used feature (basically just @Galileo-Galilei + kedro-viz). Given that all it does it remove the need for a user to type one extra line of code (%load_ext ...
) I think it's not worth doing. The above solution of each plugin handling its own IPython extension feels clean and not having kedro automatically loading them also feels clean to me. If there's more call for adding this as a convenience in future then we could always add it, but honestly it feels very unlikely that many people would want it.
From the kedro side, I think the only possible objection would be that the flow for doing %run_viz
would now require you to do %load_ext kedro_viz
first. Personally I think this is totally fine. Maybe %load_ext kedro_viz
would even execute %run_viz
automatically (like %load_ext kedro
runs %reload_kedro
).
We should also check that this doesn't break everything. i.e. if someone upgraded to a version of kedro without this entrypoint but still had things registered to it, would everything crash horribly? What happens if you pip install kedro-viz
and the entrypoint isn't defined?
@Galileo-Galilei what do you think of this all? As above, if it's a plan you are happy with then I think we should just do it after checking that no one else cares in a user survey.
I should also clarify that the above local_ns
trick does work in terms of hot-reloading the variables, i.e. it's not stuck on the initial values of these variables populated when %reload_kedro
first executes.
print(local_ns)
inside the definition of %run_viz
%run_viz
you see the catalog
etc. variables populated as you'd expect%reload_kedro
altering the variables by manually setting catalog = 10
%run_viz
again shows the updated catalog = 10
valueThank you for the very detailed answer!
If we can agree a good plan with @Galileo-Galilei then I would even be happy to do this as a non-breaking release, after doing a user survey to double check that no one else is using it.
Completely happy with it, and not considering it as a breaking change:
kedro-mlflow
(they don't even use the one in kedro
), and no of our company internal plugins use it neither. load_ext kedro.extras...
calls load_context()
which setup the configuration and stores the config inside the context as an attribute, as discussed in details here, so whatever your decision is, I can already remove it from kedro-mlflow
This is basically the Possible Alternatives proposal from @Galileo-Galilei above
Glad to go with this option!
So long as %reload_kedro has run (which is called by %load_ext kedro) then these variables are exactly what's available inside local_ns. i.e. you can do local_ns["catalog"] inside your kedro-mlflow line magic. Probably you don't even need local_ns and could get these out of get_ipython() somewhere.
I figured this out muuuuuuuuch longer after I wrote the original post :sweat_smile:, but thank you for the indication!
However, introducing this hook feels like a big addition for what would be a small increase in convenience for a small and infrequently used feature
Completly agree on this. The feature seems too little used to be worth the engineering effort, especially since your "Requirement 1" would already be a very convenient setup.
We should also check that this doesn't break everything. i.e. if someone upgraded to a version of kedro without this entrypoint but still had things registered to it, would everything crash horribly? What happens if you pip install kedro-viz and the entrypoint isn't defined?
Indeed, it would be a pity to have a lot of users complaining because you removed something unused !
@Galileo-Galilei what do you think of this all? As above, if it's a plan you are happy with then I think we should just do it after checking that no one else cares in a user survey.
Let's go! :rocket:
Awesome, thanks for the confirmation @Galileo-Galilei. Looking at the code again actually there's no way this can break anything since Kedro actually doesn't expose particular entrypoint that are available for use; it just tries to load things up using kedro.framework.cli.utils.load_entry_point
. Hence if we don't look for the line_magic
entrypoint at all then it doesn't matter if there's things registered there. They'll just be ignored and not break anything.
The actions here are:
kedro.line_magic
entrypoint%load_ext kedro_viz
step to execute %run_viz
. Note that in the future might not even be an additional step here; we could consider making %load_ext kedro_viz
run a default version of %run_viz
(without any extra pipeline
etc. arguments). And we'd get the bonus of an %unload_ext
command to kill the kedro-viz instance, which we've always been missingload_ipython_extension
command and registers line magic %run_viz
(ticket to follow on kedro-viz repo). In the same way that we've aliased kedro.extras.extensions.ipython
to kedro
we should alias it to kedro_viz
%run_viz
to include extra %load_ext kedro_viz
stepkedro.line_magic
entrypoint from ENTRY_POINT_GROUPS
and any code that refers to it (probably just kedro.extras.extensions.ipython
+ tests)%unload_ext
and consider whether just doing %load_ext kedro_viz
should launch kedro-viz itself without the need for %run_viz
step. Depends a bit on what happens with the two possible %run_viz
commands. Need to think about how specifying arguments like pipeline
would work.Notes from technical design session on 31 August:
needs_local_scope
should be on kedro-viz side)%load_ext kedro_viz
is not as beginner-friendly as the existing workflow where %run_viz
is registered automatically%load_ext kedro_viz
itself loading kedro-viz e.g. how do you provide CLI arguments like pipeline
? _Note there is a flow here that's possible and consistent with %load_ext kedro
though, in which %load_ext kedro_viz
executes %run_viz
but you can also call %run_viz
yourself with arguments_%run_viz
line magic existing, but @SajidAlamQB thinks it's fine to have the extra %load_ext kedro_viz
step as wellipython
+catalog
/context
/etc. arguments) would remove the need for the %load_ext
step. @AntonyMilneQB (and @Galileo-Galilei from above discussion) think this is overkill for now and could always be added laterI am slightly in favor of loading %run_viz
automatically.
Existing workflows
kedro ipython/jupyter
-> %reload_kedro
(If you are in local)%load_ext kedro
-> %reload_kedro
(If you are in JupyterHub/Databricks)I would expect 1 is a more common case, so I think most people would only interact with %reload_kedro
, and therefore they should interact with %run_viz
equivalently. We could also add more line magic in the future (maybe a line magic to save a static output or something related to experiment tracking?), this makes me less prefer overloading the load_ext
too much.
Overall I agree the existing implementation is incorrect (arguably it's a bug, just that no one is using it), but I would go for a lower effort option which may just be fixing the local_scope
, fixing the line magic to accept the correct arguments (if we haven't already fixed it?)
@noklam I definitely see the argument for %load_ext kedro_viz.ipython
not running kedro viz
automatically. Note this idea is not so different from how the kedro IPython extension works though:
%load_ext kedro.ipython
automatically runs reload_kedro
(so your point 1 above isn't quite right: if you're in local usually you don't need to run %reload_kedro
at all)env
) then you run %reload_kedro
Compare to possible kedro-viz IPython workflow:
%load_ext kedro_viz.ipython
automatically runs kedro-viz
env
or pipeline
) then you run %run_viz
I don't feel strongly about %load_ext kedro_viz.ipython
automatically running kedro-viz though - it does also feel a little unusual to me also.
What I think is more important is the existence of the line_magic
entrypoint, and the more I think about this the more I still think we should remove it. Fixing this line_magic
entrypoint needs a bit more than just moving the needs_local_scope
I think:
register_line_magic
is not the right way to even register the function (should be ipython.register_magic_function
)load_ipython_extension
rather than reload_kedro
None of these are hard things to do but I think the effort is comparable to just removing it. Adding the on_ipython_loaded
hook would also be very easy and a much better solution if we do still want %run_viz
to be registered automatically (i.e. without needing to do %load_ext kedro_viz.ipython
).
One advantage of an explicit %load_ext kedro_viz.ipython
that I forgot to mention before is that it would enable %unload_ext kedro_viz.ipython
, which could be used to kill kedro-viz instances. This is something we don't currently have a way to do and would otherwise require a new line magic %kill_viz
or something. The load/unload
mechanism feels much nicer though.
Let me try and paint the picture in a different way: if this line_magic
entrypoint didn't already exist in kedro, would we be trying to add it? I think very unlikely. It would be a non-trivial increase in the API surface for a small convenience for one particular use case that is not common (one less line to type if you want to launch kedro-viz from Jupyter). Given that the effort to remove it is about the same as that required to fix it, and removing it would be de-facto non-breaking, I still don't really see any point in keeping it.
It's such a minor thing though that I'm also happy just to leave it as it is rather than removing or fixing it... 😀
This issue has changed quite a bit since it was originally written. Look at the final post here to see what we now need to do.
Original issue here for posterity.
Description
I am the author of several kedro plugins either open souce (kedro-mlflow) or at work. It turns out that most plugins either perform one (or several) of the following tasks:
Since Kedro is a data science framework, and since Jupyter is a standard for the exploration phase of a data science project, it seems natural to make Kedro compatible with Jupyter. For now, Kedro offers the following functionnality:
This works well for the core library, but it would be a more pleasant user experience to improve plugin compatibilities with notebooks with the following actions:
Context
It is a common pattern to "setup" your plugin, e.g. instantiate a connection to an external service (kedro-mlflow creates a connection to the mlflow server, I do have a sas plugin which creates a connection to the database...), especially in ther "before pipeline_run" hook.
If I create a custom
line_magic
(say%reload_kedro_mlflow
), it is discovered by kedro and it is accessible inside the notebook or the ipython session. However, I'd like to run it automatically when the notebook is opened via the Kedro CLI instead of forcing the user to make it manually.Possible Implementation
(Optional) Suggest an idea for implementing the addition or change.
IpythonHook()
with aon_ipython_start
@hook_spec
to enable plugins to setup their own configuration when activating the session.line_magic
are discovered (https://github.com/quantumblacklabs/kedro/blob/fc9d1c5d35981c51a651af467e79e93df065d354/kedro/extras/extensions/ipython.py#L104-L106), callhook_manager.on_ipython_start(path, metadata, session, context, catalog)
, so that plugins can setup some configuration automatically.Possible Alternatives
It is currently possible to bypass Kedro configuration by creating a
load_ipython_extension
function to execute code when opening the notebook (like here: https://github.com/quantumblacklabs/kedro/blob/fc9d1c5d35981c51a651af467e79e93df065d354/kedro/extras/extensions/ipython.py#L121). However, we cannot access the objects created byreload_kedro
(especially the session or the catalog if we want to modify /use them). We need to duplicate almost all the code to make it works, and it prevents combining several plugins.