kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0
662 stars 110 forks source link

Enable %run_viz magic line to use the arguments that Kedro-Viz supports on the command line #1733

Closed SajidAlamQB closed 6 months ago

SajidAlamQB commented 7 months ago

Description

Related to: https://github.com/kedro-org/kedro-viz/issues/1699

make %run_viz magic line support to passing any arguments of kedro viz run in notebooks.

Follow up PRs should update the documentation on this: https://docs.kedro.org/en/stable/notebooks_and_ipython/kedro_and_notebooks.html#run-viz-line-magic

Development notes

A lot of the logic is borrowed from cli.py I had initially tried to use the click context for line magic but it was becoming quite complicated. I think the easiest approach is to just manually do the logic for these arguments.

Example flow:

image image

QA notes

I have't added a comprehensive test suite for most of this new logic as it already tested in the cli.py and didn't want to repeat them again. Let me know if you all think otherwise.

Checklist

ravi-kumar-pilla commented 6 months ago

Hi @SajidAlamQB ,

Approach looks good to me. I tried testing on jupyter, the default pipeline works well i.e., when --pipeline arg is not passed. But if I pass the --pipeline arg, I see the below behavior.

  1. The link to open kedro viz appears instantly
  2. When clicked, loads a 'site can't be reached' page.
  3. If reloaded, works fine.

It might not be an issue for this ticket rather a bug if this happens via cli as well (need to check)

line_magic_args_issue

SajidAlamQB commented 6 months ago

Approach looks good to me. I tried testing on jupyter, the default pipeline works well i.e., when --pipeline arg is not passed. But if I pass the --pipeline arg, I see the below behavior.

  1. The link to open kedro viz appears instantly
  2. When clicked, loads a 'site can't be reached' page.

Thanks @ravi-kumar-pilla I tried the same thing and did not see the site can't be reached page it went directly to viz. Could it be a browser thing, I did mine on Safari?

ravi-kumar-pilla commented 6 months ago

Approach looks good to me. I tried testing on jupyter, the default pipeline works well i.e., when --pipeline arg is not passed. But if I pass the --pipeline arg, I see the below behavior.

  1. The link to open kedro viz appears instantly
  2. When clicked, loads a 'site can't be reached' page.

Thanks @ravi-kumar-pilla I tried the same thing and did not see the site can't be reached page it went directly to viz. Could it be a browser thing, I did mine on Safari?

Not sure. I will try on Safari, in the mean time can you try on Chrome as well ? Thank you

SajidAlamQB commented 6 months ago

Hey @ravi-kumar-pilla and @ankatiyar, so I've been trying to use the @magic_arguments, emulating how it was done in https://github.com/kedro-org/kedro/blob/670eb0916eea3d5b7758c991127912e56b0e6b65/kedro/ipython/__init__.py#L59, to add shorthands for some of the commands, and while this worked great, I have been running into a few issues.

Problem 1:

When doing %run_viz --pipeline="Reporting stage" we get an unrecognised argument error:

image

This seems to stem from the way spaces are handled between the equal sign and the arguments. It appears that the parser interprets everything after the space as a new, unknwon argument rather than being a whole string. This can be easily avoided though if we skip the equals with something like %run_viz --pipeline "Reporting stage" but this comes to the second problem.

Problem 2:

The following problem comes when we do %run_viz --pipeline "Reporting stage". It correctly identifies the argument value as being "Reporting stage", but it also includes the quotes in the string itself, so its parsed as "'Reporting stage'". Which will result in KeyError.

image (Added print statement to show output after parsing)

We can get around this problem by manually stripping of the quotes before we can use the value, but it is an extra step and might not be the most maintainable way to do this.

Both the problems seem to come from how parse_argstring handles spaces and quotes with @magic_arguments. There are a few options we could do such as manually stripping the quotes, and letting users know that giving arguments with spaces we shouldn't use the =. We could also forgo using @magic_arguments completely and parse the lines ourselves as we are doing now in this PR. I am also open to any other suggestions. Would love to get your thoughts on this.

ankatiyar commented 6 months ago

@SajidAlamQB Does kedro viz run --pipeline="Reporting stage" work? I believe we only allow valid python package names for pipelines? So the actual pipeline names wouldn't contain spaces ideally.

ravi-kumar-pilla commented 6 months ago

Does kedro viz run --pipeline="Reporting stage" work? I believe we only allow valid python package names for pipelines? So the actual pipeline names wouldn't contain spaces ideally.

Hi @SajidAlamQB , I feel we should stick to providing options with = like kedro viz run --pipeline="Reporting stage". If @magic_arguments has an issue parsing this, we can have our own parsing as we have in this PR. If our users ask for shorthand in future, we will try to refactor this.

@ankatiyar we do have spaces for pipeline names on viz. I am not aware that this is not allowed.

SajidAlamQB commented 6 months ago

Does kedro viz run --pipeline="Reporting stage" work?

I tired kedro viz run --pipeline="Reporting stage" directly in the terminal and it works there.

astrojuanlu commented 6 months ago

I think kedro viz run --pipeline should be consistent with how kedro run --pipeline behaves. This behavior was surprising also for me.

Maybe this is a separate issue though.

SajidAlamQB commented 6 months ago

Given what we have discussed I tend to agree with @ravi-kumar-pilla that due to the issues of @magic_arguemnts not parsing properly sticking with the manual parsing so only having longer form of arguments (--pipeline="Reporting stage") is acceptable for now. We can see if there's strong demand for shorthand arguments and consider them for future enhancement.

SajidAlamQB commented 6 months ago

It might not be an issue for this ticket rather a bug if this happens via cli as well (need to check)

@ravi-kumar-pilla regarding the issue with the --pipeline argument leading to a "site can't be reached" page, it seems like a browser related issue that resolves with a refresh. Its not specifically caused by --pipeline I was able to recreate it with just %run_viz --host=127.0.0.1 I think it might be linked to how quickly we're trying to access Kedro-Viz after it launches. It looks like a minor browser caching or connection reuse issue, not directly tied to the scope of this issue a simple refresh or stopping the cell before running solves the issue.

astrojuanlu commented 6 months ago

Opened an issue about the pipeline naming. Other than that, it was not clear to me if this PR is still desired and waiting for review.

ravi-kumar-pilla commented 6 months ago

Opened an issue about the pipeline naming. Other than that, it was not clear to me if this PR is still desired and waiting for review.

Hi @astrojuanlu , Thank you for creating a new ticket. This PR is almost done but we still have the issue of site cannot be reached (https://github.com/kedro-org/kedro-viz/pull/1733#issuecomment-1930866694) sometimes. I will try to fix it today.