kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.88k stars 894 forks source link

Lazy load commands from plugins #3901

Closed ankatiyar closed 3 months ago

ankatiyar commented 4 months ago

Description

Partly resolve #1476

As discussed in #1476, the initialisation of KedroCLI takes up a significant chunk of time. This is especially evident if you have a lot of kedro plugins installed in the environment as KedroCLI which is a CommandCollection, loads up all the commands from the plugins before the command is processed. This PR is to add a lazy way to load the commands from the plugins.

Development notes

Update to custom CommandCollection:

Loading of plugins:

I tried to implement a sort of "smart" loading where if the command arg, i.e. airflow or mlflow or viz partially matches the entry point names in the lazy_group dict keys, load the plugin, check if the command now exists and exit. Note: The entry point names are decided by the plugins, eg. kedro-airflow's ep name is airflow which fully matches the command name too but for kedro-viz and kedro-mlflow the entry point names don't match the commands. eg. Command: viz Entry point name: kedro-viz

Otherwise, load all plugins one by one, exit if the command exists after loading plugins.

TODO

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

astrojuanlu commented 4 months ago

To note, as @ankatiyar noted internally, "bad" lookups will take a lot of time: for example, kedro infoo (notice typo) took 17 seconds in my computer and tried to import kedro-viz, kedro-mlflow, and more.

Going forward, is there a smarter way we can declare new CLI commands, so that we can look them up in entry points without actually importing anything? (Feel free to open a new issue about this)

ankatiyar commented 3 months ago

The code coverage is not complete but opening this up for review anyway to gather feedback on the implementation.

ankatiyar commented 3 months ago

Fair point @merelcht! 🤔 I was also trying to implement a way where --help does trigger the loading of the plugins. About overwriting of core Kedro commands eg. micropkg, that might also be possible with lazy loading. Let me explore these functionalities and update the PR (if it works!)!

ankatiyar commented 3 months ago

Unfortunately, I don't think the overriding behaviour of CLI is possible with lazy loading of plugins - i.e. if a plugin has overriding CLI commands for Kedro core commands such as kedro catalog list etc since the premise of the proposal is to only load plugins if a command does not exist in KedroCLI. I think it's possible to trigger the loading of plugins and display plugin commands with kedro --help but I would like to get the team's opinions on if we should be going ahead with this at all cc @astrojuanlu @noklam @merelcht

astrojuanlu commented 3 months ago

The python CLI has --help and --help-all. Maybe for the next breaking release we could make --help display only the "core" commands, and --help-all display everything.

For retaining the current behavior while attaining much faster load times for all the rest of commands, I'd be 👍🏼 on force-triggering loading of all plugins in kedro --help

ankatiyar commented 3 months ago

Moving this back to the drafts, will close it if no other comments come in. I'll focus on getting https://github.com/kedro-org/kedro-viz/pull/1920 and https://github.com/kedro-org/kedro/pull/3883 ready for review first!

ankatiyar commented 3 months ago

Closing this in favour of the other solution!