kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Consider removal of project cli.py #2384

Open antonymilne opened 1 year ago

antonymilne commented 1 year ago

Just some rough notes for now; details to be added when I get a chance...

My ultimate hope here is that we can remove the project cli.py and implement something like https://github.com/kedro-org/kedro/issues/1041, while making #1423 much easier.

datajoely commented 1 year ago

If you want to extend the run command with extra Click options, I think this is the only way. If we go down the kedro init route, what I would like is for us to generate a cli.py for people who need one.
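For readers unfamiliar with what extending run with extra Click options looks like, here is a minimal, self-contained sketch using plain Click (assuming Click is installed). It is not Kedro's actual cli.py. The cli group, the simplified run command and the --dry-run flag are all hypothetical, and nothing here calls Kedro. It only shows the shape of the change: one more @click.option decorator on the run command.

```python
import click
from click.testing import CliRunner


@click.group()
def cli():
    """Hypothetical project CLI group, standing in for a project's cli.py."""


@cli.command()
@click.option("--pipeline", default=None, help="Name of the pipeline to run.")
# The extra option a user might want to add; "--dry-run" is a hypothetical example.
@click.option("--dry-run", is_flag=True, default=False, help="Only report what would run.")
def run(pipeline, dry_run):
    """Simplified stand-in for a run command; it does not call Kedro."""
    name = pipeline or "default"
    if dry_run:
        click.echo(f"Would run pipeline '{name}'")
    else:
        click.echo(f"Running pipeline '{name}'")
```

Invoking it as CliRunner().invoke(cli, ["run", "--pipeline", "etl", "--dry-run"]) prints "Would run pipeline 'etl'". In a real project the same extra decorator has to live on the run command in cli.py or in a plugin, which is exactly the copy-and-maintain burden discussed in this thread.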

I think from a user's point of view, copying and pasting something that works is much less intimidating than creating a plugin. By that logic, I'd also be open to a kedro init plugin command as another way for people to do this.

antonymilne commented 1 year ago

Currently it's the only way without writing a plugin, but how common is it actually to add extra Click arguments to run? I consider it fairly niche, advanced functionality. And of those cases, how often do you want to make the change to just one project using cli.py rather than to several projects using a plugin? So personally I think it's probably fine to require a plugin for this. But this issue exists to provoke a range of opinions, discussion and user research on the topic 👍 I think @noklam brought this up originally, so I'm very interested in his thoughts here!

Totally agree that whatever we do, we should make it easier to actually add those extra Click arguments, e.g. through some sort of kedro init plugin. The current model of copying and pasting from the docs is not great, both for making a plugin and for overriding cli.py. There are many gotchas, it's hard to keep up to date, and at the moment it's very easy to get tripped up by either route.

datajoely commented 1 year ago

niche/advanced functionality

Until we natively provide a way for people to inject or override catalog entries, we're going to keep seeing people do the same thing.
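To make the catalog-override use case concrete, here is a minimal stdlib-only sketch of the kind of logic people end up writing in cli.py. It assumes the catalog configuration has already been loaded into a plain dict and uses a hypothetical dataset.field=value syntax. parse_overrides and apply_overrides are illustrative names, not Kedro functions or flags.

```python
from copy import deepcopy


def parse_overrides(pairs):
    """Parse strings like "companies.filepath=data/x.csv" into {(dataset, field): value}.

    Only top-level fields of a dataset entry are supported in this sketch.
    """
    overrides = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")  # split on the first "=" only
        dataset, dot, field = key.partition(".")  # split dataset name from field name
        if not sep or not dot or not dataset or not field:
            raise ValueError(f"Expected 'dataset.field=value', got {pair!r}")
        overrides[(dataset, field)] = value
    return overrides


def apply_overrides(catalog_config, overrides):
    """Return a patched copy of catalog_config; the input dict is left unchanged."""
    patched = deepcopy(catalog_config)
    for (dataset, field), value in overrides.items():
        if dataset not in patched:
            raise KeyError(f"Unknown dataset {dataset!r}")
        patched[dataset][field] = value
    return patched
```

For example, apply_overrides(catalog, parse_overrides(["companies.filepath=data/01_raw/sample.csv"])) returns a copy whose companies entry points at the sample file. Wiring this into run would still require one of the two routes debated here: an extra option in cli.py or a plugin.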

noklam commented 1 year ago

Adding a data point on how cli.py is used (feel free to edit)

I think this is a pretty advanced use case, but I think you will find it interesting @AntonyMilneQB

noklam commented 1 year ago

AFAIK this topic came out of #1423, as there is a subtle difference between running Kedro as a package and running Kedro via KedroSession; the CLI entry point is one of the differences.

I am closer to @datajoely on this; I don't think we currently offer a better way to achieve this.

How would removing the cli.py make #1423 easier?

datajoely commented 1 year ago

Another vote for kedro init