kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Deprecate micropackaging #3854

Closed astrojuanlu closed 1 month ago

astrojuanlu commented 2 months ago

After a cool-off period in #3750, we saw no indication that users want to keep our micropackaging functionality, so we decided to eventually remove it.

The first step is to deprecate it for the present 0.19.x cycle.

Every time a user runs kedro micropkg, a DeprecationWarning (or equivalent) should be shown. See the discussion about FutureWarning versus DeprecationWarning in various places, for example https://github.com/kedro-org/kedro/issues/2744#issue-1779342165.

Since kedro.framework.cli.micropkg is public API too, this warning should also be shown when importing anything from that subpackage.

The warning should communicate to users that this functionality is no longer maintained and will be deleted in Kedro 0.20, and point them to https://github.com/kedro-org/kedro/issues/3750 in case they want to voice a dissenting opinion.
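As a rough illustration of the requirements above, here is a minimal sketch. The class MicropkgDeprecationWarning, the helper warn_micropkg_deprecated, and the message wording are hypothetical names chosen for this example and are not taken from the Kedro codebase; a real change would call something like this from the kedro micropkg command group.

```python
import warnings

ISSUE_URL = "https://github.com/kedro-org/kedro/issues/3750"


class MicropkgDeprecationWarning(DeprecationWarning):
    """Hypothetical warning category, for illustration only."""


def warn_micropkg_deprecated() -> None:
    """Emit the deprecation notice (hypothetical helper)."""
    warnings.warn(
        "Micropackaging is no longer maintained and will be removed in "
        f"Kedro 0.20. To share feedback, see {ISSUE_URL}",
        MicropkgDeprecationWarning,
        stacklevel=2,
    )


# Python's default filters hide DeprecationWarning unless it is triggered
# from __main__, so the demo forces every warning to be recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_micropkg_deprecated()

print(caught[0].message)
```

Because DeprecationWarning is ignored by default outside __main__, the choice of warning category decides whether end users of the CLI would actually see the message.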

yury-fedotov commented 2 months ago

+, I think it's a nice initiative.

arek544 commented 1 month ago

What's an alternative to micropackaging? Should I switch to packaging instead?

astrojuanlu commented 1 month ago

@arek544 Could you detail your use case?

arek544 commented 1 month ago

I have multiple modular pipelines that I'd like to share between projects, so I thought micropackaging would be the best solution so far.

merelcht commented 1 month ago

Hi @arek544, thanks for commenting. Micropackaging does suit your use case well and that's also the use case we built the feature for. However, adoption has been very low and most users have found alternative solutions. Are you on our Slack (http://slack.kedro.org/) by any chance? It would be good to explore a different solution for you for when the time comes that micropackaging is removed from Kedro.

merelcht commented 1 month ago

@astrojuanlu you wrote in the description:

> Since kedro.framework.cli.micropkg is public API too, this warning should also be shown when importing anything from that subpackage.

The only public method aside from the CLI commands in kedro.framework.cli.micropkg is safe_extract: https://github.com/kedro-org/kedro/blob/main/kedro/framework/cli/micropkg.py#L393. Does that really warrant an import deprecation warning? I'd be very surprised if anyone is using it, since it's a pretty generic utility method.

astrojuanlu commented 1 month ago

Good to see that most of that module is private API; I wrote my comment without looking. I also doubt that anybody is using that specific function, but strictly speaking we should still warn, right?
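For reference, one way to warn only when the single public helper is accessed is a module-level `__getattr__` (PEP 562). This is a sketch under assumptions, not something decided in this thread: the throwaway module demo_micropkg and the stand-in `_safe_extract_impl` are hypothetical, and in a real module the hook would be a top-level `def __getattr__(name)` in the file itself.

```python
import types
import warnings


def _safe_extract_impl(*args, **kwargs):
    """Stand-in for the real helper; its body is omitted in this sketch."""
    return "extracted"


def _module_getattr(name: str):
    # Python calls this only when normal attribute lookup on the module fails.
    if name == "safe_extract":
        warnings.warn(
            "safe_extract is deprecated and will be removed in Kedro 0.20.",
            DeprecationWarning,
            stacklevel=2,
        )
        return _safe_extract_impl
    raise AttributeError(f"module 'demo_micropkg' has no attribute {name!r}")


# Build a throwaway module instead of touching the real kedro package.
demo_micropkg = types.ModuleType("demo_micropkg")
demo_micropkg.__getattr__ = _module_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    func = demo_micropkg.safe_extract  # attribute access triggers the warning
```

The trade-off is that the public name must no longer be defined directly in the module namespace, otherwise the hook never fires.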

daniel-ressi commented 1 month ago

That's a shame. We started adopting it, and with the changes in version 18 the usage got smoother. Our use case is pulling preconfigured pipelines into new projects as a starting point for "similar" projects.

merelcht commented 1 month ago

> That's a shame. We started adopting it, and with the changes in version 18 the usage got smoother. Our use case is pulling preconfigured pipelines into new projects as a starting point for "similar" projects.

Hi @daniel-ressi, thanks for your comment. For the use case you're describing we actually have Kedro Starters: https://docs.kedro.org/en/stable/starters/starters.html. Starters let you create project templates that standardise projects in a way that fits your organisation's setup.

daniel-ressi commented 1 month ago

> > That's a shame. We started adopting it, and with the changes in version 18 the usage got smoother. Our use case is pulling preconfigured pipelines into new projects as a starting point for "similar" projects.
>
> Hi @daniel-ressi, thanks for your comment. For the use case you're describing we actually have Kedro Starters: https://docs.kedro.org/en/stable/starters/starters.html. Starters let you create project templates that standardise projects in a way that fits your organisation's setup.

Thank you for your response. Micropackaging had some benefits for us, especially pulling individual pipelines or datasets into a project that we already create with our own template (which also includes backend, frontend, and infrastructure components), but I do agree that Kedro Starters are definitely an alternative.