kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Consider removing micropackaging #3750

Closed. astrojuanlu closed this issue 3 months ago

astrojuanlu commented 4 months ago

There has been a long-standing milestone to assess the micropackaging workflow: https://github.com/kedro-org/kedro/milestone/21

Micropackaging is somewhat inconsistent with kedro package (https://github.com/kedro-org/kedro/issues/1536), relies on binary distributions (aka wheels), is difficult to test, and, most importantly, is not used by many people.

Are you a user of micropackaging? If so, please drop a comment below with your thoughts 👇🏽

SELECT
  COUNT(*) AS count,
  MAIN_COMMAND
FROM HEAP_KEDRO_APP.HEAP.ANY_COMMAND_RUN
WHERE TIME > '2023-01-01'
GROUP BY MAIN_COMMAND
ORDER BY count DESC

COUNT MAIN_COMMAND
2142002
2007807 run
275495 *****
84696 viz
13073 jupyter
8127 ipython
4968 test
4915 pipeline
4349 docker
2959 templar
2253 mlflow
2026 registry
1979 package
1882 lint
1745 vertexai
1186 info
1151 azureml
1114 benchmark
909 kedro
839 mlrun
672 catalog
623 --version
595 sagemaker
590 airflow
567 -h
391 new
378 build-reqs
319 micropkg
315 fast-api
281 build-docs
237 boot
213 create
188 --help
131 kubeflow
102 azure
101 run-azure
69 --pipeline
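
To put the micropkg figure in context, here is a small sketch that compares a few of the counts listed above. The numbers are copied from the query result; the dictionary name and the `ratio` helper are illustrative only and are not part of Kedro or its telemetry tooling.

```python
# Counts copied from the query result above (commands run since 2023-01-01).
command_counts = {
    "run": 2007807,
    "viz": 84696,
    "package": 1979,
    "micropkg": 319,
}


def ratio(command: str, baseline: str) -> float:
    """Return how often `command` was run relative to `baseline`."""
    return command_counts[command] / command_counts[baseline]


# Compare micropkg with the most common named command and with kedro package.
print(f"micropkg vs run:     {ratio('micropkg', 'run'):.6f}")
print(f"micropkg vs package: {ratio('micropkg', 'package'):.3f}")
```

Even relative to kedro package, which is itself a low-volume command, micropkg accounts for only a fraction of the recorded invocations.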

The proposal is to:

  1. Leave this issue open for another release cycle and request feedback on our usual channels.
  2. If we don't see enough evidence that we should keep this functionality, deprecate it in 0.19. Every time a user runs kedro micropkg, a DeprecationWarning (or equivalent) should be shown. See the discussion about FutureWarning versus DeprecationWarning in various places, for example https://github.com/kedro-org/kedro/issues/2744#issue-1779342165
  3. Move it to a plugin and immediately archive it. This way folks have a way to transition, we signal that we don't intend to develop it anymore in its current form, and the community can fork if someone wants to.

The alternative to (3) would be to skip the plugin entirely and simply deprecate and then remove the feature.
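
For step (2), the warning could look roughly like the sketch below. It uses only the standard library `warnings` module; `micropkg_command` is a hypothetical stand-in for the real CLI entry point, and the message text is an assumption rather than agreed wording. Note that Python's default filters hide DeprecationWarning unless it is triggered from `__main__`, whereas FutureWarning is shown by default, which is why the choice between the two matters.

```python
import warnings


def micropkg_command(*args: str) -> str:
    """Hypothetical stand-in for the `kedro micropkg` entry point."""
    warnings.warn(
        "`kedro micropkg` is deprecated and will be removed in a future release.",
        DeprecationWarning,  # swap for FutureWarning to show it to end users by default
        stacklevel=2,        # attribute the warning to the caller, not this function
    )
    return "micropkg " + " ".join(args)


# Record warnings explicitly so the example does not depend on default filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = micropkg_command("package", "my_pipeline")

print(result)
print(caught[0].category.__name__, "-", caught[0].message)
```

The command still runs as before; the only change a user would see is the warning emitted on each invocation.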

Thoughts?

datajoely commented 4 months ago

I really like option 3

merelcht commented 4 months ago

I would go for 1 + 2. Option 3 seems like a lot of unnecessary effort: building and releasing a plugin for something we think is not used.

astrojuanlu commented 3 months ago

We got no indication from users that they want us to keep this. Therefore, I don't think we have enough evidence that maintaining this feature is worth the effort.

I will open follow-up issues to deprecate, and eventually remove, this functionality.