kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.91k stars 900 forks

Plan to improve ci/cd + automated release setup of `kedro-plugins` #2333

Closed by merelcht 9 months ago

merelcht commented 2 years ago

Description

Following up on https://github.com/kedro-org/kedro-plugins/pull/4

Currently kedro-plugins has one top-level Makefile, while each plugin has its own .pre-commit-config.yaml and pyproject.toml. Look at whether it would be better for these configuration files to be unified or for each plugin to have its own files.

@AntonyMilneQB raised a good question about whether the pre-commit hooks actually still work without the file being at the root of the repo.

For this issue come up with a plan to improve the setup:

deepyaman commented 1 year ago

@AntonyMilneQB raised a good question about whether the pre-commit hooks actually still work without the file being at the root of the repo.

Having .pre-commit-config.yaml at the root of the repo is also recognizable by other tools, such as pre-commit.ci (which we could consider using).

There's no great answer for this, but I've seen https://github.com/pre-commit/pre-commit/issues/466#issuecomment-274282684. Keep in mind that the pre-commit author also says that it's not really designed for monorepos (although it can be made to work).

@noklam raised a good point:

The main drawback is that if you are working on one of the smaller plugins, like kedro-airflow, you still have to run the linting for the full datasets repo, which takes more time.

Maybe it makes sense to use something like tox to manage how tests are run in a setup like this?

jmholzer commented 1 year ago

To install the Kedro dependency correctly, we need to add `pip install -r requirements.txt` under `install-test-requirements` in the kedro-plugins Makefile for the case where `$plugin` is equal to `kedro-datasets`.

deepyaman commented 1 year ago

I would be open to putting together a PoC with tox, if you (or others) are interested, @jmholzer.

deepyaman commented 1 year ago

Preface

The below breakdown of challenges and solutions aims to:

  1. provide tactical solutions to existing challenges (primary motivation)
  2. use kedro-plugins as a lightweight testbed for tooling for better practices/other improvements, rather than on the more complex main kedro repo (secondary motivation)

Challenges

Proposal

noklam commented 1 year ago

I agree with the challenges, but I would list them in reverse order.

I would add one more challenge:

merelcht commented 1 year ago

As discussed in technical design on 15/2:

  1. This issue becomes a milestone: https://github.com/kedro-org/kedro/milestone/30
  2. All P0 points are converted into issues
  3. When all P0 issues have been completed, we'll revisit what more needs to be done, most likely in technical design sessions.

deepyaman commented 1 year ago

One additional note that I just remembered: we need to address the behavior when multiple plugins are modified (or just make sure this can't happen in the future approach):

image
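
A minimal sketch of one way to make the multi-plugin case explicit, assuming each plugin lives in its own top-level directory of kedro-plugins and that the list of changed paths comes from something like `git diff --name-only`. The function names and the `PLUGINS` tuple are hypothetical and not taken from the actual CI setup:

```python
from pathlib import PurePosixPath

# Assumption: every plugin is a top-level directory in the monorepo.
# Only kedro-airflow and kedro-datasets are named in this thread; extend as needed.
PLUGINS = ("kedro-airflow", "kedro-datasets")


def modified_plugins(changed_paths, plugins=PLUGINS):
    """Return the sorted names of plugins that own at least one changed path."""
    touched = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        # The first path component is the top-level directory (or a root-level file).
        if parts and parts[0] in plugins:
            touched.add(parts[0])
    return sorted(touched)


def single_modified_plugin(changed_paths, plugins=PLUGINS):
    """Return the one modified plugin, None if there is none, or raise if ambiguous."""
    touched = modified_plugins(changed_paths, plugins)
    if len(touched) > 1:
        raise ValueError(f"Multiple plugins modified: {', '.join(touched)}")
    return touched[0] if touched else None
```

Raising on ambiguity corresponds to the "make sure this doesn't happen" option. Looping over the result of `modified_plugins` and handling each plugin in turn would be the alternative.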

astrojuanlu commented 1 year ago

Potentially relevant for the automated release: https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/

astrojuanlu commented 12 months ago

I think most of https://github.com/kedro-org/kedro/issues/2333#issuecomment-1434492921 has already been done, except for https://github.com/kedro-org/kedro-plugins/issues/512

merelcht commented 9 months ago

Closing this because the CI/CD of kedro-plugins has been hugely improved already. I created a new milestone for testing the plugins individually: https://github.com/kedro-org/kedro/milestone/60