kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Implement version fallback for starters & framework #3879

Open merelcht opened 1 month ago

merelcht commented 1 month ago

Description

In our current test setup, starters and framework are linked, and each looks to the "latest" version of the other. At release time this means the tests look for a version that has not been released yet, and they fail because of it.

Implement a way to fall back to the latest released version OR specify a specific version.
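
One way the fallback could look, as a minimal sketch only. The helper name `pick_starters_ref` and the idea of passing in a list of released tags are assumptions made for illustration, not existing Kedro code, and the version strings in the usage note are example values.

```python
from typing import List, Tuple


def _version_key(tag: str) -> Tuple[int, ...]:
    """Order plain 'X.Y.Z' tags numerically (pre-release suffixes are not handled)."""
    return tuple(int(part) for part in tag.split("."))


def pick_starters_ref(current_version: str, released_tags: List[str]) -> str:
    """Hypothetical helper: use the matching tag if it is released,
    otherwise fall back to the latest released tag."""
    if not released_tags:
        raise ValueError("no released tags to fall back to")
    if current_version in released_tags:
        return current_version
    return max(released_tags, key=_version_key)
```

For example, `pick_starters_ref("0.19.6", ["0.19.4", "0.19.5"])` would fall back to `"0.19.5"` because `0.19.6` has no tag yet.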

Context

"There's always cookiecutter" - @lrcouto

Currently kedro-starters points to the main kedro branch: https://github.com/kedro-org/kedro-starters/blob/db79aec64c4a0f062321bd8c74ee782750bba527/test_requirements.txt#L5

At the same time, on the kedro side we point to the main kedro-starters branch and synchronise it with the kedro version in use: https://github.com/kedro-org/kedro/blob/27f5405cefd6701ffac4c6243030486fb7d3c942/kedro/framework/cli/starters.py#L778 and https://github.com/kedro-org/kedro/blob/27f5405cefd6701ffac4c6243030486fb7d3c942/tests/framework/cli/test_starters.py#L56

As a result, when we need to release both kedro-starters and kedro, we cannot specify a custom kedro-starters version on the kedro side. So we have to make the release with failing tests and can only confirm afterwards that the tests pass in CI.

Possible Implementation

We already have a checkout argument that allows pointing to a specific tag or branch; it is an optional argument of the kedro new command. We can add a constant that specifies the default checkout value, as we already do for the starters repo: https://github.com/kedro-org/kedro/blob/27f5405cefd6701ffac4c6243030486fb7d3c942/kedro/framework/cli/starters.py#L98

If the checkout argument is provided, it is used instead of the constant. In all other cases, including testing, the constant is used. We only synchronise the kedro-starters branch with the kedro version when the checkout argument is not provided and the constant is set to None.
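
The precedence described here could be sketched roughly as follows. This is an illustration under assumptions, not the actual code in starters.py: the constant name `STARTERS_CHECKOUT` and the function `resolve_checkout` are hypothetical.

```python
from typing import Optional

# Hypothetical constant. None means "follow the installed kedro version".
STARTERS_CHECKOUT: Optional[str] = None


def resolve_checkout(
    checkout_arg: Optional[str],
    kedro_version: str,
    default_checkout: Optional[str] = STARTERS_CHECKOUT,
) -> str:
    """Pick the kedro-starters tag or branch to check out.

    Precedence: explicit checkout argument, then the constant,
    then the kedro version (the current synchronising behaviour).
    """
    if checkout_arg is not None:
        return checkout_arg
    if default_checkout is not None:
        return default_checkout
    return kedro_version
```

With the constant left as None and no argument given, the function returns the kedro version, so the existing behaviour is unchanged. Setting the constant on a release branch would redirect everything that does not pass an explicit checkout value, including the tests.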

In that case, to release both kedro-starters and kedro, we would need to:

  1. Point kedro-starters to the target kedro branch A
  2. In kedro branch A, set the checkout constant to point to the corresponding kedro-starters branch
  3. Make sure all tests pass for both repos
  4. Point both repos back to their main branches
  5. Release both

Possible Alternatives