kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.88k stars 894 forks

Add fallback to starters pull on kedro new #3900

Closed lrcouto closed 2 months ago

lrcouto commented 4 months ago

Description

During project creation and test setup, Kedro currently queries the kedro-starters repo and always uses the latest released version to get project templates. This can cause a problem during releases that depend on changes in the kedro-starters repo, as those changes won't be picked up by the current flow until after they are released.

This PR implements a fallback for when this situation happens. When the version of Kedro installed in your environment does not match the latest kedro-starters release, it will pull the main branch instead.
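As a rough illustration of the fallback described above, a minimal sketch could look like the following. The function name `select_starters_checkout` and its parameters are hypothetical and not taken from the Kedro codebase; the only assumption carried over from the description is that a mismatch means "use the main branch".

```python
def select_starters_checkout(kedro_version: str, latest_starters_release: str) -> str:
    """Return the git ref to check out in kedro-starters.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if kedro_version == latest_starters_release:
        # A matching starters release exists, so pin to it.
        return kedro_version
    # No matching release: fall back to the main branch.
    return "main"
```

With this sketch, `select_starters_checkout("0.19.6", "0.19.6")` keeps the pinned release, while any mismatch falls back to `"main"`.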

Development notes

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

lrcouto commented 3 months ago

A couple notes:

lrcouto commented 3 months ago

Thank you for your reviews, @ElenaKhaustova and @DimedS! From what I'm getting, the logic of the version/branch/tag selection for starters itself is a little confusing, so I'm gonna look into refactoring it a little bit.

As for the question regarding the environment variable, it came from this discussion. We were having problems with repeated requests to GitHub being rejected, which happened every time tests were run on our CI. My idea is that setting it in the environment once and then checking it would avoid this large number of requests, and it would not need to be re-declared every time the file is loaded.
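The caching idea above can be sketched roughly as follows. The variable name `KEDRO_STARTERS_VERSION`, the function name, and the injected `fetch_latest_tag` callable are assumptions made for illustration; the actual request to GitHub is deliberately left out so the sketch stays self-contained.

```python
import os
from typing import Callable

# Assumed name for the environment variable; not confirmed by this thread.
STARTERS_VERSION_ENV_VAR = "KEDRO_STARTERS_VERSION"


def get_latest_starters_version(fetch_latest_tag: Callable[[], str]) -> str:
    """Return the latest starters tag, making at most one request per process.

    `fetch_latest_tag` is a hypothetical stand-in for the call that asks
    GitHub for the latest kedro-starters release tag.
    """
    cached = os.environ.get(STARTERS_VERSION_ENV_VAR)
    if cached:
        # Already resolved earlier: reuse it and skip the request.
        return cached
    tag = fetch_latest_tag()
    # Store the result so later lookups in this process (and any child
    # processes it spawns) do not need to hit GitHub again.
    os.environ[STARTERS_VERSION_ENV_VAR] = tag
    return tag
```

Note that a value written to `os.environ` is only visible to the current process and its children, so a CI job would still make one request per separate process unless the variable is exported beforehand.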

lrcouto commented 3 months ago

If current_kedro_version is not in the kedro_starters_tags_list, we should not use checkout in the starters repo (since that tag won't exist). Instead, we should use the default starters repo. Otherwise, if current_kedro_version is in the list, we should set kedro_starters_tag to current_kedro_version.

Regarding this point, would we have to get all of the existing Kedro versions from Git? Or does that exist already somewhere? I'm asking because it'd be another request that we'd have to make.

DimedS commented 3 months ago

I believe we can use the latest_kedro_starters_tag that you already received and compare it with the current_kedro_version. If the current_kedro_version is greater than the latest_kedro_starters_tag (indicating that Kedro has been released with a new version, but the starters have not yet been updated), then we should use the default starters repository (i.e., the latest starters from the main branch). Otherwise, we should maintain the current logic and use the same version of starters that matches the Kedro version.
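A minimal sketch of that comparison, assuming plain dotted numeric versions such as `0.19.6`: the helper names and the `default_branch` parameter are hypothetical, and the simple tuple parsing here is a simplification (a real implementation might prefer `packaging.version.Version` to handle pre-releases and other suffixes).

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.19.6' into (0, 19, 6) so versions compare numerically.

    Simplification: only handles dotted integers, not pre-release suffixes.
    """
    return tuple(int(part) for part in version.split("."))


def select_starters_ref(
    current_kedro_version: str,
    latest_kedro_starters_tag: str,
    default_branch: str = "main",
) -> str:
    """Pick the kedro-starters ref to use (hypothetical helper)."""
    if parse_version(current_kedro_version) > parse_version(latest_kedro_starters_tag):
        # Kedro is ahead of the newest starters tag, so no matching tag
        # exists yet: use the latest starters from the default branch.
        return default_branch
    # Otherwise keep the existing behaviour: starters pinned to the Kedro version.
    return current_kedro_version
```

Comparing parsed tuples rather than raw strings matters here: as strings, `"0.19.10"` sorts before `"0.19.9"`, which would pick the wrong branch.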

lrcouto commented 2 months ago

Pushed some changes based on the conversation earlier today with @ElenaKhaustova and @DimedS.