kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.88k stars 897 forks source link

`kedro new --starter` default to use `main`/`master` #1696

Open noklam opened 2 years ago

noklam commented 2 years ago

I've actually always thought (and I think @Galileo-Galilei agrees) that we should change this behaviour since in reality no one maintains a starters repo which they keep up to date with every single version of the kedro template.

  • by default don't try and checkout the kedro version unless it's an official starter (cookiecutter will by default look for the right main/master branch if you don't specify checkout)
  • and/or enable people to specify checkout in their KedroStarterSpec

But that's a discussion for another time.

_Originally posted by @AntonyMilneQB in https://github.com/kedro-org/kedro/pull/1695#discussion_r916579114_

Background

Currently kedro new assumes the starter uses the same version as kedro, i.e. With kedro==0.18.0 it will pull the 0.18.0 kedro-starters template. This makes sense when we only have kedro-starter. We introduced the starters template plugin in 0.18.2 and this assumption no longer holds.

Description

Example Kedro version = 0.18.2

antonymilne commented 2 years ago

Looking back through the discussion in https://github.com/kedro-org/kedro/pull/1265, I can confirm that @Galileo-Galilei independently had the same feeling as me on this:

It makes sense for kedro itself to have a default to the starter matching the kedro version, but i don't think it really does for user defined starters.

He also thought checkout should not become part of KedroStarterSpec, which is fine by me. So the behaviour would be exactly as you describe above 👍

Default checkout version should be None, cookiecutters should take cares of pulling the latest version. (Do check if this is the case)

I can confirm from looking through cookiecutter that this is correct. If checkout is None then it clones the repo and doesn't checkout any particular branch, which means it will automatically be on HEAD. So we don't need to do anything special here - it will work by itself.

The only catch with changing this is that it's a breaking change, so we will need to put in some warning that the behaviour will change when the user calls a starter that's not a built-in one.

astrojuanlu commented 10 months ago

It's unclear to me what problem does this solve, would you folks please clarify? Also, wondering if any of the new project tools affect this.

antonymilne commented 10 months ago

From memory, the main problem goes like this:

The current solution (other than starter maintainer making a new tag every single kedro release) would be for the user to explicitly specify --checkout main or similar. The problem is that this should really be the default behaviour rather than needing to be explicitly specified.

The only possible problem that changing this causes is that it would mean the default --checkout value would be different in official kedro starters (where it would use kedro version) compared to all others (where it would take None and so default to e.g. main branch as above). This is ok by me, although maybe it's worth considering if the same change should actually be made for official kedro starters also.