kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Decouple starters from framework in tool selection flow #3791

Status: Open. merelcht opened this issue 5 months ago.

merelcht commented 5 months ago

Description

Adding the tools selection flow introduced some coupling between the framework and the starters in the post_gen_project scripts; see, for example, https://github.com/kedro-org/kedro-starters/blob/main/spaceflights-pyspark/hooks/post_gen_project.py

Unit tests for tool selection have become hard to maintain (see e.g. #3594) because the framework needs to pull the starters. If we can decouple the tool selection process more cleanly, the tests might become simpler too.
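One way to picture the testing benefit: if parsing and validating the tool selection lives entirely on the Kedro side as a pure function, it can be unit-tested with plain inputs and no starter checkout. The sketch below is hypothetical. The function name parse_tools_selection and the TOOLS mapping are illustrative assumptions, loosely modeled on the numbered, comma/range-style input that kedro new accepts; they are not Kedro's actual API, and the exact numbering is assumed.

```python
from __future__ import annotations

# ASSUMPTION: illustrative tool numbering, not necessarily Kedro's real list.
TOOLS = {
    "1": "Linting",
    "2": "Testing",
    "3": "Custom Logging",
    "4": "Documentation",
    "5": "Data Structure",
    "6": "PySpark",
    "7": "Kedro Viz",
}


def parse_tools_selection(raw: str) -> list[str]:
    """Hypothetical helper: turn input like '1,3-5', 'all' or 'none'
    into a sorted list of tool names, raising ValueError on bad input."""
    raw = raw.strip().lower()
    if raw == "none":
        return []
    if raw == "all":
        return [TOOLS[k] for k in sorted(TOOLS, key=int)]

    selected: set[str] = set()
    for part in raw.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as "3-5"; int() raises ValueError on non-numbers.
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"Invalid range: {part!r}")
            selected.update(str(i) for i in range(start, end + 1))
        else:
            selected.add(part)

    # Reject anything that is not a known tool number (including empty parts).
    unknown = selected - TOOLS.keys()
    if unknown:
        raise ValueError(f"Unknown tool number(s): {sorted(unknown)}")
    return [TOOLS[k] for k in sorted(selected, key=int)]
```

Because the function depends only on its string argument, a test such as `assert parse_tools_selection("2") == ["Testing"]` runs without cookiecutter or any starter template.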

Context

See https://github.com/kedro-org/kedro/issues/3594#issuecomment-1927510672

Possible Implementation

The checks at https://github.com/kedro-org/kedro-starters/blob/main/spaceflights-pyspark/hooks/post_gen_project.py#L30 shouldn't be needed, because by that point the tool selection should already have been validated on the Kedro side. This might mean we can move the main function into Kedro.
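As a rough illustration of what remains in a starter once validation moves into Kedro, the hook would only apply an already-validated selection. Cookiecutter runs post_gen_project.py inside the generated project directory, so the hook can work on relative paths. Everything else below is an assumption for illustration: the apply_tools name and the TOOL_PATHS mapping of tools to template paths are hypothetical and do not reflect the actual spaceflights-pyspark hook.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# ASSUMPTION: hypothetical mapping of tools to the template paths they own.
TOOL_PATHS = {
    "Testing": ["tests"],
    "Documentation": ["docs"],
    "Data Structure": ["data"],
}


def apply_tools(project_root: Path, selected_tools: list[str]) -> None:
    """Hypothetical hook body: delete template paths for unselected tools.

    No validation happens here; Kedro is assumed to have already checked
    selected_tools before the starter's hook runs.
    """
    for tool, rel_paths in TOOL_PATHS.items():
        if tool in selected_tools:
            continue  # keep files for tools the user chose
        for rel in rel_paths:
            target = project_root / rel
            if target.is_dir():
                shutil.rmtree(target)
            elif target.exists():
                target.unlink()
```

Keeping the hook this thin means the starter no longer duplicates Kedro's checks, and Kedro's own tests of the selection logic do not need to exercise the starter at all.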