Description
Solves https://github.com/kedro-org/kedro/issues/4250
Relates to https://github.com/kedro-org/kedro/issues/3935 - step 5 in the proposed solution
Development notes
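A minimal sketch of the idea behind this change: dataset patterns are resolved once, in a shared runner base class, before any worker thread starts. All names here (ToyCatalog, ToyRunner, ToyThreadRunner, the suffix-based patterns) are hypothetical stand-ins for illustration and are not Kedro's actual classes or API.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyCatalog:
    """Hypothetical catalog: datasets are created lazily from suffix patterns."""

    def __init__(self, suffix_patterns):
        self._patterns = suffix_patterns  # e.g. {"_csv": "CSVDataset"}
        self._datasets = {}

    def register(self, name, dataset):
        if name in self._datasets:
            raise ValueError(f"Dataset '{name}' has already been registered")
        self._datasets[name] = dataset

    def get(self, name):
        # Check-then-register is not atomic: if several threads hit an
        # unresolved name at once, more than one may try to register it.
        if name not in self._datasets:
            for suffix, kind in self._patterns.items():
                if name.endswith(suffix):
                    self.register(name, f"{kind}({name})")
                    break
            else:
                raise KeyError(name)
        return self._datasets[name]


class ToyRunner:
    """Hypothetical base runner: warm-up lives here, so every runner gets it."""

    def run(self, dataset_names, catalog):
        # Warm-up: resolve all patterns on one thread before the actual run.
        for name in dataset_names:
            catalog.get(name)
        return self._run(dataset_names, catalog)

    def _run(self, dataset_names, catalog):
        raise NotImplementedError


class ToyThreadRunner(ToyRunner):
    def _run(self, dataset_names, catalog):
        # After warm-up, threads only read already-registered datasets.
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(catalog.get, dataset_names))


catalog = ToyCatalog({"_csv": "CSVDataset"})
names = ["cars_csv"] * 8
results = ToyThreadRunner().run(names, catalog)
```

Because the warm-up sits in the base class `run()`, a subclass only implements `_run()` and cannot skip the resolution step.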
Moved the warm-up from the session to the AbstractRunner, before _run() is called, so that the logic is shared by all runners. This simplifies the code, and all dataset patterns are now resolved before the pipeline run.

We added a unit test with dataset patterns to check that ThreadRunner does not fail with a "Dataset 'name' has already been registered" error. The test checks that the dataset was registered during warm-up and that execution reached the loading step, although no actual loading is performed.

We tried to write a full test with actual data loading (https://github.com/kedro-org/kedro/commit/208a24b7d093c85a0e8e00b55747e5e9231c5f41), but the first thread tried to load the data before it was created (https://github.com/kedro-org/kedro/actions/runs/11562442829/job/32183629244). This happened mostly in CI, only for the latest Python versions, and was hard to reproduce locally. We tried checking that the file exists before calling run(), but it didn't help, so we changed the test to the current one, which excludes data creation.

Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
RELEASE.md file