kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Fix catalog shallow copy changing class type #3950

Closed merelcht closed 1 week ago

merelcht commented 2 weeks ago

Description

Fixes #3857. The catalog's shallow_copy() method should return an instance of the catalog's own class type rather than always casting the copy to DataCatalog.

Development notes

Initially I thought I needed to use settings.DATA_CATALOG_CLASS inside the shallow_copy() method, but that violated the import contracts enforced in https://github.com/kedro-org/kedro/blob/main/pyproject.toml#L175. Specifically:

----------------
Broken contracts
----------------

CLI > Context > Library, Runner > Extras > IO & Pipeline
--------------------------------------------------------

kedro.io is not allowed to import kedro.framework.project:

- kedro.io.data_catalog -> kedro.framework.project (l.17)

Pipeline and IO are independent
-------------------------------

kedro.io is not allowed to import kedro.pipeline:

- kedro.io.data_catalog -> kedro.framework.project (l.17)
  kedro.framework.project -> kedro.pipeline (l.23)
                             & kedro.pipeline.pipeline (l.23)

However, after thinking about it a bit more, I decided it's fine not to use settings here, because this is not about instantiating the DataCatalog but about copying it. The catalog is already instantiated in the context: https://github.com/kedro-org/kedro/blob/main/kedro/framework/context/context.py#L231
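
To illustrate the reasoning, here is a minimal sketch of the pattern, not the actual Kedro code. `Catalog`, `CustomCatalog`, and `shallow_copy_hardcoded` are hypothetical stand-ins invented for this example. Because the copy is built from the instance's own class, a custom catalog class is preserved without `kedro.io` importing anything from `kedro.framework.project`:

```python
from __future__ import annotations

from typing import Any


class Catalog:
    """Hypothetical stand-in for a catalog class; not Kedro's DataCatalog."""

    def __init__(self, datasets: dict[str, Any] | None = None) -> None:
        # Copy the mapping so each catalog owns its own dict,
        # while the dataset objects themselves are shared.
        self._datasets = dict(datasets or {})

    def shallow_copy_hardcoded(self) -> Catalog:
        # Problem pattern: the base class is hardcoded, so a subclass
        # instance is silently downgraded to Catalog.
        return Catalog(datasets=self._datasets)

    def shallow_copy(self) -> Catalog:
        # Fix pattern: reuse the class of the existing instance. No
        # settings lookup is needed because the class was already
        # chosen when the catalog was first instantiated.
        return self.__class__(datasets=self._datasets)


class CustomCatalog(Catalog):
    """Hypothetical user-defined catalog subclass."""


custom = CustomCatalog({"cars": [1, 2, 3]})
hardcoded_type = type(custom.shallow_copy_hardcoded()).__name__
preserved_type = type(custom.shallow_copy()).__name__
print(hardcoded_type, preserved_type)
```

This sketch assumes the subclass constructor accepts the same arguments as the base class; a subclass with a different `__init__` signature would need its own copy logic.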

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist