kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Validate datasets versions #4347

Open ElenaKhaustova opened 1 day ago

ElenaKhaustova commented 1 day ago

Description

Solves https://github.com/kedro-org/kedro/issues/4327

Development notes

Added a _validate_versions function that ensures all datasets in a catalog adhere to one versioning scheme: each dataset in the catalog may have a single load version, and all datasets in the catalog share one save version. The function updates the provided load versions with the versions specified on the individual datasets, and it checks that all versioned datasets in the catalog share the same save version. If a conflict arises, a VersionAlreadyExistsError is raised.

Validation is applied to both DataCatalog and KedroDataCatalog, both when a catalog is created and when a dataset is added to it.
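
For reviewers, the rule described in these notes can be sketched in plain Python. This is an illustration only, not the code in this PR: the Version dataclass, the validate_versions name and signature, and the locally defined VersionAlreadyExistsError are stand-ins, and nothing is imported from kedro. The sketch assumes that a dataset's own load version overrides the one passed in, and that only a save version mismatch counts as a conflict.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


class VersionAlreadyExistsError(Exception):
    """Local stand-in for the error named in this PR (not imported from kedro)."""


@dataclass
class Version:
    """Hypothetical holder for a dataset's load and save versions."""
    load: Optional[str]
    save: Optional[str]


def validate_versions(
    datasets: Dict[str, Optional[Version]],
    load_versions: Dict[str, str],
    save_version: Optional[str],
) -> Tuple[Dict[str, str], Optional[str]]:
    """Return load versions and a save version consistent with the datasets.

    `datasets` maps a dataset name to its Version, or to None if unversioned.
    """
    merged_load = dict(load_versions)  # copy so the caller's dict is untouched
    merged_save = save_version

    for name, version in datasets.items():
        if version is None:
            continue  # unversioned datasets take no part in validation

        # Assumption: the version set on the dataset wins over the provided one.
        if version.load is not None:
            merged_load[name] = version.load

        # All versioned datasets must agree on a single save version.
        if version.save is not None:
            if merged_save is not None and merged_save != version.save:
                raise VersionAlreadyExistsError(
                    f"Dataset '{name}' has save version '{version.save}', "
                    f"but the catalog already uses '{merged_save}'."
                )
            merged_save = version.save

    return merged_load, merged_save
```

In this sketch, the same check would run once over all datasets when a catalog is created and again, with the single new dataset, whenever a dataset is added.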

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist