kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.47k stars, 875 forks

[DataCatalog]: Add a data schema evaluation mechanism #3943

Open ElenaKhaustova opened 3 weeks ago

ElenaKhaustova commented 3 weeks ago

Description

Users have asked for data schema evaluation so that data loading can "fail fast" and consistency can be checked before a pipeline executes. They also see schema evaluation as useful for integrating with other services, validating pipelines before execution, and running API checks.

We propose to explore the feasibility and necessity of implementing a data schema evaluation mechanism.
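To make the "fail-fast" idea concrete, here is a minimal sketch in plain Python. It is not Kedro's API and not a proposed design: `SchemaError`, `check_schema`, and `load_with_schema` are hypothetical names, and the schema is assumed to be a simple column-to-type mapping over row dictionaries.

```python
from __future__ import annotations

from typing import Any, Callable


class SchemaError(ValueError):
    """Hypothetical error raised when loaded data does not match the schema."""


def check_schema(rows: list[dict[str, Any]], schema: dict[str, type]) -> None:
    """Raise SchemaError on the first missing column or wrongly typed value."""
    for index, row in enumerate(rows):
        for column, expected_type in schema.items():
            if column not in row:
                raise SchemaError(f"row {index}: missing column '{column}'")
            if not isinstance(row[column], expected_type):
                raise SchemaError(
                    f"row {index}: column '{column}' expected "
                    f"{expected_type.__name__}, got {type(row[column]).__name__}"
                )


def load_with_schema(
    load: Callable[[], list[dict[str, Any]]],
    schema: dict[str, type],
) -> list[dict[str, Any]]:
    """Call the loader, then validate before any downstream code sees the data."""
    rows = load()
    check_schema(rows, schema)  # fail fast: the error surfaces at load time
    return rows
```

In this sketch the check runs immediately after loading, so a mismatch is reported before any node consumes the data. Whether such a check would belong in the catalog, in the dataset, or in a plugin is exactly the open question of this issue.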

Relates to https://github.com/kedro-org/kedro/issues/3613

Context

Responses obtained during user research interview:

merelcht commented 2 weeks ago

Is this at all related to "Move kedro catalog validation schema to kedro-datasets"?

yury-fedotov commented 1 day ago

Shouldn't that be leveraging kedro-pandera?