[DataCatalog]: Hard to manage large catalogs

kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

Apache License 2.0

9.47k stars, 875 forks

Description

Users find it difficult to manage large catalogs. Because configuration is kept separate from the code, working with it requires excessive navigation back and forth, and the YAML-based data catalog itself is cumbersome to manage and navigate.

We propose to:

Explore the opportunity to offer an alternative to YAML-based catalogs that can be integrated with the current configuration approach.
Explore how the existing VS Code plugin simplifies working with large catalogs, and extend it with features for easier navigation.

Context

"I believe there's room for innovation in how the data catalog is structured in relation to the code. Currently, the configuration is organized differently and separately from the code, which requires a lot of navigation back and forth. Maybe an alternative where the catalog lives closer to where it's used could potentially reduce this overhead and improve productivity."
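
To make the idea in the quote concrete, here is a minimal sketch of what declaring catalog entries next to the code that uses them could look like. Everything in it is an assumption for illustration: `DatasetSpec`, `uses`, and `collected_catalog` are hypothetical names and are not part of Kedro, and the dataset names and file paths are made up. Only the entry shape (a `type` key plus dataset options such as `filepath`) follows the existing YAML catalog format.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical in-memory registry; NOT part of Kedro's API.
_REGISTRY: dict[str, dict[str, Any]] = {}


@dataclass(frozen=True)
class DatasetSpec:
    """One catalog entry, mirroring the keys of a YAML catalog entry."""

    name: str
    type: str
    options: dict[str, Any] = field(default_factory=dict)

    def to_config(self) -> dict[str, Any]:
        # Same shape as a YAML entry: {"type": ..., "filepath": ..., ...}
        return {"type": self.type, **self.options}


def uses(*specs: DatasetSpec) -> Callable:
    """Register the dataset specs that the decorated node function relies on."""

    def decorator(func: Callable) -> Callable:
        for spec in specs:
            config = spec.to_config()
            existing = _REGISTRY.get(spec.name)
            # Two nodes may share a dataset, but only with identical definitions.
            if existing is not None and existing != config:
                raise ValueError(f"Conflicting definitions for dataset '{spec.name}'")
            _REGISTRY[spec.name] = config
        return func

    return decorator


def collected_catalog() -> dict[str, dict[str, Any]]:
    """Return a copy of all registered entries as a plain catalog-style dict."""
    return {name: dict(config) for name, config in _REGISTRY.items()}


# Illustrative node: the dataset definitions sit directly above the function.
@uses(
    DatasetSpec("companies", "pandas.CSVDataset",
                {"filepath": "data/01_raw/companies.csv"}),
    DatasetSpec("preprocessed_companies", "pandas.ParquetDataset",
                {"filepath": "data/02_intermediate/companies.parquet"}),
)
def preprocess_companies(companies):
    return companies
```

Calling `collected_catalog()` yields a dictionary keyed by dataset name, in the same shape as a parsed `catalog.yml`. In principle such a dictionary could be merged with the YAML configuration or passed to `DataCatalog.from_config`, but that wiring is not shown here and would need to be validated against the Kedro version in use.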


[DataCatalog]: Hard to manage large catalogs #3938
