kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

[DataCatalog]: Enhance `_FrozenDatasets` public API #3926

Status: Open. ElenaKhaustova opened this issue 3 weeks ago.


Description

Users find the _FrozenDatasets public API hard to understand and use effectively because its documentation is unclear and its functionality is limited. They struggle to get a dataset by name, iterate through datasets, and access dataset metadata. They are unsure what advantages _FrozenDatasets offers, and find it unintuitive to work with because of its underscore prefix and its limited functionality compared with the private API.
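To make these limitations concrete, here is a simplified, hypothetical stand-in for the current behavior that does not import Kedro. It assumes, as in Kedro 0.19, that each dataset is exposed as an attribute whose name has runs of non-word characters replaced with `__`, and that attribute assignment is blocked. `FrozenDatasetsStandIn` and `sanitize` are illustrative names, not Kedro APIs; check the details against your installed Kedro version.

```python
from __future__ import annotations

import re
from typing import Any

# Assumption: mirrors Kedro 0.19, which replaces runs of non-word characters
# in dataset names with "__" so that each dataset can become an attribute.
_NON_WORD = re.compile(r"\W+")


def sanitize(name: str) -> str:
    """Hypothetical helper: map a catalog name to its attribute name."""
    return _NON_WORD.sub("__", name)


class FrozenDatasetsStandIn:
    """Hypothetical, simplified stand-in for Kedro's private _FrozenDatasets."""

    def __init__(self, datasets: dict[str, Any]) -> None:
        # Write into __dict__ directly because __setattr__ below is blocked.
        self.__dict__.update({sanitize(n): ds for n, ds in datasets.items()})

    def __setattr__(self, key: str, value: Any) -> None:
        raise AttributeError("Operation not allowed: datasets are frozen.")


# Placeholder strings stand in for real dataset objects.
datasets = FrozenDatasetsStandIn(
    {"companies": "CSVDataset", "params:model.alpha": "MemoryDataset"}
)

# Simple names work as plain attributes...
print(datasets.companies)
# ...but a name like "params:model.alpha" must be translated by hand first.
print(getattr(datasets, sanitize("params:model.alpha")))
# The stand-in has no dedicated iteration method, so vars() is the workaround.
print(list(vars(datasets)))
```

The manual `sanitize()` step and the reliance on `vars()` are exactly the kind of friction the user feedback describes: nothing in the object itself tells you how to look up a dataset by its catalog name or how to list what it contains.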

We propose:

  1. Enhance the _FrozenDatasets public API to provide more comprehensive functionality, including the ability to iterate over the datasets (https://github.com/kedro-org/kedro/issues/3916), access basic metadata (dataset type, file type, file path), and use methods such as get_by_name() for flexible dataset retrieval.
  2. Increase users' awareness of the _FrozenDatasets API through tutorials and documentation updates. Highlight the public API's capabilities and provide guidance on how to use it effectively for dataset management and retrieval.
  3. Consider allowing DataCatalog modifications and removing _FrozenDatasets altogether. This is a broader question, related to another issue that will be linked later.
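One possible shape for the enhancements in point 1 is sketched below. It is a hypothetical illustration, not Kedro code: `EnhancedFrozenDatasets`, `describe()`, `DatasetInfo` and the two stub dataset classes are invented for this example, and only the method name `get_by_name()` comes from the proposal. The sketch keeps attribute-style access for backward compatibility while adding iteration, lookup by the original catalog name, and basic metadata.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Any, Iterator, Optional

_NON_WORD = re.compile(r"\W+")


@dataclass
class CSVDatasetStub:
    """Hypothetical placeholder for a file-backed dataset."""
    filepath: str


@dataclass
class MemoryDatasetStub:
    """Hypothetical placeholder for an in-memory dataset with no file."""
    data: Any = None


@dataclass(frozen=True)
class DatasetInfo:
    """Hypothetical metadata record: dataset type, file path, file type."""
    name: str
    dataset_type: str
    filepath: Optional[str]
    file_type: Optional[str]


class EnhancedFrozenDatasets:
    """Hypothetical sketch of the proposed API; not Kedro's actual class."""

    def __init__(self, datasets: dict[str, Any]) -> None:
        # Keep the original names so lookups need no manual sanitization.
        object.__setattr__(self, "_datasets", dict(datasets))
        # Keep attribute-style access working for backward compatibility.
        for name, ds in datasets.items():
            object.__setattr__(self, _NON_WORD.sub("__", name), ds)

    def __setattr__(self, key: str, value: Any) -> None:
        raise AttributeError("Operation not allowed: datasets are frozen.")

    def __iter__(self) -> Iterator[str]:
        # Iterate over the original catalog names.
        return iter(self._datasets)

    def __len__(self) -> int:
        return len(self._datasets)

    def __contains__(self, name: object) -> bool:
        return name in self._datasets

    def get_by_name(self, name: str) -> Any:
        """Return a dataset by its catalog name, e.g. 'params:model.alpha'."""
        try:
            return self._datasets[name]
        except KeyError:
            raise KeyError(f"Dataset '{name}' not found in the catalog.") from None

    def describe(self, name: str) -> DatasetInfo:
        """Return the dataset class name, file path and file extension."""
        ds = self.get_by_name(name)
        filepath = getattr(ds, "filepath", None)
        suffix = PurePosixPath(filepath).suffix.lstrip(".") if filepath else None
        return DatasetInfo(name, type(ds).__name__, filepath, suffix or None)


datasets = EnhancedFrozenDatasets(
    {
        "companies": CSVDatasetStub("data/01_raw/companies.csv"),
        "params:model.alpha": MemoryDatasetStub(0.1),
    }
)

# Iterate by original name and print metadata for each dataset.
for name in datasets:
    print(datasets.describe(name))
```

Storing datasets under their original names in a private dict makes lookup and iteration independent of attribute-name sanitization; the sanitized attributes are kept only so that existing attribute-style code would continue to work.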

Context

Some quotes from the user feedback:

[Screenshot of user feedback, 2024-06-03]

[Screenshot of user feedback, 2024-06-04]