kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

[DataCatalog]: Iterate through datasets objects in the catalog #3916

Open ElenaKhaustova opened 1 month ago

ElenaKhaustova commented 1 month ago

Description

We propose to implement iterable support for catalog.datasets, allowing users to iterate through dataset objects directly.
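As a rough illustration of what iterable support could look like, here is a minimal sketch. DatasetsView is a hypothetical stand-in written for this example, not Kedro's actual catalog.datasets class, and plain Python objects stand in for real datasets.

```python
from typing import Any, Dict, Iterator


class DatasetsView:
    """Hypothetical stand-in for catalog.datasets; not Kedro's actual class."""

    def __init__(self, datasets: Dict[str, Any]) -> None:
        self._datasets = dict(datasets)

    def __iter__(self) -> Iterator[Any]:
        # Yield the dataset objects themselves rather than their names.
        return iter(self._datasets.values())

    def __len__(self) -> int:
        return len(self._datasets)


# Plain objects are used here in place of real dataset instances.
datasets = DatasetsView({"cars": [1, 2, 3], "planes": {"a": 1}})
collected = [ds for ds in datasets]
```

Whether iteration should yield objects, names, or (name, object) pairs is a design choice this sketch does not settle; it only shows the first option.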

Context

In the current implementation, one can only iterate through the dataset names obtained from catalog.list(), which forces the use of the private _get_dataset() method to get a dataset by name.
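The workaround described above can be sketched roughly as follows. StubCatalog is a hypothetical stand-in that mimics only the two methods named in this issue, list() and _get_dataset(), so the snippet runs without a Kedro project; the real methods may take additional arguments.

```python
from typing import Any, Dict, List


class StubCatalog:
    """Hypothetical stand-in mimicking only list() and _get_dataset()."""

    def __init__(self, datasets: Dict[str, Any]) -> None:
        self._datasets = dict(datasets)

    def list(self) -> List[str]:
        # Returns dataset names only, not the dataset objects.
        return list(self._datasets)

    def _get_dataset(self, name: str) -> Any:
        # Private by convention; user code is not supposed to call this.
        return self._datasets[name]


# Strings are used here in place of real dataset instances.
catalog = StubCatalog({"cars": "cars-dataset", "planes": "planes-dataset"})

# Current workaround: iterate over names, then reach into a private method.
by_name = {name: catalog._get_dataset(name) for name in catalog.list()}
```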


"I want to get a catalog from a Kedro project and then I want to iterate through the datasets or even just fetch one dataset by its name."

See Miro for user context: https://miro.com/app/board/uXjVN2JuRF0=/?moveToWidget=3458764593622844127&cot=14
See Dovetail for user interview: https://mckinsey.dovetail.com/data/2UOzkqe9cGAVh7kpHQrNqX#:v:h=uwQEu12hu5NpE3G1KyA9R&s=1

astrojuanlu commented 3 weeks ago

There should be one-- and preferably only one --obvious way to do it.