kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.82k stars 896 forks

Installable Kedro catalog (Mini-Kedro) #2741

Open WaylonWalker opened 3 years ago

WaylonWalker commented 3 years ago

Would it make sense to make mini-kedro installable? My use case for projects like that is users who are doing EDA and just want easy access to the data with no fuss.

If it makes sense, I propose adding a setup.py to make it installable, plus a single module that sets up the catalog for them. They can then access the project's data as follows:

import my_mini_kedro as mmk
mmk.catalog.load('my_dataset')

Alternatively, this could be a separate kedro-catalog starter.

JavierHernandezMontes commented 3 years ago

@WaylonWalker do you know a way to load a specific dataset from the catalog with the current version of Kedro (0.17.0), as you propose?

WaylonWalker commented 3 years ago

I just added a very basic setup.py and an __init__.py with a catalog loader to make it work.
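
For readers who want to try something similar, here is a minimal sketch of what such an __init__.py could look like. It is not the code from this comment: the package name my_mini_kedro comes from the example above, the helper names default_conf_dir and get_catalog are hypothetical, and it assumes a conf folder containing the catalog file is shipped inside the package (for example via package_data in setup.py). It uses OmegaConfigLoader, which only exists in more recent Kedro releases than 0.17.0, and the default environment sub-folders differ between Kedro versions, so base_env and default_run_env may need to be passed explicitly.

```python
"""Sketch of my_mini_kedro/__init__.py (hypothetical package layout)."""
from functools import lru_cache
from pathlib import Path


def default_conf_dir(package_file: str) -> Path:
    """Return the `conf` directory assumed to sit next to the given module file."""
    return Path(package_file).resolve().parent / "conf"


@lru_cache(maxsize=1)
def get_catalog():
    """Build the DataCatalog once and reuse it on later calls."""
    # Imported lazily so that importing the package itself stays cheap.
    from kedro.config import OmegaConfigLoader
    from kedro.io import DataCatalog

    # Assumption: the catalog file lives in the package's own conf folder.
    conf_loader = OmegaConfigLoader(str(default_conf_dir(__file__)))
    return DataCatalog.from_config(conf_loader["catalog"])


def __getattr__(name: str):
    """Expose `catalog` as a lazily built module attribute (PEP 562)."""
    if name == "catalog":
        return get_catalog()
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```

With this in place, mmk.catalog.load('my_dataset') would work as in the original proposal, and the catalog is only built on first access. Relative filepath entries in the catalog are still resolved against the current working directory, which is one reason local data is harder to package than remote data.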

WaylonWalker commented 3 years ago

@JavierHernandezMontes is your data local or remote? Local data that needs to be packaged in might be a bit trickier.

noklam commented 2 years ago

@WaylonWalker Is this still relevant with the IPython extension?

astrojuanlu commented 1 year ago

At the moment, using Kedro without the project template is entirely possible: run pip install kedro and then instantiate the catalog directly:

from kedro.config import OmegaConfigLoader
from kedro.io import DataCatalog

conf_loader = OmegaConfigLoader("conf")
conf_catalog = conf_loader.get("catalog")
catalog = DataCatalog.from_config(conf_catalog)

catalog.load(...)

On IPython & Jupyter, this is one line:

%load_ext kedro.ipython

catalog.load(...)

I agree it would be nice to make the boilerplate above go away, so I'm moving this issue and renaming it for our consideration. It will have more visibility on the framework repo.

astrojuanlu commented 1 year ago

We are exploring alternative approaches; see gh-2819.

astrojuanlu commented 11 months ago

And #2967

astrojuanlu commented 11 months ago

More evidence of users using the Kedro catalog without the pipelines: https://github.com/kedro-org/kedro/issues/2898#issuecomment-1736000094

astrojuanlu commented 6 months ago

Another user asking for exactly this: https://linen-slack.kedro.org/t/16593946/is-there-a-way-of-installing-only-the-data-catalog-part-of-k#b6d532c4-2d7f-4add-b0ee-b0bfcffbdd5e