kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

[DataCatalog]: Provide public methods to modify catalog #3930

Open ElenaKhaustova opened 3 weeks ago

ElenaKhaustova commented 3 weeks ago

Description

Plugin developers and advanced users are limited by the absence of public methods for modifying catalog datasets and for injecting dynamic behaviour or configuration parameters on the fly during pipeline execution. Although these limitations are intentional (the corresponding public APIs are deliberately not provided), users bypass them by using private APIs.

We propose to:

  1. Rethink the concept of keeping DataCatalog immutable.
  2. Explore the feasibility of providing a public API for modifying the catalog datasets and configuration parameters, enabling users to adapt the pipeline's behaviour in response to changing runtime requirements or environmental conditions.
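
To make the trade-off concrete, here is a minimal, self-contained sketch of the two access patterns. It deliberately does not use Kedro: `ToyCatalog`, its `_datasets` attribute, and `replace_dataset` are all hypothetical names chosen for illustration, not an existing or proposed Kedro API.

```python
from typing import Any, Dict


class ToyCatalog:
    """Hypothetical stand-in for a data catalog; NOT Kedro's DataCatalog."""

    def __init__(self, datasets: Dict[str, Any]) -> None:
        # Private by convention: nothing stops callers from touching it.
        self._datasets = dict(datasets)

    def load(self, name: str) -> Any:
        return self._datasets[name]

    def replace_dataset(self, name: str, dataset: Any) -> None:
        """Hypothetical public mutation method of the kind discussed here."""
        if name not in self._datasets:
            raise KeyError(f"Dataset '{name}' is not registered")
        self._datasets[name] = dataset


catalog = ToyCatalog({"metrics": {"type": "memory"}})

# Pattern 1: a hook reaches into private state. No validation is applied,
# and the code breaks silently if the attribute is ever renamed.
catalog._datasets["metrics"] = {"type": "tracked", "prefix": "run_1"}

# Pattern 2: a public method keeps validation and naming in one place.
catalog.replace_dataset("metrics", {"type": "tracked", "prefix": "run_2"})
```

The point of the second pattern is not that mutation becomes easier, but that the catalog owner decides which mutations are allowed and can reject the rest.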

Relates to https://github.com/kedro-org/kedro/issues/2728

Context

https://github.com/Galileo-Galilei/kedro-mlflow/blob/64b8e94e1dafa02d979e7753dab9b9dfd4d7341c/kedro_mlflow/framework/hooks/mlflow_hook.py#L145


https://github.com/getindata/kedro-azureml/blob/d5c2011c7ed7fdc03235bf2bd6701f1901d1139c/kedro_azureml/hooks.py#L20


astrojuanlu commented 3 weeks ago

Adding a few more examples:

There's general agreement that we don't necessarily want to make all mutations of the catalog easy (like crazy injection of datasets in the middle of the lifecycle), but maybe there are more ways we can open up the collection of datasets just before the catalog is first instantiated for the rest of the run.
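
One way to picture "open before instantiation, closed afterwards" is a two-phase design: a mutable collection step followed by a read-only result. The sketch below is only an illustration under that assumption; `CatalogBuilder`, `add`, and `build` are hypothetical names and do not correspond to any Kedro class or method.

```python
from types import MappingProxyType
from typing import Any, Dict, Mapping


class CatalogBuilder:
    """Hypothetical mutable collection of dataset configs (not a Kedro class)."""

    def __init__(self) -> None:
        self._configs: Dict[str, Any] = {}

    def add(self, name: str, config: Any) -> "CatalogBuilder":
        # Plugins and project code may register or override entries here.
        self._configs[name] = config
        return self

    def build(self) -> Mapping[str, Any]:
        # Copy the entries, then expose them through a read-only view.
        # Note: the view is shallow; nested dicts remain mutable.
        return MappingProxyType(dict(self._configs))


builder = CatalogBuilder()
builder.add("raw", {"type": "csv"})
builder.add("metrics", {"type": "tracked"})  # e.g. contributed by a plugin

frozen = builder.build()

# Later changes to the builder do not leak into the already-built mapping.
builder.add("late", {"type": "memory"})
```

After `build()`, item assignment on `frozen` raises `TypeError`, so mid-lifecycle injection fails loudly instead of succeeding silently.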

For interactive use, on the other hand, building the DataCatalog in an imperative way seems unnecessary, and there are other possibilities we can offer: https://github.com/kedro-org/kedro/issues/3612#issuecomment-2034961020