Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
As for the DataCatalog, the most important thing is to test it within the pipeline, via the CLI, and separately by simulating scenarios that call individual methods such as `add_feed_dict()`. The tests themselves should cover different sets and combinations of parameters, datasets, and dataset factory patterns. https://github.com/kedro-org/kedro/issues/3957#issuecomment-2299105368
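A minimal sketch of the kind of scenario test described above, assuming Kedro 0.19-era public APIs (`DataCatalog.from_config`, `add_feed_dict`, `MemoryDataset`, dataset factory patterns); the test names and parametrisation are illustrative, not an existing test suite:

```python
import pytest

from kedro.io import DataCatalog, MemoryDataset


@pytest.mark.parametrize("replace", [True, False])
def test_add_feed_dict_respects_replace_flag(replace):
    catalog = DataCatalog({"features": MemoryDataset([1, 2, 3])})
    if replace:
        # Overwriting an existing entry should succeed when replace=True.
        catalog.add_feed_dict({"features": [4, 5, 6]}, replace=True)
        assert catalog.load("features") == [4, 5, 6]
    else:
        # The exact exception type depends on the Kedro version.
        with pytest.raises(Exception):
            catalog.add_feed_dict({"features": [4, 5, 6]}, replace=False)


def test_dataset_factory_pattern_resolves_names():
    # Config dict mirroring a catalog.yml entry that uses a dataset factory pattern.
    config = {"{name}_data": {"type": "kedro.io.MemoryDataset"}}
    catalog = DataCatalog.from_config(config)
    assert "model_input_data" in catalog  # matched via the {name}_data pattern
```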
The main goal of this is to benchmark the performance of individual components; this will tell us whether refactoring work has a positive or negative impact. Currently we only check whether tests pass, so we have no idea if a change slows things down. We have done this in the past, but usually on an ad-hoc basis; we should run these benchmarks regularly (or at least once per release).
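For running these regularly, one possible shape is an asv (airspeed velocity) benchmark, since asv tracks timings across commits and releases. The class below follows asv's `params`/`setup`/`time_*` convention; the entry counts are illustrative assumptions:

```python
from kedro.io import DataCatalog


class TimeDataCatalog:
    params = [10, 100, 1_000]
    param_names = ["n_datasets"]

    def setup(self, n_datasets):
        # Build a catalog config with n_datasets MemoryDataset entries.
        self.config = {
            f"dataset_{i}": {"type": "kedro.io.MemoryDataset"}
            for i in range(n_datasets)
        }

    def time_from_config(self, n_datasets):
        # Measures how catalog instantiation scales with the number of entries.
        DataCatalog.from_config(self.config)
```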
The direction is simple: we want to measure how the time changes against the number of entries (see the timing sketch after the list below). We would start with Datasets and the DataCatalog, as this fits in the DataCatalog 2.0 work and will be immediately useful:
- DataCatalog (test the number of datasets, defined both via catalog.yml entries and via dataset factory patterns)
- ConfigLoader (number of parameters)
- Optional: pipelines generated in loops (dynamic pipelines)
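A minimal timing sketch for the first two items, measuring wall-clock time for catalog instantiation and parameter loading; the entry counts, temporary file layout, and helper names are illustrative assumptions, and a real benchmark would live in a harness rather than a script:

```python
import tempfile
import time
from pathlib import Path

import yaml

from kedro.config import OmegaConfigLoader
from kedro.io import DataCatalog


def time_catalog(n_datasets: int) -> float:
    """Time DataCatalog instantiation from a config with n_datasets entries."""
    config = {
        f"dataset_{i}": {"type": "kedro.io.MemoryDataset"}
        for i in range(n_datasets)
    }
    start = time.perf_counter()
    DataCatalog.from_config(config)
    return time.perf_counter() - start


def time_config_loader(n_params: int) -> float:
    """Time OmegaConfigLoader resolving a parameters.yml with n_params keys."""
    conf = Path(tempfile.mkdtemp())
    (conf / "base").mkdir()
    params = {f"param_{i}": i for i in range(n_params)}
    (conf / "base" / "parameters.yml").write_text(yaml.safe_dump(params))
    start = time.perf_counter()
    loader = OmegaConfigLoader(
        conf_source=str(conf), base_env="base", default_run_env="base"
    )
    loader["parameters"]
    return time.perf_counter() - start


for n in (10, 100, 1_000, 10_000):
    print(f"n={n:>6}  catalog: {time_catalog(n):.4f}s  params: {time_config_loader(n):.4f}s")
```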
Description
To create stress tests for individual components. The ticket combines the two components because I believe the setup will be similar; at the very least, there needs to be some mechanism to set up different datasets first, so I thought it makes sense to bundle these two components together.
Context
https://github.com/kedro-org/kedro/issues/3957#issuecomment-2298861806