kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Improve OmegaConfigLoader performance when global/variable interpolations are involved #4322

Open: ravi-kumar-pilla opened this issue 1 week ago

ravi-kumar-pilla commented 1 week ago

Description

Building on the investigation in #3893, OmegaConfigLoader is slow to resolve catalog configurations when global or variable interpolations are involved.

Context

Some previous observations:
https://github.com/kedro-org/kedro/issues/3893#issuecomment-2460629752
https://github.com/kedro-org/kedro/issues/3893#issuecomment-2465507178

Steps to Reproduce

Run the stress test, which creates a single catalog with variable interpolations: https://github.com/kedro-org/kedro/blob/test/ocl-bottleneck/kedro_benchmarks/temp_investigate_ocl/ocl_plot_variables.py
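For anyone who wants a quick local reproduction without the benchmark branch, here is a minimal sketch of a generator for a similar stress catalog. It is hypothetical and not the code from the linked script: the function name `make_stress_catalog`, the dataset type, and the `base_path` key are illustrative assumptions. It relies on two Kedro catalog features: underscore-prefixed keys (such as `_base_path`) act as template variables, and `${globals:...}` reads from `globals.yml` (which would need a `base_path` entry).

```python
def make_stress_catalog(n_datasets: int, use_globals: bool = False) -> str:
    """Return catalog.yml text where every dataset interpolates its filepath.

    Hypothetical helper for benchmarking; not part of Kedro.
    """
    lines = []
    if use_globals:
        # Assumes globals.yml defines `base_path`.
        ref = "${globals:base_path}"
    else:
        # Underscore-prefixed keys are treated as variables, not datasets.
        lines += ["_base_path: data/01_raw", ""]
        ref = "${_base_path}"
    for i in range(n_datasets):
        lines.append(f"dataset_{i}:")
        lines.append("  type: pandas.CSVDataset")
        lines.append(f"  filepath: {ref}/dataset_{i}.csv")
    return "\n".join(lines) + "\n"
```

Writing the output of `make_stress_catalog(5000)` to `conf/base/catalog.yml` gives one interpolation per dataset, so resolution cost should scale with the number of entries.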

Expected Result

Less time spent in the slow methods identified by profiling (when interpolations are involved, the bottleneck appears to be OmegaConf.to_container).
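To check which methods dominate before and after a change, a small standard-library profiling harness can help. This is a sketch, not the benchmark used in the issue; the helper name `profile_top` is hypothetical. It runs a callable under `cProfile` and returns the top entries sorted by cumulative time, which is where a call such as `OmegaConf.to_container` would surface.

```python
import cProfile
import io
import pstats


def profile_top(func, *args, top_n=10, **kwargs):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Cumulative time includes callees, so top-level hot paths rank first.
    stats.sort_stats("cumulative").print_stats(top_n)
    return result, stream.getvalue()
```

With Kedro installed, one way to use it is `loader = OmegaConfigLoader(conf_source="conf")` followed by `profile_top(lambda: loader["catalog"], top_n=20)`, then comparing the reports across branches.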

Actual Result

See the profiling results in https://github.com/kedro-org/kedro/issues/3893#issuecomment-2465507178

Your Environment

noklam commented 5 days ago

During the investigation, I found a slow path in _set_globals_value. I didn't spend enough time to fix it properly, but a quick fix improved the run time from roughly 1.5s to 0.9s. There are probably more slow paths.

noklam commented 5 days ago

In particular, there is an obvious slow path: global_oc gets created and destroyed for every reference to $globals.

https://github.com/kedro-org/kedro/blob/01c095b2537c1cfbafe6443a021e60ec94bfa1af/kedro/config/omegaconf_config.py#L433-L439
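One direction for removing this slow path is to build the globals object once and reuse it for every `${globals:...}` lookup. The sketch below is library-agnostic and hypothetical: `GlobalsResolver`, `load_globals`, and `build_count` are illustrative names, not Kedro's code, and plain dicts stand in for the OmegaConf object. A real fix would also need to invalidate the cache whenever the globals change (for example, when runtime parameters are applied).

```python
from __future__ import annotations

from typing import Any, Callable


class GlobalsResolver:
    """Hypothetical resolver that caches the globals config after first use."""

    def __init__(self, load_globals: Callable[[], dict[str, Any]]):
        self._load_globals = load_globals
        self._cache: dict[str, Any] | None = None
        self.build_count = 0  # how many times the globals were (re)built

    def _globals(self) -> dict[str, Any]:
        # Build lazily once, instead of once per interpolation reference.
        if self._cache is None:
            self._cache = self._load_globals()
            self.build_count += 1
        return self._cache

    def invalidate(self) -> None:
        """Drop the cache, e.g. after globals are modified."""
        self._cache = None

    def resolve(self, key: str, default: Any = None) -> Any:
        # Walk dotted keys such as "paths.raw" through nested dicts.
        node: Any = self._globals()
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                return default
            node = node[part]
        return node
```

With this shape, thousands of catalog entries referencing `${globals:...}` trigger a single build rather than one per reference.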