kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.88k stars 893 forks

Importing `kedro.runners` enables Rich logging #3985

Open astrojuanlu opened 3 months ago

astrojuanlu commented 3 months ago

Description


Context

Steps to Reproduce

  1. Call logging.warning("ANY MESSAGE") in a notebook and see that it's formatted as plain text
  2. import kedro.runner
  3. Repeat logging.warning("ANY MESSAGE") and observe that the formatting has changed
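
The same check can be made without eyeballing the output, by comparing the root logger's handlers before and after the import. This is a minimal sketch: `handlers_added_by_import` is a hypothetical helper name, not part of Kedro, and it only uses the standard library.

```python
import importlib
import logging


def handlers_added_by_import(module_name: str) -> list[str]:
    """Import a module and report root-logger handlers that appeared as a side effect.

    Hypothetical helper for illustration; not part of Kedro.
    """
    root = logging.getLogger()
    before = list(root.handlers)  # snapshot of the handler objects before the import
    importlib.import_module(module_name)
    # Compare by identity, because a logging config may replace handlers
    # rather than append to the existing list.
    return [
        type(handler).__name__
        for handler in root.handlers
        if not any(handler is old for old in before)
    ]
```

In a fresh interpreter with an affected Kedro version installed, `handlers_added_by_import("kedro.runner")` would be expected to return a non-empty list. If the module was already imported earlier in the session, the import is a no-op and the list is empty.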

Expected Result

Actual Result

I didn't expect imports to have such side effects.

Also, I don't think there's a way to revert it (see https://github.com/Textualize/rich/issues/2461), so I need to resort to hacks.
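
One possible hack of this kind, sketched under assumptions: it uses only the standard library, `reset_root_logging` is a hypothetical name, and it is not necessarily what was used here. It only swaps the root logger's handlers for a plain `StreamHandler`. It does not undo anything else that Rich may have installed.

```python
import logging
import sys


def reset_root_logging(level: int = logging.WARNING) -> None:
    """Replace all root-logger handlers with one plain StreamHandler.

    Hypothetical workaround for illustration; not part of Kedro or Rich.
    """
    # force=True (Python 3.8+) removes and closes any existing root handlers
    # before installing the new one.
    logging.basicConfig(
        level=level,
        format="%(levelname)s:%(name)s:%(message)s",
        stream=sys.stderr,
        force=True,
    )
```

Calling `reset_root_logging()` after the offending import should make `logging.warning("ANY MESSAGE")` print as plain text again, as long as the message propagates to the root logger.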

merelcht commented 2 months ago

I've done some digging, and at first the issue seemed to come from the import `from kedro.framework.hooks.manager import _NullPluginManager`. But then I moved that out and got some more info when doing `import kedro.runner`:

[07/05/24 14:50:03] INFO     Using '/Users/Merel_Theisen/anaconda3/envs/kedro/lib/python3.11/site-packages/kedro/framework/project/rich_logging.yml' as logging configuration.     __init__.py:246

So it seems that logging is configured when you import the runner. I've also checked that this doesn't happen when importing e.g. DataCatalog or the config loaders.

It seems that this comes from the ParallelRunner ~where we directly import LOGGING which triggers the logging configuration~. I tested this by removing ParallelRunner completely, along with the references to it in kedro/runner/__init__.py, and can see that Rich doesn't get triggered when it's all removed.

It looks like it comes from: https://github.com/kedro-org/kedro/blob/a179f87f0f170075ed3694d8a154937a9a96254c/kedro/runner/parallel_runner.py#L19-L24

Which flows into: https://github.com/kedro-org/kedro/blob/a179f87f0f170075ed3694d8a154937a9a96254c/kedro/framework/hooks/specs.py#L9

to: https://github.com/kedro-org/kedro/blob/a179f87f0f170075ed3694d8a154937a9a96254c/kedro/framework/context/context.py#L16

which seems to lead to: https://github.com/kedro-org/kedro/blob/a179f87f0f170075ed3694d8a154937a9a96254c/kedro/framework/project/__init__.py#L266

So it turns out it is both the import of _NullPluginManager and the imports of hooks and project settings inside ParallelRunner. Eventually it all boils down to logging being configured when project settings are imported.
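
An import chain like the one traced above can also be checked mechanically, by diffing `sys.modules` around the import. This is a minimal standard-library sketch. `modules_loaded_by` is a hypothetical helper, and what it reports will depend on the installed Kedro version.

```python
import importlib
import sys


def modules_loaded_by(module_name: str) -> list[str]:
    """Return the names of modules that were first loaded by importing module_name.

    Hypothetical helper for illustration; not part of Kedro.
    """
    before = set(sys.modules)  # names of everything loaded so far
    importlib.import_module(module_name)
    # Anything new in sys.modules was pulled in, directly or transitively.
    return sorted(set(sys.modules) - before)
```

In a fresh interpreter, checking whether `"kedro.framework.project"` appears in `modules_loaded_by("kedro.runner")` would show whether importing the runner still pulls in the project settings module, and with it the logging configuration.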