Closed melvinkokxw closed 1 year ago
@melvinkokxw Thanks for reporting this. In a kedro run, the catalog is created by the context, so the context hook is the earliest hook triggered.
Can you share more about what your hook looks like, and do you still see the same issue after removing telemetry?
@noklam Here is the hook I'm using:
from kedro.framework.hooks import hook_impl


class MLFlowHooks:
    """
    Initialise MLFlow and log useful tracking info automatically
    """

    @hook_impl
    def after_context_created(self, context):
        """Loads MLFlow catalog and parameters if MLFlow is enabled

        Args:
            context: Kedro context
        """
        if not self.mlflow_enabled:
            return
        # Merge the MLFlow-specific config into the main parameters and catalog
        context.config_loader["parameters"] = {
            **context.config_loader["parameters"],
            **context.config_loader["mlflow_parameters"],
        }
        context.config_loader["catalog"] = {
            **context.config_loader["catalog"],
            **context.config_loader["mlflow_catalog"],
        }

    @hook_impl
    def after_catalog_created(self, feed_dict):
        """
        Check if MLFlow is enabled

        Args:
            feed_dict: Kedro feed dict
        """
        parameters = feed_dict["parameters"]
        # MLFlow enabled?
        self.mlflow_enabled = parameters.get("mlflow_enabled", False)
When I removed kedro-telemetry, after_context_created was triggered first. When I reinstalled kedro-telemetry, after_catalog_created was triggered first.
Some debugging:
after_context_created, the one supposed to be the earliest, is called in KedroSession.load_context.
after_catalog_created is called by either accessing KedroContext.catalog or calling KedroSession.run.
Note that KedroSession.run calls .load_context() (hence triggering after_context_created) before triggering after_catalog_created.
However, kedro-telemetry hooks on before_command_run, which is a CLI hook that runs earlier.
Working on gh-1942 to better understand what's happening.
I think this was answered in Slack, but I can't find the thread now.
What happens is this: catalog is a read-only object, and every time context.catalog is accessed it gets created and triggers the after_catalog_created hook. The telemetry hook's after_context_created creates the catalog, so it triggers after_catalog_created before your MLFlowHooks' after_context_created.
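The ordering described above can be reproduced with a small toy model. This is only a sketch: ToyContext, TelemetryLikeHooks, UserHooks and load_context are hypothetical stand-ins written for illustration, not Kedro's or kedro-telemetry's actual code, and the assumption that the plugin's hook is called before the user's hook is made explicit in the hook list order.

```python
# Toy model only: every class and function here is a hypothetical stand-in,
# not Kedro's or kedro-telemetry's real implementation.
calls = []  # records the order in which hook methods fire


class ToyContext:
    """Rebuilds the 'catalog' on every access and fires the catalog hook each time."""

    def __init__(self, hooks):
        self._hooks = hooks

    @property
    def catalog(self):
        catalog = {"parameters": {"mlflow_enabled": True}}
        for hook in self._hooks:
            if hasattr(hook, "after_catalog_created"):
                hook.after_catalog_created(catalog)
        return catalog


class TelemetryLikeHooks:
    def after_context_created(self, context):
        calls.append("telemetry.after_context_created")
        context.catalog  # reading the catalog fires after_catalog_created


class UserHooks:
    def after_context_created(self, context):
        calls.append("user.after_context_created")

    def after_catalog_created(self, catalog):
        calls.append("user.after_catalog_created")


def load_context(hooks):
    """Create the context, then call after_context_created on each hook in order."""
    context = ToyContext(hooks)
    for hook in hooks:
        if hasattr(hook, "after_context_created"):
            hook.after_context_created(context)
    return context


# Assumption: the plugin's hook is called before the user's hook.
context = load_context([TelemetryLikeHooks(), UserHooks()])
context.catalog  # a later access, e.g. during the run, fires the hook again
print(calls)
```

In this model the user's after_catalog_created fires before the user's after_context_created, and fires a second time on the later catalog access.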
@melvinkokxw I am closing this for now. The short answer is that the execution order is not clear when multiple hooks exist. In this case, kedro-telemetry tries to read the catalog, so it triggers after_catalog_created.
What happens, roughly:
1. kedro-telemetry's after_context_created reads the catalog
2. after_catalog_created is triggered
3. after_context_created is triggered twice in your custom hook.
So your hook isn't really the "first hook". We need to document this better, but this is not a bug.
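Given that the order is not guaranteed, one possible way to make the hook from this thread independent of it is sketched below. It is not an official recommendation: it assumes the mlflow_enabled flag can be read from context.config_loader["parameters"] (the same access pattern the original hook already uses), and the ImportError fallback is a hypothetical no-op added only so the sketch runs without Kedro installed.

```python
try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # hypothetical fallback so the sketch runs without Kedro
    def hook_impl(func):
        return func


class MLFlowHooks:
    def __init__(self):
        # A default keeps the hook safe whichever hook happens to fire first.
        self.mlflow_enabled = False

    @hook_impl
    def after_context_created(self, context):
        # Read the flag here instead of relying on after_catalog_created
        # having already run and set it.
        parameters = context.config_loader["parameters"]
        self.mlflow_enabled = parameters.get("mlflow_enabled", False)
        if not self.mlflow_enabled:
            return
        context.config_loader["parameters"] = {
            **parameters,
            **context.config_loader["mlflow_parameters"],
        }
```

With this shape the hook no longer needs after_catalog_created to run before after_context_created, so installing or removing another plugin should not change its behaviour.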
Related issues:
This goes a bit deeper into the design of KedroContext, but maybe, given that it's immutable in develop (#1465), .catalog could be pre-computed, hence after_catalog_created would always be triggered before after_context_created, reducing ambiguity.
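A rough sketch of what pre-computing could look like, again as a toy model rather than a proposal for KedroContext's actual code: PrecomputedToyContext, RecordingHooks and load_context are hypothetical names, and the catalog is reduced to a plain dict.

```python
# Toy model only: hypothetical names, not KedroContext's real design.
events = []  # records the order in which hook methods fire


class RecordingHooks:
    def after_catalog_created(self, catalog):
        events.append("after_catalog_created")

    def after_context_created(self, context):
        events.append("after_context_created")
        context.catalog  # plain attribute read, fires no hook


class PrecomputedToyContext:
    def __init__(self, hooks):
        # The catalog is built exactly once, when the context is constructed.
        self.catalog = {"parameters": {}}
        for hook in hooks:
            hook.after_catalog_created(self.catalog)


def load_context(hooks):
    context = PrecomputedToyContext(hooks)
    for hook in hooks:
        hook.after_context_created(context)
    return context


load_context([RecordingHooks()])
print(events)
```

Because the catalog is an ordinary attribute here, reading it inside any hook cannot trigger after_catalog_created again, and the relative order of the two hooks is fixed.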
Description
after_context_created should be the earliest hook triggered (as per Kedro's documentation) but after_catalog_created is triggered before it.
When I uninstalled kedro-telemetry (to fix a separate issue), the hooks were triggered in the expected order.
Raising the bug here instead of the kedro-plugins repo as I cannot confirm if kedro-telemetry causes the bug.
Context
I was doing a kedro run and trying to read parameters in a hook, expecting the after_context_created hook to be triggered first. Instead, after_catalog_created was triggered before it.
Steps to Reproduce
settings.py
kedro run
Expected Result
after_context_created hook should be the earliest hook to be triggered.
Actual Result
after_catalog_created is triggered before the after_context_created hook.
Your Environment
Include as many relevant details about the environment in which you experienced the bug:
Kedro version used (pip show kedro or kedro -V): 0.18.7
Python version used (python -V): 3.8.16
settings.py looks like this: