Open astrojuanlu opened 1 year ago
I thought I had created a ticket for this, but I didn't. Thanks for creating this!
We need to explain WHY users need to do this, and give examples of how to do it.
https://github.com/Galileo-Galilei/kedro-mlflow/blob/845ad919c9dbd020e948e8adc2e0f9064de1ef68/kedro_mlflow/framework/hooks/mlflow_hook.py#L50-L63 is a good example. This gets asked in the intermediate training, so maybe we can create an example that showcases this.
Just today I wanted to apply such an "advanced" hook use case: storing the catalog and then injecting datasets on the fly. However, it doesn't work:
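import logging

from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog, DataSetError

logger = logging.getLogger(__name__)

# MissingDataSet is a custom dataset class defined elsewhere (not shown here).
class MissingDatasetHooks:
    @hook_impl
    def after_catalog_created(self, catalog: DataCatalog):
        # Keep a reference to the catalog so later hooks can use it
        self._catalog = catalog

    @hook_impl
    def before_dataset_loaded(self, dataset_name):
        dataset = self._catalog._get_dataset(dataset_name)
        try:
            dataset.load()
        except DataSetError:
            # Create EmptyDataset on the fly
            logger.warning("Attempted to load dataset %s which doesn't exist yet, injecting it", dataset_name)
            missing_dataset = MissingDataSet(dataset=dataset)
            self._catalog.add(dataset_name, missing_dataset, replace=True)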
The self._catalog that gets saved receives the .add(..., replace=True) correctly, but the catalog.load that comes immediately after the before_dataset_loaded hook still has the old dataset.
Context: I was trying to give a workaround for https://stackoverflow.com/q/76557758/554319.
Is this behavior expected?
(Using after_context_created gets the same result.)
I've seen a couple of references to before_pipeline_created, both here and in Discord. Is this a hook which is available? I can't find a reference to it anywhere in the docs.
Hmmm, actually, I'm not sure it ever existed, maybe it's a typo? Does before_pipeline_run or after_catalog_created suit your needs?
It does, but it was introduced in 0.18.1 and I am on an earlier version. Granted, I am building the hooks to ease our transition to 0.18+, but if there was a hook already implemented which offered a similar API to test the functionality without needing to actually upgrade our project it would've been quicker to test.
Thanks
Hooks were introduced in 0.16.0 (cc75a1c7fdea6660b987aecd4b99bdd6234187ce), and a few more were added later on. Here's the list of hooks in 0.16.6, for example:
https://docs.kedro.org/en/0.16.6/07_extend_kedro/04_hooks.html#execution-timeline-hooks
Description
Hooks are stateful objects, which enables users to, for example, store the context in the after_context_created hook and use it later in a hook that doesn't receive it: (code sample by @antonymilne)
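To illustrate the idea, here is a minimal sketch of the pattern (it is not the sample referenced above). It assumes a Kedro version that has after_context_created (0.18.1 or later, per this thread) and that the context exposes an env attribute. The class name ContextAwareHooks, the no-op hook_impl fallback, and the SimpleNamespace stand-ins at the bottom are hypothetical, added only so the snippet runs outside a Kedro project.

```python
import logging
from types import SimpleNamespace

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback (assumption): a no-op decorator so the sketch runs without Kedro.
    def hook_impl(func):
        return func

logger = logging.getLogger(__name__)


class ContextAwareHooks:
    """Stores the context in one hook so that a later hook can read it."""

    def __init__(self):
        self._context = None

    @hook_impl
    def after_context_created(self, context):
        # Kedro passes the context here; keep a reference on the instance.
        self._context = context

    @hook_impl
    def before_node_run(self, node):
        # This hook does not receive the context, so use the stored one.
        logger.info("Running %s in env %s", node.name, self._context.env)


# Simulate the order in which the hooks would be called during a run.
hooks = ContextAwareHooks()
hooks.after_context_created(context=SimpleNamespace(env="local"))
hooks.before_node_run(node=SimpleNamespace(name="preprocess"))
```

In a real project the class would be registered as an instance, for example HOOKS = (ContextAwareHooks(),) in settings.py, so that the same object, and therefore the same state, is shared across all hook calls.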
We should better document this.
Context
Follow up from gh-506 and other discussions, see for example https://www.linen.dev/s/kedro/t/12112601/does-anyone-know-if-there-is-a-reason-why-we-could-not-pass-#4e36a67f-36d3-4354-a3d5-4347a59ef28f
This is very useful when migrating projects from older versions of Kedro, customizing pipeline execution, and more.