Closed tzdanowicz closed 3 months ago
Hi @tzdanowicz, thank you for creating this issue. This sounds like a bug on our end, so I'll make sure we address it and put it on our backlog.
@tzdanowicz this has been fixed in #3950 and will be released in our next release, 0.19.7.
Description

The Kedro runner calls the data catalog's `shallow_copy`, which always returns a new `DataCatalog` object, destroying the type of any custom `DataCatalog` subclass being copied.

Context
I created a custom `PickleDataCatalog` class in order to dynamically handle multiple pickle objects on the `after_node_run` hook event. My custom data catalog and hooks were properly set in `src/my_project_name/settings.py`.

It turned out that the preset `DATA_CATALOG_CLASS` reference is lost during the pipeline lifecycle, and the `catalog` param is of type `DataCatalog` instead of the expected custom `PickleDataCatalog`.
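For reference, this is roughly how a custom catalog class is wired into project settings. A minimal sketch — the module path for `PickleDataCatalog` is an assumption for illustration, not taken from the report:

```python
# src/my_project_name/settings.py -- sketch; the import path below is
# hypothetical, not the reporter's actual layout.
from my_project_name.pickle_catalog import PickleDataCatalog

# Tell Kedro to instantiate this class instead of the default DataCatalog.
DATA_CATALOG_CLASS = PickleDataCatalog
```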
Steps to Reproduce
1) create a simple custom `PickleDataCatalog` class
2) create `hooks.py` with a custom `ModelSavingHook` handler to properly maintain files
3) update `src/my_project_name/settings.py`
4) `kedro run`
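Kedro aside, the reported behavior reduces to a copy method that is hard-coded to the base class. A runnable stand-in — class names mirror the report, but the bodies are illustrative, not Kedro's actual code:

```python
class DataCatalog:
    """Minimal stand-in for kedro.io.DataCatalog (illustration only)."""

    def __init__(self, datasets=None):
        self._datasets = dict(datasets or {})

    def shallow_copy(self):
        # Bug pattern: the copy is hard-coded to the base class,
        # so any subclass type is discarded.
        return DataCatalog(datasets=self._datasets)


class PickleDataCatalog(DataCatalog):
    """Stand-in for the custom catalog from the report."""


original = PickleDataCatalog({"model": object()})
copied = original.shallow_copy()
print(type(copied).__name__)  # DataCatalog, not PickleDataCatalog
```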
Expected Result
While investigating the pipeline lifecycle, I can see that the custom DataCatalog is properly propagated through the following events:

- `after_catalog_created`
- `before_pipeline_run`

Unfortunately, it is lost on `before_node_run` and `after_node_run`, where the custom `PickleDataCatalog` is expected.

Actual Result
The `catalog` object received on `before_node_run` is different from the one on `after_catalog_created` and `before_pipeline_run`! Instead of the expected `PickleDataCatalog`, the default `kedro.io.DataCatalog` is passed (check `id(catalog)` on all events).

Cause of the problem
It turned out that in `kedro.runner` there is a `shallow_copy` call on the catalog, which always constructs a plain `DataCatalog`.
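One fix pattern is to construct the copy from the instance's actual class rather than a hard-coded base. A minimal stand-in of that pattern — illustrative, not Kedro's real implementation:

```python
class DataCatalog:
    """Minimal stand-in for kedro.io.DataCatalog (illustration only)."""

    def __init__(self, datasets=None):
        self._datasets = dict(datasets or {})

    def shallow_copy(self):
        # Fix pattern: use the instance's actual class (type(self)) instead
        # of a hard-coded DataCatalog, so subclass types survive the copy.
        return type(self)(datasets=self._datasets)


class PickleDataCatalog(DataCatalog):
    """Stand-in for the custom catalog from the report."""


fixed_copy = PickleDataCatalog().shallow_copy()
print(type(fixed_copy).__name__)  # PickleDataCatalog
```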
In order to fix that bug, we need to use the actual type of the `catalog` object.

Your Environment
- Kedro version used (`pip show kedro` or `kedro -V`): kedro, version 0.19.5
- Python version used (`python -V`): Python 3.9.17