Closed noklam closed 3 years ago
I dig into the source code and have more information now.
So Kedro Hook
would actually have direct access of the inputs/outputs. We can learn about it from the NodeSpec
def after_node_run(node, catalog, inputs, outputs, is_async, run_id):
...
The bug is caused by wrong exeuction order.
This can confirm by runner.py
def _run_node_sequential(node: Node, catalog: DataCatalog, run_id: str = None) -> Node:
try:
outputs = node.run(inputs)
except Exception as error:
hook_manager.hook.on_node_error( # pylint: disable=no-member
error=error,
node=node,
catalog=catalog,
inputs=inputs,
is_async=is_async,
run_id=run_id,
)
raise error
hook_manager.hook.after_node_run( # pylint: disable=no-member
node=node,
catalog=catalog,
inputs=inputs,
outputs=outputs,
is_async=is_async,
run_id=run_id,
)
for name, data in outputs.items():
catalog.save(name, data)
return node
Kedro Hook
after_node_run() is designed to be execute before writing out the file, I think it should just read from the callback argument outputs
instead of reading from DataCatalog in this case.
This probably will not impact before_node_run
, but only after_node_run
@tamsanh
Great work! Let me read through the pull request.
Excellent work. Let me know if the problem still exists.
I've published it under version 0.2.7
Thank you, the problem is resolved now.
I have a function
When I use
run_after_node=True
, it will complain about "No such file or directory". Which means the expectation suite is actually executed before the file is output. It only works if the file exist before I dokedro run
. Which mean it would be always checking the old version of data instead of the latest pipeline run.I would assume expection suites can check against these variable directly through callbacks, instead of reading them again from file.