kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

"Hello world" example from the docs doesn't work on Kedro 0.18 #1409

Closed szczeles closed 2 years ago

szczeles commented 2 years ago

Description

Following the "Hello World" example from the docs on Kedro 0.18.0 fails with: TypeError: run() missing 1 required positional argument: 'hook_manager' (raised by runner.run()).

Context

Completing the quickstart tutorial.

Steps to Reproduce

  1. pip install kedro==0.18.0
  2. Follow https://kedro.readthedocs.io/en/0.18.0/get_started/hello_kedro.html#hello-kedro

Expected Result

This is how it behaves on 0.17.7:

$ python hello_kedro.py 
/opt/conda/envs/kedro-new/lib/python3.8/site-packages/kedro/io/data_catalog.py:189: DeprecationWarning: The transformer API will be deprecated in Kedro 0.18.0.Please use Dataset Hooks to customise the load and save methods.For more information, please visithttps://kedro.readthedocs.io/en/stable/07_extend_kedro/02_hooks.html
  warnings.warn(
{'my_message': 'Hello Kedro!'}

Actual Result

On 0.18.0:

$ python hello_kedro.py 
Traceback (most recent call last):
  File "hello_kedro.py", line 32, in <module>
    print(runner.run(greeting_pipeline, data_catalog))
TypeError: run() missing 1 required positional argument: 'hook_manager'

Your Environment

noklam commented 2 years ago

Thank you for reporting this, we will look into it.

noklam commented 2 years ago

Related Issue: https://github.com/kedro-org/kedro-training/issues/24#issuecomment-1087753852

katyalaitqb commented 2 years ago

Is there a quick workaround?

szczeles commented 2 years ago

@katyalaitqb pip install "kedro<0.18.0" helps, as this API change was introduced in 0.18. (The quotes stop the shell from treating < as a redirect.)

sebaxtian commented 2 years ago

Hi @szczeles, I tried this on the Hello Kedro example to solve the issue. I'm not sure it is the right solution, but it works for me:

"""Contents of hello_kedro.py"""
from kedro.io import DataCatalog, MemoryDataSet
from kedro.pipeline import node, pipeline
from kedro.runner import SequentialRunner
from kedro.framework.session import KedroSession

# Prepare a data catalog
data_catalog = DataCatalog({"my_salutation": MemoryDataSet()})

# Prepare first node
def return_greeting():
    return "Hello"

return_greeting_node = node(return_greeting, inputs=None, outputs="my_salutation")

# Prepare second node
def join_statements(greeting):
    return f"{greeting} Kedro!"

join_statements_node = node(
    join_statements, inputs="my_salutation", outputs="my_message"
)

# Assemble nodes into a pipeline
greeting_pipeline = pipeline([return_greeting_node, join_statements_node])

# Create a runner to run the pipeline
runner = SequentialRunner()

# HERE: this section --->
# Added the PluginManager hook_manager argument to KedroContext and the Runner.run() method,
# which will be provided by the KedroSession.
hook_manager = KedroSession('kedro-hello-world')._hook_manager
# <--- HERE: this section 

# Run the pipeline
print(runner.run(greeting_pipeline, data_catalog, hook_manager))

Please read the "Breaking changes to the API" section of the release notes: https://github.com/kedro-org/kedro/releases/tag/0.18.0

I added this to the hello world example code:

# Added the PluginManager hook_manager argument to KedroContext and the Runner.run() method,
# which will be provided by the KedroSession.
hook_manager = KedroSession('kedro-hello-world')._hook_manager

Please give me feedback, thanks.

szczeles commented 2 years ago

@sebaxtian Good catch! You're right, getting the hook manager from KedroSession's private attributes is the easiest way to solve it. The other solution would be to create the hook manager directly:

from pluggy import PluginManager
hook_manager = PluginManager("kedro")

However, this hook manager will not be equipped with any hooks coming from plugins, etc.

antonymilne commented 2 years ago

Just for the purposes of getting a running tutorial, I'd recommend using _create_hook_manager (which is what's called inside KedroSession) as the least bad option:

from kedro.framework.session.session import _create_hook_manager
print(runner.run(greeting_pipeline, data_catalog, _create_hook_manager()))

(Solution courtesy of @noklam)

The minor advantage of this compared to what @sebaxtian suggests is that you don't need to create a KedroSession.

However, this is obviously a horrible solution and we will figure out a better one. Basically, calling runner.run directly is a very unusual thing to do: we only do it in this Hello world tutorial, and in general you would programmatically trigger a kedro run using session.run. When we made the changes in 0.18, we didn't realise that this would make calling runner.run in the Hello world tutorial so much more awkward and ugly, sorry!
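For a script that has to work on both sides of the 0.18 change, one option is to inspect the signature of runner.run and pass a hook manager only when the signature requires one. The following is a minimal sketch, not part of Kedro's API: run_compat, OldStyleRunner and NewStyleRunner are hypothetical names, and the stand-in runners only imitate the two call signatures seen in this thread so that the snippet runs without Kedro installed.

```python
import inspect


def run_compat(runner, pipeline, catalog, make_hook_manager):
    """Call runner.run, building a hook manager only if it is a required argument.

    make_hook_manager is a zero-argument factory (an assumption of this sketch),
    so no hook manager is created when the runner does not need one.
    """
    params = inspect.signature(runner.run).parameters
    hook_param = params.get("hook_manager")
    # Required means: the parameter exists and has no default value.
    if hook_param is not None and hook_param.default is inspect.Parameter.empty:
        return runner.run(pipeline, catalog, make_hook_manager())
    return runner.run(pipeline, catalog)


# Hypothetical stand-ins that imitate the two signatures; they are not Kedro classes.
class OldStyleRunner:
    def run(self, pipeline, catalog):
        return {"my_message": "Hello Kedro!"}


class NewStyleRunner:
    def run(self, pipeline, catalog, hook_manager):
        if hook_manager is None:
            raise ValueError("hook_manager is required")
        return {"my_message": "Hello Kedro!"}


if __name__ == "__main__":
    for stand_in in (OldStyleRunner(), NewStyleRunner()):
        # object is used as a placeholder factory for the demonstration.
        print(run_compat(stand_in, None, None, make_hook_manager=object))
```

In a real script, runner would be the SequentialRunner from the tutorial and make_hook_manager would be one of the factories suggested earlier in this thread. Both of those rely on private names, so this stays a stopgap rather than a recommended pattern.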