kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0

Viz hook is broken with ParallelRunner [Blocked by Framework] #1801

Open noklam opened 6 months ago

noklam commented 6 months ago

Description

[image]

Context


Left: ParallelRunner Right: SequentialRunner

Steps to Reproduce

Create a new project, then run: kedro run --runner=ParallelRunner

Expected Result

No warnings from Kedro-Viz.

Actual Result

Warnings that datasets do not exist.



rashidakanchwala commented 6 months ago

This is very similar to #1797

noklam commented 6 months ago

Leaving a comment here: this is specific to multiprocessing, which is why ParallelRunner is affected. Fundamentally, the hook is not implemented in a process-safe or thread-safe way, so it breaks when used with ParallelRunner.

astrojuanlu commented 6 months ago

The problem is fundamentally the hook is not a Process/ThreadSafe implementation so it is broken when used together

To clarify, is this a fundamental limitation of pluggy, the way we implement our hooks, or Viz hook specifically?

noklam commented 6 months ago

I don't think it's a pluggy-specific problem. It's more that you cannot simply implement an arbitrary class and expect it to work with multiprocessing directly. See ParallelRunner and SharedMemoryDataset.
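To illustrate the point above, here is a minimal sketch of why in-memory hook state gets lost across a process boundary. It does not use Kedro or pluggy: StatsHook, record and run_in_worker are hypothetical names, and the process boundary is emulated with a pickle round trip instead of spawning real worker processes, so this shows only the general mechanism, not how ParallelRunner actually hands hooks to its workers.

```python
import pickle


class StatsHook:
    """Hypothetical stand-in for a hook that accumulates state in memory."""

    def __init__(self):
        self.stats = {}

    def record(self, dataset_name, rows):
        # Mutates only the instance this method is called on.
        self.stats[dataset_name] = {"rows": rows}


def run_in_worker(pickled_hook, dataset_name, rows):
    """Emulate a worker process: it works on a copy, never on the original."""
    worker_hook = pickle.loads(pickled_hook)
    worker_hook.record(dataset_name, rows)
    return worker_hook.stats


parent_hook = StatsHook()
worker_stats = run_in_worker(pickle.dumps(parent_hook), "companies", 100)

print(worker_stats)       # the worker's copy recorded the dataset
print(parent_hook.stats)  # the parent's hook never saw the update
```

The parent's hook stays empty, so anything it later looks up for that dataset is missing, which would be consistent with the "dataset does not exist" warnings reported here.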

So I'd say it's a hook implementation problem, but it's also a general one, because I think most Kedro plugins would break with ParallelRunner. Maybe there is a nice way to make it work across all plugins, e.g. an AbstractHook class. I had some discussion with @merelcht before, and I think ParallelRunner is less important than I previously thought.

So it's an interesting problem. We should probably fix it in kedro-viz since it's a first-party plugin, but I don't know if we need a generic solution.

See also:

(Edited: or, hey, let's wait for GIL removal and pray that Python works well with multiprocessing in the future.)

ravi-kumar-pilla commented 6 months ago

Hi @noklam ,

Thanks for raising the issue. In the steps to reproduce -

"create a new project and run": do you have any starter project where we can run the pipeline completely using kedro run --runner=ParallelRunner? I just tested spaceflights-pandas and spaceflights-pandas-viz with the Kedro-Viz hooks disabled completely via DISABLE_HOOKS_FOR_PLUGINS = ("kedro-viz",) in settings.py. Both failed to complete kedro run. This might not be a blocker for resolving the warning, but I would like to know whether a starter is available for ParallelRunner.

INFO     Running node: train_model_node: train_model([X_train;y_train]) -> [regressor]    node.py:340
ERROR    Node train_model_node: train_model([X_train;y_train]) ->  failed with error:    node.py:365
         cannot set WRITEABLE flag to True of this array
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/runner/parallel_runner.py", line 91, in _run_node_synchronization
    return run_node(node, catalog, hook_manager, is_async, session_id)
  File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/runner/runner.py", line 331, in run_node
    node = _run_node_sequential(node, catalog, hook_manager, session_id)
  File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/runner/runner.py", line 424, in _run_node_sequential
    outputs = _call_node_run(
  File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/runner/runner.py", line 390, in _call_node_run
    raise exc
  File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/runner/runner.py", line 380, in _call_node_run
    outputs = node.run(inputs)
  File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/pipeline/node.py", line 371, in run
    raise exc
  File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/pipeline/node.py", line 357, in run
    outputs = self._run_with_list(inputs, self._inputs)
  File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/pipeline/node.py", line 402, in _run_with_list
    return self._func(*(inputs[item] for item in node_inputs))
  File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/spaceflights-pandas/src/spaceflights_pandas/pipelines/data_science/nodes.py", line 38, in train_model
    regressor.fit(X_train, y_train)
  File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/sklearn/linear_model/_base.py", line 578, in fit
    X, y = self._validate_data(
  File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/sklearn/base.py", line 650, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/sklearn/utils/validation.py", line 1279, in check_X_y
    y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric, estimator=estimator)
  File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/sklearn/utils/validation.py", line 1289, in _check_y
    y = check_array(
  File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/sklearn/utils/validation.py", line 1097, in check_array
    array.flags.writeable = True
ValueError: cannot set WRITEABLE flag to True of this array
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/bin/kedro", line 8, in <module>
    sys.exit(main())
  File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/framework/cli/cli.py", line 233, in main
    cli_collection()
  File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/framework/cli/cli.py", line 130, in main
    super().main(
  File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/framework/cli/project.py", line 225, in run
    session.run(
  File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/framework/session/session.py", line 395, in run
    run_result = runner.run(
  File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/runner/runner.py", line 117, in run
    self._run(pipeline, catalog, hook_or_null_manager, session_id)  # type: ignore[arg-type]
  File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/runner/parallel_runner.py", line 314, in _run
    node = future.result()
  File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/concurrent/futures/_base.py", line 433, in result
    return self.__get_result()
  File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
ValueError: cannot set WRITEABLE flag to True of this array

noklam commented 6 months ago

There is no specific starter; it should work with any of them. I believe the CI also runs this as an end-to-end test.

This may be a scikit-learn problem. Can you try downgrading the library?

ravi-kumar-pilla commented 6 months ago

Hi @noklam

As discussed, I will be moving this ticket to the backlog, because we cannot access the SyncManager instance from the hooks to register a shared dict with the manager that ParallelRunner starts. So we need some way of exposing the manager (through either the catalog or the runner in Kedro) and making it mutable for custom hooks.
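For context, this is roughly what a shared dict registered with a manager looks like in plain Python. It is only a sketch under assumptions: it creates its own manager, which is exactly what a hook cannot reuse from ParallelRunner today, and record_stats and collect_stats are hypothetical names, not Kedro or Kedro-Viz APIs. It prefers the "fork" start method where available so that it also runs outside a __main__ guard.

```python
import multiprocessing as mp


def record_stats(shared_stats, dataset_name, rows):
    # Runs in a child process; the write goes through the manager's proxy,
    # so it lands in the manager's server process, not in local memory.
    shared_stats[dataset_name] = {"rows": rows}


def collect_stats(datasets):
    """Start one process per dataset and gather results in a shared dict."""
    methods = mp.get_all_start_methods()
    ctx = mp.get_context("fork" if "fork" in methods else "spawn")
    with ctx.Manager() as manager:  # a started SyncManager
        shared_stats = manager.dict()
        workers = [
            ctx.Process(target=record_stats, args=(shared_stats, name, rows))
            for name, rows in datasets.items()
        ]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()
        # Copy out of the proxy before the manager shuts down.
        return dict(shared_stats)


if __name__ == "__main__":
    print(collect_stats({"companies": 100, "shuttles": 250}))
```

Unlike a plain dict attribute on the hook, the proxy keeps every process writing to the same underlying object, which is why access to the runner's manager matters for this fix.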

Note: For now, the DatasetStatsHook in Kedro-Viz works with SequentialRunner.

Thank you

astrojuanlu commented 6 months ago

Opened a discussion about that: https://github.com/kedro-org/kedro/discussions/3776

merelcht commented 1 month ago

We discussed a similar ticket in the framework grooming (https://github.com/kedro-org/kedro/issues/4078). We decided that it requires more investigation on the Framework side. For the time being, the suggestion is to lower the logging level to DEBUG and add a note in the docs that ParallelRunner doesn't work with the viz hook. cc: @rashidakanchwala
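As a rough illustration of the interim suggestion, the sketch below shows the effect of emitting a message at DEBUG instead of WARNING. The logger name, the message text and the INFO threshold are assumptions made for the example, not taken from the Kedro-Viz code.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages so the effect is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("viz_hook_sketch")  # hypothetical logger name
logger.setLevel(logging.INFO)  # assumed verbosity for a normal run
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

# Current behaviour: the message is emitted as a warning and is shown.
logger.warning("dataset %s does not exist", "companies")
# Suggested behaviour: the same kind of message at DEBUG is filtered out.
logger.debug("dataset %s does not exist", "shuttles")

print(handler.messages)
```

Only the warning reaches the handler, so users would see the message only after raising verbosity to DEBUG themselves.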