kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

kedro run does not work after single machine deployment #4043

Open jan-kaufmann opened 1 month ago

jan-kaufmann commented 1 month ago

Description

After developing in PyCharm, I tried to deploy my code to another machine by following the git clone workflow described in the docs:

git clone my_repo
cd my_repo
pip install kedro
pip install -r requirements.txt
kedro run

This failed with an exception because Kedro could not locate my custom dataset class. Running the same code from within PyCharm works fine.

I managed to get the code working by running

python -m kedro run

instead, which is not what the documentation suggests. I see the same behavior on both the development machine (Windows 10) and the deployment target (Fedora 39), with Kedro 0.19.6.

I also tried activating the Python environment I used for development and then running kedro run, with the same result.
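
One possible explanation for the difference between the two invocations: with python -m, Python places the current working directory at the front of sys.path, whereas a console script such as kedro places the script's own directory there. The dotted path in the error log starts with src., which would only resolve when the project root is importable. The snippet below is a minimal diagnostic sketch, not part of Kedro; the helper name can_import and its extra_root parameter are hypothetical and introduced only for illustration.

```python
import importlib
import sys
from pathlib import Path
from typing import Optional


def can_import(dotted_path: str, extra_root: Optional[str] = None) -> bool:
    """Return True if `dotted_path` ("package.module.ClassName") resolves.

    If `extra_root` is given, it is temporarily placed at the front of
    sys.path, roughly mimicking what `python -m` does with the current
    working directory.
    """
    module_name, _, attr = dotted_path.rpartition(".")
    if not module_name:
        # No dot in the path, so there is no module part to import.
        return False

    original = list(sys.path)
    if extra_root is not None:
        sys.path.insert(0, str(Path(extra_root).resolve()))
    try:
        # Make sure files created after interpreter start are picked up.
        importlib.invalidate_caches()
        module = importlib.import_module(module_name)
        return hasattr(module, attr)
    except ImportError:
        return False
    finally:
        # Always restore sys.path, whether or not the import succeeded.
        sys.path[:] = original
```

Run from the project root, can_import("src.log_analytics.datatypes.IncrementalLogsDataset") versus the same call with extra_root="." should show whether the class path depends on the project root being on sys.path. If it does, a dotted path without the src. prefix may be worth trying in the catalog, though that is an assumption and not something confirmed in this report.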

Error log

kedro run
[07/30/24 12:08:14] INFO     Using `conf/logging.yml` as logging configuration. You can change this by setting the KEDRO_LOGGING_CONFIG environment variable accordingly.     __init__.py:249
                    WARNING  /home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro_viz/__init__.py:13: KedroVizPythonVersionWarning: Please be advised that Kedro Viz  warnings.py:110
                             is not yet fully
                                     compatible with the Python version you are currently using.
                               warnings.warn(

[07/30/24 12:08:16] INFO     Kedro project log-analytics                                                                                                                       session.py:324
                    DEBUG    Registered Ctrl-C handler                                                                                                                            hooks.py:22
                    WARNING  /home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/project/__init__.py:432: UserWarning: An error occurred while importing   warnings.py:110
                             the 'log_analytics.pipelines.load_data' module. Nothing defined therein will be returned by 'find_pipelines'.

                             Traceback (most recent call last):
                               File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/io/core.py", line 152, in from_config
                                 class_obj, config = parse_dataset_definition(
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^
                               File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/io/core.py", line 405, in parse_dataset_definition
                                 raise DatasetError(f"Class '{dataset_type}' not found, is this a typo?")
                             kedro.io.core.DatasetError: Class 'src.log_analytics.datatypes.IncrementalLogsDataset' not found, is this a typo?

                             The above exception was the direct cause of the following exception:

                             Traceback (most recent call last):
                               File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/project/__init__.py", line 424, in find_pipelines
                                 pipeline_module = importlib.import_module(pipeline_module_name)
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                               File "/usr/lib64/python3.12/importlib/__init__.py", line 90, in import_module
                                 return _bootstrap._gcd_import(name[level:], package, level)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                               File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
                               File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
                               File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
                               File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
                               File "<frozen importlib._bootstrap_external>", line 995, in exec_module
                               File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
                               File "/home/jan.kaufmann/grafana_log_analytics/log-analytics/src/log_analytics/pipelines/load_data/__init__.py", line 1, in <module>
                                 from .pipeline import create_pipeline  # NOQA
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                               File "/home/jan.kaufmann/grafana_log_analytics/log-analytics/src/log_analytics/pipelines/load_data/pipeline.py", line 4, in <module>
                                 from .nodes import update_dataset
                               File "/home/jan.kaufmann/grafana_log_analytics/log-analytics/src/log_analytics/pipelines/load_data/nodes.py", line 12, in <module>
                                 catalog = DataCatalog.from_config(conf_loader['catalog'])
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                               File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/io/data_catalog.py", line 299, in from_config
                                 datasets[ds_name] = AbstractDataset.from_config(
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                               File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/io/core.py", line 156, in from_config
                                 raise DatasetError(
                             kedro.io.core.DatasetError: An exception occurred when parsing config for dataset 'raw_log_data':
                             Class 'src.log_analytics.datatypes.IncrementalLogsDataset' not found, is this a typo?

                               warnings.warn(

                    WARNING  /home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/project/__init__.py:432: UserWarning: An error occurred while importing   warnings.py:110
                             the 'log_analytics.pipelines.analyse_data' module. Nothing defined therein will be returned by 'find_pipelines'.

                             Traceback (most recent call last):
                               File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/io/core.py", line 152, in from_config
                                 class_obj, config = parse_dataset_definition(
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^
                               File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/io/core.py", line 405, in parse_dataset_definition
                                 raise DatasetError(f"Class '{dataset_type}' not found, is this a typo?")
                             kedro.io.core.DatasetError: Class 'src.log_analytics.datatypes.IncrementalLogsDataset' not found, is this a typo?

                             The above exception was the direct cause of the following exception:

                             Traceback (most recent call last):
                               File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/project/__init__.py", line 424, in find_pipelines
                                 pipeline_module = importlib.import_module(pipeline_module_name)
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                               File "/usr/lib64/python3.12/importlib/__init__.py", line 90, in import_module
                                 return _bootstrap._gcd_import(name[level:], package, level)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                               File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
                               File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
                               File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
                               File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
                               File "<frozen importlib._bootstrap_external>", line 995, in exec_module
                               File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
                               File "/home/jan.kaufmann/grafana_log_analytics/log-analytics/src/log_analytics/pipelines/analyse_data/__init__.py", line 1, in <module>
                                 from .pipeline import create_pipeline  # NOQA
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                               File "/home/jan.kaufmann/grafana_log_analytics/log-analytics/src/log_analytics/pipelines/analyse_data/pipeline.py", line 4, in <module>
                                 from .nodes import portscan_analysis, report_upload
                               File "/home/jan.kaufmann/grafana_log_analytics/log-analytics/src/log_analytics/pipelines/analyse_data/nodes.py", line 12, in <module>
                                 catalog = DataCatalog.from_config(conf_loader['catalog'])
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                               File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/io/data_catalog.py", line 299, in from_config
                                 datasets[ds_name] = AbstractDataset.from_config(
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                               File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/io/core.py", line 156, in from_config
                                 raise DatasetError(
                             kedro.io.core.DatasetError: An exception occurred when parsing config for dataset 'raw_log_data':
                             Class 'src.log_analytics.datatypes.IncrementalLogsDataset' not found, is this a typo?

                               warnings.warn(

Traceback (most recent call last):
  File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/session/session.py", line 341, in run
    pipeline = pipelines[name]
               ~~~~~~~~~^^^^^^
  File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/project/__init__.py", line 142, in inner
    self._load_data()
  File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/project/__init__.py", line 187, in _load_data
    project_pipelines = register_pipelines()
                        ^^^^^^^^^^^^^^^^^^^^
  File "/home/jan.kaufmann/grafana_log_analytics/log-analytics/src/log_analytics/pipeline_registry.py", line 16, in register_pipelines
    pipelines["load"] = pipelines['load_data']
                        ~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'load_data'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jan.kaufmann/.local/bin/kedro", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/cli/cli.py", line 233, in main
    cli_collection()
  File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/cli/cli.py", line 130, in main
    super().main(
  File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/cli/project.py", line 225, in run
    session.run(
  File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/session/session.py", line 343, in run
    raise ValueError(
ValueError: Failed to find the pipeline named '__default__'. It needs to be generated and returned by the 'register_pipelines' function.

ankatiyar commented 1 month ago

Hey @jan-kaufmann, thanks for reporting this. Could you check whether the folder that contains your custom dataset has an __init__.py file? That might be the cause.
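
A quick way to check this across the whole project is sketched below. It is only an illustration under the assumption that the code lives under a src/ directory; the helper name dirs_missing_init is hypothetical and not part of Kedro.

```python
from pathlib import Path


def dirs_missing_init(src_root: str) -> list[Path]:
    """List directories under `src_root` that hold .py files but no __init__.py."""
    missing = []
    for directory in sorted(p for p in Path(src_root).rglob("*") if p.is_dir()):
        if directory.name == "__pycache__":
            continue  # bytecode cache folders are not packages
        has_python = any(
            f.is_file() and f.suffix == ".py" for f in directory.iterdir()
        )
        if has_python and not (directory / "__init__.py").exists():
            missing.append(directory)
    return missing
```

Calling dirs_missing_init("src") from the project root returns the folders to look at; an empty list means every folder containing Python files also has an __init__.py.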