Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
After developing in PyCharm, I tried to deploy my code to another machine following the git clone workflow from the docs:
```shell
git clone my_repo
cd my_repo
pip install kedro
pip install -r requirements.txt
kedro run
```
This failed with an exception because Kedro was unable to locate my custom dataset class. Running the same code from within PyCharm works just fine.
I managed to get the code working by running

```shell
python -m kedro run
```

instead, which is not what the documentation suggests.
I experienced the same behavior on both the development machine and the deployment target. Development runs Windows 10; deployment is on Fedora 39 with Kedro 0.19.6.
I also tried activating the Python environment I used for development and then running `kedro run` — same issue.
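A plausible explanation (my assumption, not confirmed by the log alone) is a `sys.path` difference: `python -m kedro run` puts the current working directory on `sys.path`, so a top-level `src` package like `src.log_analytics` resolves, while the `kedro` console script starts from its own install location. A minimal sketch of that effect, using a throwaway package layout (all names here are hypothetical):

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical layout mirroring a Kedro project: <root>/src/mypkg/__init__.py
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "src", "mypkg"))
open(os.path.join(root, "src", "__init__.py"), "w").close()
open(os.path.join(root, "src", "mypkg", "__init__.py"), "w").close()

def try_import(isolated: bool) -> str:
    """Attempt 'import src.mypkg' from the project root.

    isolated=True (-I) keeps the cwd off sys.path, roughly like a console
    script; isolated=False leaves it on, roughly like 'python -m ...'.
    """
    cmd = [sys.executable]
    if isolated:
        cmd.append("-I")
    cmd += ["-c", "import src.mypkg; print('found')"]
    result = subprocess.run(cmd, cwd=root, capture_output=True, text=True)
    return result.stdout.strip() or "not found"

print(try_import(False))  # cwd on sys.path  -> the import succeeds
print(try_import(True))   # cwd off sys.path -> the import fails
```

If that is what is happening, the `src.` prefix in the dataset path only works by accident under `python -m`, which would explain why the two invocations behave differently.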
Documentation page (if applicable)
Context
Error log
kedro run
[07/30/24 12:08:14] INFO Using `conf/logging.yml` as logging configuration. You can change this by setting the KEDRO_LOGGING_CONFIG environment variable accordingly. __init__.py:249
WARNING /home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro_viz/__init__.py:13: KedroVizPythonVersionWarning: Please be advised that Kedro Viz warnings.py:110
is not yet fully
compatible with the Python version you are currently using.
warnings.warn(
[07/30/24 12:08:16] INFO Kedro project log-analytics session.py:324
DEBUG Registered Ctrl-C handler hooks.py:22
WARNING /home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/project/__init__.py:432: UserWarning: An error occurred while importing warnings.py:110
the 'log_analytics.pipelines.load_data' module. Nothing defined therein will be returned by 'find_pipelines'.
Traceback (most recent call last):
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/io/core.py", line 152, in from_config
class_obj, config = parse_dataset_definition(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/io/core.py", line 405, in parse_dataset_definition
raise DatasetError(f"Class '{dataset_type}' not found, is this a typo?")
kedro.io.core.DatasetError: Class 'src.log_analytics.datatypes.IncrementalLogsDataset' not found, is this a typo?
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/project/__init__.py", line 424, in find_pipelines
pipeline_module = importlib.import_module(pipeline_module_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/importlib/__init__.py", line 90, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 995, in exec_module
File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
File "/home/jan.kaufmann/grafana_log_analytics/log-analytics/src/log_analytics/pipelines/load_data/__init__.py", line 1, in <module>
from .pipeline import create_pipeline # NOQA
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jan.kaufmann/grafana_log_analytics/log-analytics/src/log_analytics/pipelines/load_data/pipeline.py", line 4, in <module>
from .nodes import update_dataset
File "/home/jan.kaufmann/grafana_log_analytics/log-analytics/src/log_analytics/pipelines/load_data/nodes.py", line 12, in <module>
catalog = DataCatalog.from_config(conf_loader['catalog'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/io/data_catalog.py", line 299, in from_config
datasets[ds_name] = AbstractDataset.from_config(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/io/core.py", line 156, in from_config
raise DatasetError(
kedro.io.core.DatasetError: An exception occurred when parsing config for dataset 'raw_log_data':
Class 'src.log_analytics.datatypes.IncrementalLogsDataset' not found, is this a typo?
warnings.warn(
WARNING /home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/project/__init__.py:432: UserWarning: An error occurred while importing warnings.py:110
the 'log_analytics.pipelines.analyse_data' module. Nothing defined therein will be returned by 'find_pipelines'.
Traceback (most recent call last):
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/io/core.py", line 152, in from_config
class_obj, config = parse_dataset_definition(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/io/core.py", line 405, in parse_dataset_definition
raise DatasetError(f"Class '{dataset_type}' not found, is this a typo?")
kedro.io.core.DatasetError: Class 'src.log_analytics.datatypes.IncrementalLogsDataset' not found, is this a typo?
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/project/__init__.py", line 424, in find_pipelines
pipeline_module = importlib.import_module(pipeline_module_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/importlib/__init__.py", line 90, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 995, in exec_module
File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
File "/home/jan.kaufmann/grafana_log_analytics/log-analytics/src/log_analytics/pipelines/analyse_data/__init__.py", line 1, in <module>
from .pipeline import create_pipeline # NOQA
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jan.kaufmann/grafana_log_analytics/log-analytics/src/log_analytics/pipelines/analyse_data/pipeline.py", line 4, in <module>
from .nodes import portscan_analysis, report_upload
File "/home/jan.kaufmann/grafana_log_analytics/log-analytics/src/log_analytics/pipelines/analyse_data/nodes.py", line 12, in <module>
catalog = DataCatalog.from_config(conf_loader['catalog'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/io/data_catalog.py", line 299, in from_config
datasets[ds_name] = AbstractDataset.from_config(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/io/core.py", line 156, in from_config
raise DatasetError(
kedro.io.core.DatasetError: An exception occurred when parsing config for dataset 'raw_log_data':
Class 'src.log_analytics.datatypes.IncrementalLogsDataset' not found, is this a typo?
warnings.warn(
Traceback (most recent call last):
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/session/session.py", line 341, in run
pipeline = pipelines[name]
~~~~~~~~~^^^^^^
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/project/__init__.py", line 142, in inner
self._load_data()
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/project/__init__.py", line 187, in _load_data
project_pipelines = register_pipelines()
^^^^^^^^^^^^^^^^^^^^
File "/home/jan.kaufmann/grafana_log_analytics/log-analytics/src/log_analytics/pipeline_registry.py", line 16, in register_pipelines
pipelines["load"] = pipelines['load_data']
~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'load_data'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/jan.kaufmann/.local/bin/kedro", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/cli/cli.py", line 233, in main
cli_collection()
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/cli/cli.py", line 130, in main
super().main(
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/cli/project.py", line 225, in run
session.run(
File "/home/jan.kaufmann/.local/lib/python3.12/site-packages/kedro/framework/session/session.py", line 343, in run
raise ValueError(
ValueError: Failed to find the pipeline named '__default__'. It needs to be generated and returned by the 'register_pipelines' function.
Hey @jan-kaufmann, thanks for reporting this. Could you check whether the folder that contains your custom dataset has an `__init__.py` file? That might be the cause.
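Another thing worth checking, judging only from the error message: the catalog references the class as `src.log_analytics.datatypes.IncrementalLogsDataset`. In a standard Kedro 0.19 layout, `src` is the source root rather than a package, so the dataset is normally referenced by its package path. A sketch of the likely `conf/base/catalog.yml` fix (the filepath here is a placeholder):

```yaml
raw_log_data:
  # before (only works when `src` itself happens to be importable):
  # type: src.log_analytics.datatypes.IncrementalLogsDataset
  type: log_analytics.datatypes.IncrementalLogsDataset
  filepath: data/01_raw/logs  # placeholder path
```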