kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Default and Custom Pipeline not being registered and cannot be found. #731

Closed Cazforshort closed 3 years ago

Cazforshort commented 3 years ago

Description

kedro run doesn't work and claims that I need to register my pipeline.

  File "C:\Anaconda3\envs\kedro-environment\lib\site-packages\kedro\framework\context\context.py", line 310, in _get_pipeline
    ) from exc
kedro.framework.context.context.KedroContextError: Failed to find the pipeline named 'de'. It needs to be generated and returned by the 'register_pipelines' function.

I certainly have it registered. Here is my src\dcs_package\pipeline_registry.py:

from typing import Dict
import logging

from kedro.pipeline import Pipeline, node
from .pipelines.data_processing.pipeline import create_pipeline


def register_pipelines() -> Dict[str, Pipeline]:
    log = logging.getLogger(__name__)
    log.info("Start register_pipelines")
    data_processing_pipeline = create_pipeline()
    log.info("create pipeline done")

    return {
        "__default__": data_processing_pipeline,
        "dp": data_processing_pipeline,
    }

and my pipeline file is in "src\dcs_package\pipelines\data_processing\pipeline.py"

Context

I'm trying to run a very simple pipeline that just outputs a test string "test string"

Steps to Reproduce

  1. Did Kedro install[all]
  2. Set up a catalog file with a csv and an xlsx to make sure dependencies were working. No problems there.
  3. Tried kedro run and kedro run --pipeline de. Same response.

Expected Result

Pipeline is found and runs node.

Actual Result

Pipeline is not found. "Failed to find the pipeline named 'de'. It needs to be generated and returned by the 'register_pipelines' function."


Failed to find the pipeline named 'de'. It needs to be generated and returned by the 'register_pipelines' function.


Your Environment


kedro allows teams to create analytics projects. It is developed as part of the Kedro initiative at QuantumBlack.

No plugins installed

datajoely commented 3 years ago

Hi @Cazforshort as replied on StackOverflow:

There is a bit of a problem with kedro 0.17.2 where the true error is masked and will return the exception that you're seeing instead. It's possible that the root cause of the error is actually some other ModuleNotFoundError or AttributeError. Try doing a kedro install before kedro run and see if that fixes it.

https://stackoverflow.com/questions/66751310/kedro-failed-to-find-the-pipeline-named-default/66751440#66751440

Could you let us know if this resolves the problem?
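One way to see the underlying error that is being masked is to import the registry module directly, outside of kedro run. The sketch below is a minimal illustration, not part of Kedro: find_import_error is a hypothetical helper, and it assumes the project's src directory is on the Python path (for example, by running it from inside src).

```python
import importlib
import traceback
from typing import Optional


def find_import_error(module_name: str) -> Optional[BaseException]:
    """Import a module directly and return the exception it raises, if any."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # deliberately broad: we want to see whatever fails
        # Print the full traceback so the real cause is visible.
        traceback.print_exc()
        return exc
    return None
```

For the project in this issue, calling find_import_error("dcs_package.pipeline_registry") should print the real traceback (such as a ModuleNotFoundError) if an import inside the registry or one of its pipelines fails.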

Cazforshort commented 3 years ago

Hello, I'll reply here. This was a new kedro project. No luck doing kedro install.

(kedro-environment) C:\Users\cc667216\OneDrive\DCS_Pipeline\dcs_files>kedro install
Requirements installed!

kedro.framework.context.context.KedroContextError: Failed to find the pipeline named '__default__'. It needs to be generated and returned by the 'register_pipelines' function.

Cazforshort commented 3 years ago

Noticed that kedro pipeline list only returns

[]

So maybe that means something isn't in the right place.

Do print statements work in kedro? The logging that I'm doing in pipeline_registry doesn't seem to be doing anything.

datajoely commented 3 years ago

Hi @Cazforshort an important question - is this a brand new project or one that you've migrated to 0.17.2?

Cazforshort commented 3 years ago

> Hi @Cazforshort an important question - is this a brand new project or one that you've migrated to 0.17.2?

This is a brand new project.

datajoely commented 3 years ago

Okay - great and did you use a kedro starter or just kedro new?

Cazforshort commented 3 years ago

Just kedro new. Then I basically just followed the tutorial for file placement. Maybe some changes to the import statements and running kedro install[all] or something like that to fix the catalog.

datajoely commented 3 years ago

Okay let me recreate the steps on a Windows machine to see if I can recreate

Cazforshort commented 3 years ago

> Okay let me recreate the steps on a Windows machine to see if I can recreate

Sure thing. Is there anything I can do to test parts of it, since it doesn't seem like my register_pipelines function is running?

Cazforshort commented 3 years ago

> Okay let me recreate the steps on a Windows machine to see if I can recreate

Maybe unrelated, but the starter projects aren't working either. Followed literally step by step here

and I get a permissions error even though I'm running as admin. PermissionError: [WinError 5] Access is denied:

(temp) C:\Users\cc667216\OneDrive\DCS_Pipeline_Starter>kedro new -s pandas-iris --verbose

Project Name:
=============
Please enter a human readable name for your new project.
Spaces and punctuation are allowed.
 [New Kedro Project]: DCS Starter

Repository Name:
================
Please enter a directory name for your new project repository.
Alphanumeric characters, hyphens and underscores are allowed.
Lowercase is recommended.
 [dcs-starter]: dcs_dir

Python Package Name:
====================
Please enter a valid Python package name for your project package.
Alphanumeric characters and underscores are allowed.
Lowercase is recommended. Package name must start with a letter
or underscore.
 [dcs_starter]: dcs_pkg
Traceback (most recent call last):
  File "c:\anaconda3\envs\temp\lib\site-packages\kedro\framework\cli\starters.py", line 214, in _create_project
    config = _prompt_user_for_config(template_path, checkout, directory)
  File "c:\anaconda3\envs\temp\lib\site-packages\kedro\framework\cli\starters.py", line 299, in _prompt_user_for_config
    return config
  File "c:\anaconda3\envs\temp\lib\tempfile.py", line 807, in __exit__
    self.cleanup()
  File "c:\anaconda3\envs\temp\lib\tempfile.py", line 811, in cleanup
    _shutil.rmtree(self.name)
  File "c:\anaconda3\envs\temp\lib\shutil.py", line 516, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "c:\anaconda3\envs\temp\lib\shutil.py", line 395, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "c:\anaconda3\envs\temp\lib\shutil.py", line 395, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "c:\anaconda3\envs\temp\lib\shutil.py", line 395, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  [Previous line repeated 1 more time]
  File "c:\anaconda3\envs\temp\lib\shutil.py", line 400, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "c:\anaconda3\envs\temp\lib\shutil.py", line 398, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 5] Access is denied: 'C:\\Users\\cc667216\\AppData\\Local\\Temp\\tmpx9jz7iva\\kedro-starters\\.git\\objects\\pack\\pack-59621c2096146b4b6cb73362f31dea9e23fc56d9.idx'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\anaconda3\envs\temp\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "c:\anaconda3\envs\temp\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\anaconda3\envs\temp\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\anaconda3\envs\temp\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "c:\anaconda3\envs\temp\lib\site-packages\kedro\framework\cli\starters.py", line 157, in new
    directory=directory,
  File "c:\anaconda3\envs\temp\lib\site-packages\kedro\framework\cli\starters.py", line 246, in _create_project
    raise KedroCliError("Failed to generate project.") from exc
kedro.framework.cli.utils.KedroCliError: Failed to generate project.
Error: Failed to generate project.

datajoely commented 3 years ago

@Cazforshort is it possible to upgrade to Python 3.8?

Cazforshort commented 3 years ago

> @Cazforshort is it possible to upgrade to Python 3.8?

I was able to successfully install and run the Iris starter after updating to Python 3.8.8! I'll try replacing everything with my own project files in my 3.8.8 environment now.

datajoely commented 3 years ago

Glad you managed to get going! Let me know if you run into any other roadblocks and I'll investigate what's going on with Python <3.8 properly!

Cazforshort commented 3 years ago

Okay, so the bad news is the exact same issue happened again. It can't find the pipelines even though they are registered. The good news is I figured out the exact line that is breaking it. It was an import in my sql_tools.py file, which goes into a node.

from sqlalchemy import create_engine, text

I think sqlalchemy is not being installed right and for some reason it throws that very confusing error. How do I make sure it's added to the requirements file correctly?

datajoely commented 3 years ago

Wow good work! I'll raise a ticket for this since we should absolutely give you a good error message.

I'm not sure I understand this part:

> I think sqlalchemy is not being installed right and for some reason it throws that very confusing error. How do I make sure it's added to the requirements file correctly?

What are you doing currently?

datajoely commented 3 years ago

Actually I can confirm a fix for this misleading error message is queued up for 0.17.3
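To illustrate how a failed import can end up reported as a missing pipeline, here is a small hypothetical sketch. It is not Kedro's actual code: load_registry_masking_errors and get_pipeline are made-up names that only mimic the pattern described in this thread, where the real exception is swallowed and an empty registry is used instead.

```python
import importlib
from typing import Dict


def load_registry_masking_errors(module_name: str) -> Dict[str, object]:
    """Hypothetical loader that hides import failures (the problematic pattern)."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # The real cause (for example a ModuleNotFoundError) is discarded here,
        # so the caller only ever sees an empty registry.
        return {}
    return module.register_pipelines()


def get_pipeline(pipelines: Dict[str, object], name: str) -> object:
    """Look up a pipeline by name, failing with a 'not found' style message."""
    try:
        return pipelines[name]
    except KeyError as exc:
        raise LookupError(f"Failed to find the pipeline named '{name}'.") from exc
```

Under this pattern the registry function never runs, which would be consistent with the empty pipeline list and the missing log output reported above.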

Cazforshort commented 3 years ago

It may actually just be the code that is using it. Definitely in that area. I'll keep testing until I figure out what exact line is causing the problem.

import yaml
from sqlalchemy import create_engine, text


def execute_sql(sql_query, credentials_file):
    # Connect: read the connection string from the credentials file
    with open(credentials_file) as file:
        credentials = yaml.load(file, Loader=yaml.FullLoader)

    engine = create_engine(credentials['features_credentials']['con'])

    sql_text = text(sql_query).execution_options(autocommit=True)

    # Execute the query
    with engine.connect() as con:
        result = con.execute(sql_text)

Cazforshort commented 3 years ago

I'm just trying to read from a SQL database, but if I have this at the top of my node it won't run, even if I don't use it at all: from sqlalchemy import create_engine, text

Cazforshort commented 3 years ago

Okay, problem solved. I just needed to add sqlalchemy to the requirements.in. Everything works now. That was probably the biggest issue; it just took forever to figure out what I had missed.

jmbenedetto commented 3 years ago

Hi. I still have the same problem when trying to follow the spaceflights tutorial. When running kedro run --node=preprocess_companies_node, I get an error message saying it cannot find the default pipeline. I'm using kedro 0.17.2 with Python 3.8.8 on a GCP AI notebook. Attached is the error log: kedro_run_node.txt

datajoely commented 3 years ago

@jmbenedetto - there is an issue with 0.17.2 where a ModuleNotFoundError is being erroneously swallowed. To resolve this, please run kedro install from the terminal to ensure that your dependencies are up to date and try again.

The fix will be in 0.17.3 which will be released shortly.

jmbenedetto commented 3 years ago

@datajoely - I tried after running 'kedro install' but nothing changes. Same error. Is there a workaround? Thanks for your work!

Cazforshort commented 3 years ago

You probably need to add something to your requirements.in file. Start commenting out import statements (and code that uses the imported things). Keep commenting until you figure out the problem. Or just check imports against the requirements.in file.
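The manual check described above can be partly automated. The following is a rough sketch, not a Kedro feature: imported_modules and missing_modules are hypothetical helpers that parse a source file, collect the top-level modules it imports, and report those that cannot be found in the current environment.

```python
import ast
import importlib.util
from typing import List, Set


def imported_modules(source: str) -> Set[str]:
    """Collect top-level module names from absolute imports in the source."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from .pipelines import x"
            names.add(node.module.split(".")[0])
    return names


def missing_modules(source: str) -> List[str]:
    """Return imported module names that cannot be located in this environment."""
    missing = []
    for name in sorted(imported_modules(source)):
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing
```

Any name reported by missing_modules(open(path).read()) is a candidate for requirements.in. Note that the import name can differ from the package name to install (for example, yaml is provided by PyYAML).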


lorenabalan commented 3 years ago

Hi everyone, error messages should be clearer with 0.17.3, released today. As always, feel free to raise a new issue if you encounter problems. Thanks a lot for your patience!

FranciscoReveriano commented 3 years ago

I am having the same problem as:

> You probably need to add something to your requirements.in file. Start commenting out import statements (and code that uses the imported things). Keep commenting until you figure out the problem. Or just check imports against the requirements.in file.

Everything is updated. Just the "pipeline registry" function appears to be used/read.

datajoely commented 3 years ago

Hi @FranciscoReveriano are you having this problem on 0.17.2?