Open JenspederM opened 2 months ago
Hi @JenspederM, thanks for flagging this issue. Can I ask what your use case is for printing the result of `find_pipelines()`?
This method was added to enable auto-discovery of pipelines and does some work behind the scenes to make sure your project and its modules are discoverable (https://docs.kedro.org/en/stable/nodes_and_pipelines/pipeline_registry.html). It's meant to run as part of a "regular" Kedro flow where it's preceded by certain project setup methods. You can fix your script by calling `bootstrap_project()` before `find_pipelines()` (https://docs.kedro.org/en/stable/kedro_project_setup/session.html#bootstrap-project-and-configure-project). However, I would only recommend doing that for exploration and not if you're planning to run that code in production.
Let me know if this makes sense!
Hi @merelcht,
Thank you for your reply.
I am using `find_pipelines()` to generate Databricks asset bundle resources. I am working on a template for asset bundles that uses Kedro for defining pipelines and dependencies, and Databricks workflows for scheduling. You can find the project here
Thanks for suggesting `bootstrap_project()`. For now, I have been using `configure_project(<package-name>)` as used in `databricks_run.py` in the `databricks-iris` starter.
You can see my exact usage right here
@merelcht
I have been thinking of making a cookiecutter for Kedro as well. Do you think there would be any interest in this?
I made the template based on my own experience of running large scale Databricks projects in production with many contributors of varying levels of experience.
I'd say, regardless of use case, raising an `UnboundLocalError` from internal code should not happen; a more informative error should be raised instead.
> I have been thinking of making a cookiecutter for Kedro as well. Do you think there would be any interest in this?
Of course! When you get to it, we can promote it on https://github.com/kedro-org/awesome-kedro. Also consider exploring https://github.com/copier-org/copier/, a modern alternative to cookiecutter.
The only problem that I haven't really found a solution for is how I would get the workspace host from the user's Databricks config without using the Databricks CLI.
> I'd say, regardless of use case, raising an `UnboundLocalError` from internal code should not happen; a more informative error should be raised instead.
@astrojuanlu I also looked into the `UnboundLocalError`, and I see that it could be resolved by adding asserts or by running `validate_settings()` in `find_pipelines()` and `ParallelRunner._run()`.
Or does it deserve a greater redesign?
IMO global variables can be quite dangerous when used like this, so I would probably advise redesigning this logic to remove the use of globals.
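To illustrate the concern (a generic sketch, not Kedro's actual internals): module-level state set by a setup call fails confusingly when the setup step is skipped, whereas an explicit guard turns that into a clear error:

```python
_package_name = None  # hypothetical module-level global, set by configure()


def configure(name: str) -> None:
    """Setup step that mutates module-level state."""
    global _package_name
    _package_name = name


def find_things() -> str:
    # Validating up front gives a clear message instead of an
    # UnboundLocalError surfacing from deep inside library code.
    if _package_name is None:
        raise RuntimeError("Project not configured; call configure() first.")
    return f"searching in {_package_name}"
```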
Moving this to our Inbox so that we can look at it and it doesn't get lost.
> IMO global variables can be quite dangerous when used like this, so I would probably advise redesigning this logic to remove the use of globals.
For the record, I agree
Description
An error is thrown when trying to print the result of `find_pipelines` from the `kedro.framework.project` module.
Context
Unable to use `find_pipelines`.
Steps to Reproduce
1. Add `print(find_pipelines())` to the bottom of the `pipeline_registry.py` file.
2. Run `python ./src/<project>/pipeline_registry.py`.
Expected Result
A dict of pipelines.
Actual Result
I get the following error:
Your Environment
- Kedro version used (`pip show kedro` or `kedro -V`): kedro, version 0.19.5
- Python version used (`python -V`): Python 3.12.2, using rye as package manager