Open Techn0logic opened 2 years ago
I figured out that it breaks if the variable names of the outputs from different steps overlap.
Renaming the variables x to x1 and y to y1 allows the DAG to be built successfully.
@pipeline(name="binary-classification", experiment="kale-tutorial")
def ml_pipeline(rs=42, iters=100):
    x, y = load(rs)
    x1, x_test, y1, y_test = split(x, y)
    train(x1, x_test, y1, iters)
Is this the expected behavior in the SDK? Perhaps the example in the documentation needs to be updated?
This is caused by the way pipeline steps are added to DAG in pyprocessor.py:
def _link_step(self, step: Step):
    ins_left = set(step.ins.copy())
    ins_left.difference_update(set(self.pipeline.pipeline_parameters))
    for anc_step in reversed(list(self.pipeline.steps)):
        if ins_left.intersection(set(anc_step.outs)):
            self.pipeline.add_dependency(anc_step, step)
            ins_left.difference_update(set(anc_step.outs))
When _link_step is called on a given step, that step has already been placed in the DAG, but with no edges.
So when anc_step becomes equal to step, the step is linked to itself, because ins_left.intersection(set(anc_step.outs)) is non-empty and therefore true.
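The self-link described above can be reproduced outside Kale with a minimal sketch. The Step and Pipeline classes below are hypothetical stand-ins, not Kale's real classes, and link_step only copies the logic of the quoted _link_step snippet; the assumption that a step is appended to the pipeline before it is linked comes from the explanation above.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    # Hypothetical stand-in for Kale's Step: a name plus input/output variable names.
    name: str
    ins: list
    outs: list


@dataclass
class Pipeline:
    # Hypothetical stand-in for Kale's Pipeline: ordered steps and a list of edges.
    pipeline_parameters: list
    steps: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_dependency(self, parent, child):
        self.edges.append((parent.name, child.name))


def link_step(pipeline, step):
    # Same logic as the _link_step snippet quoted above.
    ins_left = set(step.ins)
    ins_left.difference_update(pipeline.pipeline_parameters)
    for anc_step in reversed(list(pipeline.steps)):
        if ins_left.intersection(anc_step.outs):
            pipeline.add_dependency(anc_step, step)
            ins_left.difference_update(anc_step.outs)


def build(split_outs):
    pipeline = Pipeline(pipeline_parameters=["rs", "iters"])
    load = Step("load", ins=["rs"], outs=["x", "y"])
    split = Step("split", ins=["x", "y"], outs=split_outs)
    for step in (load, split):
        pipeline.steps.append(step)  # assumption: the step is registered before it is linked
        link_step(pipeline, step)
    return pipeline.edges


overlapping = build(["x", "x_test", "y", "y_test"])
renamed = build(["x1", "x_test", "y1", "y_test"])
print(overlapping)
print(renamed)
```

Under these assumptions, the overlapping case produces a single self-edge on split and never links it to load, while the renamed case produces the expected load to split edge.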
I believe that, at a bare minimum, a runtime exception should be raised in PythonProcessor._register_step_handler if the outputs and inputs of a step have a non-empty intersection.
I can also try to add a modification that would allow such overlapping names to work.
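A guard of the kind proposed here might look roughly like the sketch below. It is not Kale's code: check_no_overlap is a hypothetical helper, and how it would be wired into PythonProcessor._register_step_handler is left open, since that method is not shown in this thread.

```python
def check_no_overlap(step_name, ins, outs):
    """Raise if a step lists the same variable name as both input and output."""
    overlap = set(ins) & set(outs)
    if overlap:
        raise RuntimeError(
            f"Step '{step_name}' uses {sorted(overlap)} as both input and output; "
            "rename the outputs so they do not shadow the inputs."
        )


# Distinct output names pass silently.
check_no_overlap("split", ["x", "y"], ["x1", "x_test", "y1", "y_test"])

# Overlapping names are reported up front instead of producing a broken DAG.
try:
    check_no_overlap("split", ["x", "y"], ["x", "x_test", "y", "y_test"])
except RuntimeError as exc:
    message = str(exc)
    print(message)
```

Failing early with the offending names would at least point users at the renaming workaround mentioned earlier in the thread.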
I'm experiencing the first issue as well, did you manage to resolve it?
I am aiming to use the Kale SDK to compile (and run) a pipeline in an on-prem Kubeflow environment, as per the documentation: https://docs.arrikto.com/release-1.4/user/kale/sdk/pipelines.html#procedure
Example: kale_sdk.py
Issue 1. The entry point doesn't seem to work as expected at the module level:
python -m kale --help
/opt/conda/bin/python: No module named kale.__main__; 'kale' is a package and cannot be directly executed
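For context, this message is the generic one Python prints when a package run with -m has no __main__.py module. The sketch below reproduces it with a throwaway package; demo_pkg is a hypothetical name, and nothing here says how Kale itself is meant to be invoked.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_module(cwd):
    # Run "python -m demo_pkg" from cwd so the package there is importable.
    return subprocess.run(
        [sys.executable, "-m", "demo_pkg"],
        cwd=cwd,
        capture_output=True,
        text=True,
    )


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_pkg"  # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")

    # A package without __main__.py cannot be executed with -m.
    without_main = run_module(tmp)

    # Adding __main__.py makes the same command work.
    (pkg / "__main__.py").write_text('print("hello from demo_pkg")\n')
    with_main = run_module(tmp)

print(without_main.stderr.strip())
print(with_main.stdout.strip())
```

So the error in Issue 1 suggests that the installed kale package does not ship a __main__.py, which would explain why only the kale binary works as an entry point.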
The kale binary, meanwhile, seems to work only with a notebook NB.
Issue 2. Docstrings in the @step and @pipeline functions cause an error. (Removing the docstrings leads to another issue.)
Issue 3. After removing the docstrings, DAG creation fails when the pipeline has more than one step.
The expectation is to compile and run a pipeline defined in a Python file using the SDK, as outlined in the documentation.
Would appreciate some pointers on how to correctly use the SDK and why I am experiencing these errors.