kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0
3.6k stars 1.62k forks

How can I make a graph dependent on a ContainerOp, another graph, or some Conditions? #4202

Closed pfeiking closed 4 years ago

pfeiking commented 4 years ago

I used some tedious strategies to solve the above problems, but I wonder how you solved such problems.

Bobgy commented 4 years ago

/assign @Ark-kun

Ark-kun commented 4 years ago

@wangpengfei984597422 Can you please describe your scenario?

What graph are you talking about? The @kfp.dsl.graph_component decorator or kfp.components.create_graph_component_from_pipeline_func?

How can I make a graph dependent on a ContainerOp

Just pass some output of ContainerOp instance to some graph input.

another graph

Just pass some output of another graph task to some graph input.

some Conditions

I think you can use the with kfp.dsl.Condition(...): context manager the same way as usual.

P.S.

ContainerOp

We advise our users to use reusable components (e.g. component.yaml files) instead of constructing ContainerOp objects directly. Please check the following tutorial: [106 - Creating components from command-line programs](https://github.com/Ark-kun/kfp_samples/blob/ae1a5b6 /2019-10%20Kubeflow%20sumit/106%20-%20Creating%20components%20from%20command-line%20programs/106%20-%20Creating%20components%20from%20command-line%20programs.ipynb) and https://www.kubeflow.org/docs/pipelines/sdk/component-development/

pfeiking commented 4 years ago

1. When I was building a pipeline, one of the components, Component_A, modified my data in a database (such as Google Bigtable). I used kfp.components.create_graph_component_from_pipeline_func to build a graph (Component_A is not in this graph). No arguments are passed between the graph and Component_A, but the graph must wait until Component_A has completed. Unlike ContainerOp, the graph has no ".after()" function, so I have to add irrelevant parameters between them to enforce the dependency. So I want to ask whether a graph has a similar way of building dependencies on a component.

2. In my workflow, I created three kfp.dsl.Condition(…) blocks that have no relationship to each other. After these three kfp.dsl.Condition(…) blocks, I want to connect a graph that must wait for all three to complete before it executes. I cannot use parameter passing, because the graph's incoming parameters are fixed, while a ContainerOp inside a kfp.dsl.Condition(…) may not be executed and so produces no output. In this case, I have to manually modify the YAML generated by kfp.compiler.Compiler().compile. So I would like to ask whether there is a good way to solve this through the SDK.

Example:

    with kfp.dsl.Condition(param_check_task.outputs['mark_ts_oss_path'] == 'have oss_paths'):
        o2o_ts_task = o2o_ts_op(...) # use kfp.components.load_component_from_file load from yaml

    with kfp.dsl.Condition(param_check_task.outputs['mark_rts_oss_path'] == 'have oss_paths'):
        o2o_rts_task = o2o_rts_op(...)

    with kfp.dsl.Condition(param_check_task.outputs['mark_rm_oss_path'] == 'have oss_paths'):
        o2o_rm_task = o2o_rm_op(...)

    odps_merge_task = odps_merge_graph(
        training_param=param_check_task.outputs[...]
    )

In this example, no arguments are passed between odps_merge_graph and the three Conditions, but odps_merge_graph must wait for all three Conditions to complete, so I have to manually modify the YAML like this (merge-graph-rm is a component (ContainerOp) inside odps_merge_graph):

      - name: merge-graph-rm
        template: merge-graph-rm
        dependencies: [condition-1, condition-2, condition-3, training-param-check]
        arguments:
          parameters:
          - {name: , value: '{{}}'}
          artifacts:
          - {name: , from: '{{}}'}
          ......

So is there a better way to do this than by modifying yaml?

Ark-kun commented 4 years ago

Thank you for the detailed description. At this moment graph components do not allow specifying any extra dependencies. But we've recorded your feature request.

There are some workarounds for the issue, and they usually make your components and pipelines better.

The best components are like pure functions. Their outputs depend only on the inputs and there are no side effects. Data comes in, data is processed, the results go out. For example, model trainer components are usually pure: the training data and training parameters go in and the model comes out. It's much easier when all components behave like this.

there is no argument passing between graph and Component_A, but this graph must wait before Component_A has been completed. I have to add irrelevant parameters between them to limit their dependencies. So I want to ask if graph has a similar way of building dependencies with component.

Ideally you should establish the dependencies using data passing. A component that just checks something can "pass through" the argument it checks. That output can be passed to the next component, establishing a dependency. A component that modifies data in place (which is problematic) can, for example, output the name of the DB. Then you can pass that name to the graph component and, inside the graph, use the DB name to read from it. A component that adds a row to a DB can return the resulting row ID and the DB name.
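A minimal sketch of that last suggestion, written as plain Python functions rather than actual KFP components. The names add_row, read_row, RowRef and the events table are hypothetical, and sqlite3 only stands in for whatever database the pipeline really uses. The point is that the reader can only be called with a value the writer produced, so the ordering becomes a data dependency:

```python
import os
import sqlite3
import tempfile
from typing import NamedTuple


class RowRef(NamedTuple):
    """Hypothetical writer output: which DB was modified and which row was added."""
    db_path: str
    row_id: int


def add_row(db_path: str, value: str) -> RowRef:
    """Side-effecting step: inserts a row, then returns a reference to it."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, value TEXT)"
        )
        cursor = conn.execute("INSERT INTO events (value) VALUES (?)", (value,))
        conn.commit()
        return RowRef(db_path=db_path, row_id=cursor.lastrowid)
    finally:
        conn.close()


def read_row(ref: RowRef) -> str:
    """Downstream step: needs the writer's output, so it cannot run first."""
    conn = sqlite3.connect(ref.db_path)
    try:
        row = conn.execute(
            "SELECT value FROM events WHERE id = ?", (ref.row_id,)
        ).fetchone()
        return row[0]
    finally:
        conn.close()


# Assumed demo location; a real pipeline would pass a DB name or connection string.
db_file = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
ref = add_row(db_file, "hello")
result = read_row(ref)
```

In a pipeline, the two outputs of the writer would be wired into the inputs of the graph component, which replaces the "irrelevant parameter" with one that the graph actually uses.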

I have no way to use parameter passing, because, graph incoming parameters are fixed

Usually, if there is any kind of dependency between tasks, it can still be expressed as a data dependency. For example, the first component prepares the data in some bucket and the second component reads that data. In this case the first component can pass the location of the prepared data to the second one.
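As a rough illustration of passing the location, here is a sketch in plain Python. A temporary local directory stands in for the bucket, and prepare_data, consume_data and the file name prepared.json are hypothetical names chosen for this example:

```python
import json
import os
import tempfile


def prepare_data(output_dir: str) -> str:
    """First step: writes the prepared data and returns where it was written."""
    os.makedirs(output_dir, exist_ok=True)
    data_path = os.path.join(output_dir, "prepared.json")
    with open(data_path, "w") as f:
        json.dump({"rows": [1, 2, 3]}, f)
    return data_path


def consume_data(data_path: str) -> int:
    """Second step: reads from the location handed over by the first step."""
    with open(data_path) as f:
        return sum(json.load(f)["rows"])


# The returned location is the only link between the steps, and it is enough
# to force the second step to run after the first.
location = prepare_data(os.path.join(tempfile.mkdtemp(), "bucket"))
total = consume_data(location)
```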

If the graph needs all 3 tasks to succeed, then it usually needs something that they produce. This data can be passed explicitly, creating a data dependency.

Another workaround in this case is to add a proxy task between the 3 conditions and the graph. That dummy task should just have .after(......) and produce a single output that's passed to the graph input.
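The ordering that such a proxy task creates can be modelled with the standard library (graphlib, Python 3.9+). This is only a toy model of the DAG, not KFP or Argo code. The condition and training-param-check names are taken from the YAML in the question, while "proxy" and "odps-merge-graph" are assumed task names. It does not model how the workflow engine treats conditions that were skipped:

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it must wait for.
dependencies = {
    "training-param-check": set(),
    "condition-1": {"training-param-check"},
    "condition-2": {"training-param-check"},
    "condition-3": {"training-param-check"},
    # The proxy task fans in all three conditions, like .after(...) would.
    "proxy": {"condition-1", "condition-2", "condition-3"},
    # The graph consumes the proxy's single output, so it waits for the proxy
    # and, through it, for all three conditions.
    "odps-merge-graph": {"proxy", "training-param-check"},
}

order = list(TopologicalSorter(dependencies).static_order())
position = {task: index for index, task in enumerate(order)}
```

Any valid execution order places the proxy after all three conditions and the graph after the proxy, which is the effect the hand-edited dependencies list was meant to achieve.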

haibingzhao commented 4 years ago

I have no way to use parameter passing, because, graph incoming parameters are fixed while ContainerOP in kfp.dsl.Condition(…) may not be executed

I think this is a bug in Argo. For details, see: https://github.com/argoproj/argo/issues/3491

By the way, the KFP SDK does not currently support optional inputs. Is there any plan to support this?

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 4 years ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.