kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

[feature] Deduplicate component templates #7242

Closed jli closed 4 months ago

jli commented 2 years ago

Feature Area

/area sdk

What feature would you like to see?

I would like the KFP compiler to reuse templates when the same component is called multiple times. In other words, each component should appear only once as a template in the pipeline YAML's spec.templates, and be called with different input parameters each time it is used.

Currently, each time a component is used in a pipeline, the compiler emits a new copy of the component definition in the pipeline YAML spec.
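For illustration, here is a minimal sketch of the two shapes using plain Python dicts laid out like an Argo Workflow, the format the KFP v1 compiler emits, with component templates under spec.templates. The template name train-model, the model_name parameter, and the python:3.9 image are hypothetical placeholders, not actual KFP compiler output.

```python
import json


def train_template(name: str = "train-model") -> dict:
    """Hypothetical container template for a "train" component."""
    return {
        "name": name,
        "inputs": {"parameters": [{"name": "model_name"}]},
        "container": {
            "image": "python:3.9",
            "command": ["python", "-c", "import sys; print('training', sys.argv[1])"],
            "args": ["{{inputs.parameters.model_name}}"],
        },
    }


def build_workflow(model_names: list[str], shared: bool) -> dict:
    """Return an Argo-style Workflow dict that trains one model per name.

    shared=False mimics the current behaviour (one template copy per call);
    shared=True emits the template once and lets every DAG task refer to it
    by name, passing its own arguments.
    """
    templates = [train_template()] if shared else []
    tasks = []
    for i, model in enumerate(model_names, start=1):
        template_name = "train-model" if shared else f"train-model-{i}"
        if not shared:
            # Current behaviour: a full copy of the component per call.
            templates.append(train_template(template_name))
        tasks.append({
            "name": f"train-{model}",
            "template": template_name,
            "arguments": {"parameters": [{"name": "model_name", "value": model}]},
        })
    templates.append({"name": "main", "dag": {"tasks": tasks}})
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "compare-models-"},
        "spec": {"entrypoint": "main", "templates": templates},
    }


if __name__ == "__main__":
    names = ["a", "b", "c"]
    for shared in (False, True):
        wf = build_workflow(names, shared)
        print(f"shared={shared}: {len(wf['spec']['templates'])} templates, "
              f"{len(json.dumps(wf))} bytes")
```

With three models, the duplicated shape has 4 templates and the shared one has 2. The DAG still grows with the number of models, but each task is only a short reference plus its arguments.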

I tried decorating my component functions with @kfp.dsl.graph_component and refactoring so that each component function only takes PipelineParam inputs, but it didn't seem to work: I still get multiple copies of each component. Perhaps I'm just using graph_component incorrectly?

What is the use case or pain point?

An important use case for my team is to run a single pipeline that trains/scores/QCs multiple models and then runs a reporting step comparing the results from each model.

The size of the pipeline YAML spec scales linearly with the number of models we include in the pipeline. This is preventing us from comparing all the models we would like to. (Related: #4170)
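A rough size model makes the difference concrete. Assume, for illustration only, that each of the n models uses the same k components, each serialized component template takes about T bytes, each DAG task entry (name, template reference, arguments) takes about a bytes, and c covers fixed overhead. These symbols are illustrative, not measured from KFP output:

```latex
\begin{aligned}
S_{\text{dup}}(n)    &\approx c + n\,k\,(T + a) \\
S_{\text{shared}}(n) &\approx c + k\,T + n\,k\,a \\
S_{\text{dup}}(n) - S_{\text{shared}}(n) &\approx (n - 1)\,k\,T
\end{aligned}
```

Both sizes are linear in n, but template reuse replaces the n k T term with a single k T, so the per-model cost drops from k(T + a) to k a.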

Is there a workaround currently?

Not that I know of. This is blocking us from running pipelines as large as we'd like. We are already using the workaround suggested here to shrink the YAML output: https://github.com/kubeflow/pipelines/issues/4170#issuecomment-655764762

(I suppose we could try to deduplicate the generated pipeline YAML ourselves, but that seems quite complex.)
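The post-hoc deduplication mentioned above could look roughly like the following sketch. It assumes the compiled pipeline has already been loaded into a dict (for example with PyYAML's yaml.safe_load) and follows the Argo Workflow layout, where DAG tasks and steps point at entries in spec.templates through a template field. The dedupe_templates helper is hypothetical and not part of the KFP SDK. It only merges templates whose bodies are identical apart from their name; if the compiler's copies differ in other fields (parameter names, annotations), nothing is merged, which is part of why a robust version is harder.

```python
import copy
import json
from typing import Any


def _repoint(ref: dict, rename: dict[str, str]) -> None:
    # Point a DAG task or step at the surviving template if its target was dropped.
    target = ref.get("template")
    if target in rename:
        ref["template"] = rename[target]


def dedupe_templates(workflow: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper (not part of the KFP SDK).

    Return a copy of an Argo-style Workflow dict in which templates that are
    identical apart from their "name" are collapsed into the first occurrence.
    Single pass: parents that only become identical after their children are
    renamed are not merged.
    """
    wf = copy.deepcopy(workflow)  # leave the caller's dict untouched
    spec = wf["spec"]
    survivor_by_body: dict[str, str] = {}  # canonical JSON of body -> kept name
    rename: dict[str, str] = {}            # dropped name -> kept name
    kept = []
    for tmpl in spec["templates"]:
        body = {k: v for k, v in tmpl.items() if k != "name"}
        key = json.dumps(body, sort_keys=True)
        if key in survivor_by_body:
            rename[tmpl["name"]] = survivor_by_body[key]
        else:
            survivor_by_body[key] = tmpl["name"]
            kept.append(tmpl)
    # Rewrite references in DAG tasks and in steps (a list of parallel groups).
    for tmpl in kept:
        for task in tmpl.get("dag", {}).get("tasks", []):
            _repoint(task, rename)
        for parallel_group in tmpl.get("steps", []):
            for step in parallel_group:
                _repoint(step, rename)
    spec["templates"] = kept
    if "entrypoint" in spec:
        spec["entrypoint"] = rename.get(spec["entrypoint"], spec["entrypoint"])
    return wf
```

The result can be written back with yaml.safe_dump before submission, again assuming PyYAML is available.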



jli commented 2 years ago

Hm, actually, I think this is a dupe of this older issue, which got auto-closed: https://github.com/kubeflow/pipelines/issues/4272

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 4 months ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.