Open DharmitD opened 1 month ago
/assign @DharmitD
I see the pain here, but my org expects caching to be the default, and requiring every component in a pipeline to enable it would be just as much of a pain as disabling it for each component. Alternative suggestion: allow the default to be set at the pipeline level?
@dsl.pipeline(name='iris-training-pipeline', caching=False)
def my_pipeline():
    task_1 = create_dataset()
    task_2 = create_dataset()
    task_3 = create_dataset()
    task_3.set_caching_options(True)
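The precedence this suggestion implies (per-task setting first, then the pipeline-level default, then the SDK default) can be modeled in plain Python. This is only a sketch under stated assumptions: `effective_caching` and `SDK_DEFAULT_CACHING` are hypothetical names, and none of this is the KFP SDK's actual implementation.

```python
from typing import Optional

# Assumption: mirrors today's compiler behavior as described in this issue.
SDK_DEFAULT_CACHING = True


def effective_caching(
    task_override: Optional[bool],
    pipeline_default: Optional[bool],
    sdk_default: bool = SDK_DEFAULT_CACHING,
) -> bool:
    """Return the caching setting a task would end up with.

    None means "not set at this level", so the next level decides.
    """
    if task_override is not None:
        return task_override
    if pipeline_default is not None:
        return pipeline_default
    return sdk_default


# Mirrors the snippet above: pipeline-level caching=False, task_3 opts back in.
overrides = {"task_1": None, "task_2": None, "task_3": True}
resolved = {
    name: effective_caching(override, pipeline_default=False)
    for name, override in overrides.items()
}
print(resolved)
```

With a pipeline-level default of `False`, only `task_3` ends up with caching enabled, which is the behavior the snippet is asking for.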
> Alternative suggestion: allow the default to be set at the pipeline level?
That's a good suggestion, and I think some day we'll get to implementing that. Ref: #10839
> my org expects caching to be the default and requiring every component in a pipeline to enable it would be just as much of a pain as disabling it for each component
Yep, we brought this issue up at the August 14, 2024 KFP Community Meeting (agenda, recording), and that was the consensus there too. I suggested an additive change: a CLI flag or env var that sets the default to disabled. The meeting attendees were in favor of that, hence #11142.
@DharmitD, per the last couple of comments, can you edit the title of this issue?
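One way such an additive switch could be resolved is sketched below. The flag name, the env var name, and the `default_caching_enabled` helper are assumptions made for illustration; the actual spelling and behavior are whatever #11142 defines. The key property is that caching stays enabled unless the flag or the env var explicitly asks otherwise.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence

# Hypothetical names, chosen for illustration only; see #11142 for the real ones.
ENV_VAR = "KFP_DISABLE_EXECUTION_CACHING_BY_DEFAULT"
CLI_FLAG = "--disable-execution-caching-by-default"

_TRUTHY = {"1", "true", "yes", "on"}


def default_caching_enabled(
    argv: Sequence[str],
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return the default caching setting for tasks that set nothing themselves."""
    environ = os.environ if environ is None else environ

    # Only the one flag is parsed here; every other argument is ignored.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument(CLI_FLAG, dest="disable", action="store_true")
    args, _unknown = parser.parse_known_args(list(argv))

    env_disable = environ.get(ENV_VAR, "").strip().lower() in _TRUTHY

    # Additive change: with neither switch set, today's default (enabled) is kept.
    return not (args.disable or env_disable)
```

Orgs that rely on caching being on by default see no change, while others can opt out once per environment instead of once per task.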
[feature] Update DSL to have default set to caching disabled
-> [feature] allow setting a default of execution caching disabled via a compiler CLI flag and env var
Feature Area
/area backend /area sdk
What feature would you like to see?
Kubeflow Pipelines has a caching feature that allows users to avoid re-running pipeline components (steps in the pipeline) if the system detects that such a component has previously run and its outputs (artifacts) could be reused. The goal is to save time and computation.
By default, the KFP compiler enables caching on every Component/Task unless the pipeline author calls `task.set_caching_options(False)`.
In other words:
Caching disabled is a much more reasonable default.
DSL Example
Caching is controlled on each individual pipeline Component / Task. Here is example KFP DSL code that disables caching for a single task:
Today, the KFP compiler enables caching on every Component/Task unless the pipeline author calls `task.set_caching_options(False)`.
In other words:
When we are done with this feature, this will be true:
What is the use case or pain point?
We need to fix the KFP compiler to stop enabling caching by default (by setting `task.set_caching_options(True)`) if the user didn’t ask for that. As described above, the effect of this behavior is that everything tries to use the cache by default, even though caching is disabled by default in the backend. This might be a significant change, so we wish to discuss it with the KFP community and get consensus on this update before making changes. A related issue: https://github.com/kubeflow/pipelines/issues/10839
Love this idea? Give it a 👍.