kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0
[backend] Unable to disable cache for pipeline steps #10966

Open rmoesbergen opened 4 days ago

rmoesbergen commented 4 days ago

Environment

Steps to reproduce

I have a pipeline like this:

        train_op = (
            train_loader.create_op(
                job_name=job_name,
                account=account,
            )
            .set_caching_options(False)
        )
        train_op.execution_options.caching_strategy.max_cache_staleness = "P0D"

After compiling and running this pipeline, the Pod still gets these labels and annotations:

  labels:
    app: kubeflow-job
    pipeline/runid: d2173b95-f465-47ac-a38a-470769c2064b
    pipelines.kubeflow.org/cache_enabled: 'true'
    pipelines.kubeflow.org/cache_id: ''
    pipelines.kubeflow.org/enable_caching: 'false'
  annotations:
    pipelines.kubeflow.org/execution_cache_key: 852f1ec5f95c01d9c0e62b85072fa8092f5f7933e73a08ec96e6ebb74229391e

No matter what I try, Kubeflow keeps caching the steps. That makes no sense for us, because our underlying data changes while the parameters stay the same. I also tried all of the suggestions in the links below, including modifying the mutating admission webhook, but nothing works:

https://github.com/kubeflow/pipelines/issues/4857
https://www.kubeflow.org/docs/components/pipelines/v1/overview/caching/
https://www.kubeflow.org/docs/components/pipelines/v2/caching/
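
The behaviour described above is what one would expect if a step's cache key is derived only from its definition and its input parameters. The following is a minimal sketch of that idea, not the actual Kubeflow Pipelines key computation: `cache_key` is a hypothetical helper, and the `image`, `command`, and `run_nonce` names and values are illustrative assumptions. It shows why changed data behind unchanged parameters would produce the same key, and how an extra per-run parameter would change it.

```python
import hashlib
import json
import uuid


def cache_key(step_spec: dict, params: dict) -> str:
    """Hypothetical cache key: SHA-256 over the step definition and its parameters."""
    payload = json.dumps({"spec": step_spec, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative step definition; these values are made up for the example.
spec = {"image": "example/train:latest", "command": ["python", "train.py"]}

# Two runs with identical parameters. The data the step reads may have changed,
# but nothing in the key reflects that, so both runs map to the same key.
key_first_run = cache_key(spec, {"job_name": "train", "account": "acme"})
key_second_run = cache_key(spec, {"job_name": "train", "account": "acme"})

# Adding a value that differs on every run (here a random UUID) changes the key.
key_with_nonce = cache_key(
    spec,
    {"job_name": "train", "account": "acme", "run_nonce": uuid.uuid4().hex},
)

print(key_first_run == key_second_run)  # same parameters, same key
print(key_first_run == key_with_nonce)  # extra per-run parameter, different key
```

Passing a per-run value as a step parameter would at best be a workaround under this assumption; it does not explain why the request to disable caching is ignored.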

The only thing that sort of works is reverting Kubeflow Pipelines to 2.0.5. The annotations are still there with that version, but Kubeflow somehow ignores them and does not cache.
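
To compare Pods across versions, a small check like the one below can flag the contradiction visible in the labels quoted above. It is a sketch under assumptions: the helper names are hypothetical, and the reading that `cache_enabled` and `enable_caching` are expected to agree is an interpretation of the labels, not documented behaviour.

```python
CACHE_ENABLED = "pipelines.kubeflow.org/cache_enabled"
ENABLE_CACHING = "pipelines.kubeflow.org/enable_caching"


def as_bool(value):
    """Interpret a label value such as 'true' or 'false'; return None if absent."""
    if value is None:
        return None
    return str(value).strip().lower() == "true"


def cache_labels_conflict(labels: dict) -> bool:
    """Return True when both cache labels are present and disagree."""
    enabled = as_bool(labels.get(CACHE_ENABLED))
    requested = as_bool(labels.get(ENABLE_CACHING))
    return enabled is not None and requested is not None and enabled != requested


# Labels copied from the Pod shown in this report.
pod_labels = {
    "app": "kubeflow-job",
    "pipeline/runid": "d2173b95-f465-47ac-a38a-470769c2064b",
    "pipelines.kubeflow.org/cache_enabled": "true",
    "pipelines.kubeflow.org/cache_id": "",
    "pipelines.kubeflow.org/enable_caching": "false",
}

print(cache_labels_conflict(pod_labels))
```

The labels dictionary could be taken from the `metadata.labels` field of `kubectl get pod <pod-name> -o json` output.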

Expected result

Kubeflow stops caching when I ask it to.

