kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

[backend] Argo Workflows created for ScheduledWorkflows are missing common metadata #7274

Closed jmendesky closed 4 months ago

jmendesky commented 2 years ago

Environment

How KFP was deployed: Kustomize

KFP version: 1.3.0 (the same behaviour is present in current versions)

KFP SDK version: 1.8.10

Steps to reproduce

Example created as a single Pipeline run:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  annotations:
    # pipelines.kubeflow.org/* entries are missing from the recurring run
    pipelines.kubeflow.org/kfp_sdk_version: 1.8.10
    pipelines.kubeflow.org/pipeline_compilation_time: '2022-02-01T15:41:19.456019'
    pipelines.kubeflow.org/pipeline_spec: '{"description": "Constructs a Kubeflow
      pipeline.", "inputs": [{"name": "pipeline-root"}], "name": "training-pipeline"}'
  labels:
    pipelines.kubeflow.org/kfp_sdk_version: 1.8.10  # missing from the recurring run
    workflows.argoproj.io/completed: "true"
    workflows.argoproj.io/phase: Succeeded
    pipeline/persistedFinalState: "true"
    pipeline/runid: 96455954-6f96-4933-9ee1-a85cd676b8c6
...

Example created by a recurring run:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  annotations: {}
  labels:
    pipeline/persistedFinalState: "true"
    pipeline/runid: bd44f525-7576-4ca8-ae82-b0eb050d1ae9
    scheduledworkflows.kubeflow.org/isOwnedByScheduledWorkflow: "true"
    scheduledworkflows.kubeflow.org/scheduledWorkflowName: training-run-configurr8fjn
    scheduledworkflows.kubeflow.org/workflowEpoch: "1644166800"
    scheduledworkflows.kubeflow.org/workflowIndex: "2"
    workflows.argoproj.io/completed: "true"
    workflows.argoproj.io/phase: Succeeded
...

As you can see, the workflow for the recurring run has a different set of labels and no annotations at all. Specifically, the compiled Argo Workflow contains pipelines.kubeflow.org/* labels and annotations, and these are dropped for the scheduled run.
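The difference between the two examples can be checked mechanically. The following is a minimal sketch, assuming the workflow metadata has already been loaded into plain dictionaries (for example from the YAML above). The helper name `missing_metadata` and the trimmed sample data are illustrative only and are not part of KFP.

```python
# Metadata taken from the two Workflow examples above, trimmed to the relevant keys.
single_run = {
    "annotations": {
        "pipelines.kubeflow.org/kfp_sdk_version": "1.8.10",
        "pipelines.kubeflow.org/pipeline_compilation_time": "2022-02-01T15:41:19.456019",
    },
    "labels": {
        "pipelines.kubeflow.org/kfp_sdk_version": "1.8.10",
        "pipeline/runid": "96455954-6f96-4933-9ee1-a85cd676b8c6",
    },
}
recurring_run = {
    "annotations": {},
    "labels": {
        "pipeline/runid": "bd44f525-7576-4ca8-ae82-b0eb050d1ae9",
        "scheduledworkflows.kubeflow.org/isOwnedByScheduledWorkflow": "true",
    },
}


def missing_metadata(reference, candidate, prefix="pipelines.kubeflow.org/"):
    """Return prefixed keys present in `reference` but absent from `candidate`.

    Hypothetical helper for illustration; both arguments are dicts with
    optional "annotations" and "labels" mappings.
    """
    missing = {}
    for section in ("annotations", "labels"):
        ref_keys = reference.get(section) or {}
        cand_keys = candidate.get(section) or {}
        missing[section] = sorted(
            key for key in ref_keys
            if key.startswith(prefix) and key not in cand_keys
        )
    return missing


if __name__ == "__main__":
    for section, keys in missing_metadata(single_run, recurring_run).items():
        print(section, keys)
```

A check like this could run in the downstream automation to detect workflows that lack the expected metadata before acting on them.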

Expected result

Both workflows should have the same common metadata, most importantly labels and annotations. We use these fields for automation after a pipeline has finished.

Materials and Reference

After some investigation I found out that the ScheduledWorkflow CRD doesn't contain a metadata field: https://github.com/kubeflow/pipelines/blob/master/backend/src/apiserver/template/argo_template.go#L92 and https://github.com/kubeflow/pipelines/blob/master/backend/src/crd/pkg/apis/scheduledworkflow/v1beta1/types.go#L48

and that the original Workflow's Spec gets copied without its metadata: https://github.com/kubeflow/pipelines/blob/master/backend/src/crd/controller/scheduledworkflow/util/scheduled_workflow.go#L164
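To illustrate the expected behaviour, here is a minimal sketch of carrying the template's metadata over when a scheduled workflow is created. It is written in Python on plain dictionaries purely for illustration. It is not the controller's actual Go code, and the function name `build_scheduled_workflow` and the rule that scheduler labels win on conflict are assumptions.

```python
import copy


def build_scheduled_workflow(template, scheduler_labels):
    """Build a new workflow dict from `template`, keeping its metadata.

    Hypothetical helper: `template` is the compiled Argo Workflow as a dict,
    and `scheduler_labels` are the scheduledworkflows.kubeflow.org/* labels
    added for the recurring run.
    """
    template_meta = template.get("metadata") or {}

    # Start from the template's labels, then overlay the scheduler's labels.
    labels = dict(template_meta.get("labels") or {})
    labels.update(scheduler_labels)  # assumption: scheduler labels win on conflict

    return {
        "apiVersion": template.get("apiVersion"),
        "kind": template.get("kind"),
        "metadata": {
            "annotations": dict(template_meta.get("annotations") or {}),
            "labels": labels,
        },
        # Deep copy so later changes to the new workflow do not touch the template.
        "spec": copy.deepcopy(template.get("spec") or {}),
    }
```

With this approach, the pipelines.kubeflow.org/* labels and annotations from the compiled workflow would appear alongside the scheduledworkflows.kubeflow.org/* labels on each scheduled run.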



zijianjoy commented 2 years ago

/assign @ji-yaqi

ji-yaqi commented 2 years ago

Hi @jmendesky, is there any impact of this missing metadata that we should be aware of?

jmendesky commented 2 years ago

> Hi @jmendesky, is there any impact of this missing metadata that we should be aware of?

@ji-yaqi we use this metadata in our automation, which reacts to finished pipeline runs. This automation is currently broken. More generally, I think this inconsistency could cause problems for any other users who rely on this metadata being present.

jmendesky commented 2 years ago

Is there any update on this?

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 4 months ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.