opendatahub-io / data-science-pipelines-tekton

Kubeflow Pipelines on Tekton
https://developer.ibm.com/blogs/kubeflow-pipelines-with-tekton-and-watson/
Apache License 2.0

[Bug]: Manually triggered pipeline runs do not include the latest pipeline version #164

Closed mamurak closed 1 year ago

mamurak commented 1 year ago

Is there an existing issue for this?

Deploy type

ODH Dashboard UI

Version

RHODS 1.33.0

Environment

Current Behavior

I uploaded multiple versions of the same pipeline to Data Science Pipelines using the KFP SDK. When selecting the pipeline in the Pipelines tab of the RHODS dashboard, the latest version is displayed. However, when manually triggering a new run of this pipeline, an old version of the pipeline is executed instead of the current one.

Expected Behavior

When manually triggering a new run of a pipeline, I expect the latest registered pipeline version to be executed.

Steps To Reproduce

  1. Create a new pipeline through the RHODS dashboard, e.g. by uploading the manifest attached below.
  2. Create a new version of the same pipeline with an obvious change, e.g. by renaming the pipeline nodes.
  3. Upload the new version to Data Science Pipelines, e.g. using the upload_pipeline_version method of the KFP SDK's Client.
  4. Navigate to the Pipelines tab in the RHODS dashboard and select the uploaded pipeline, e.g. visualize-metrics.
  5. Trigger a new pipeline run and observe the pipeline run.
  6. Observe that the initial version of the pipeline is executed, e.g. with the initial node names.
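Steps 2 and 3 can be sketched with a small helper that derives the second manifest from the first by renaming its task nodes (the "obvious change"). The function name and the minimal dict stand-in for the PipelineRun manifest are made up for illustration; only the spec.pipelineSpec.tasks layout mirrors the sample manifest attached below.

```python
import copy

def rename_tasks(pipeline_run: dict, suffix: str) -> dict:
    """Return a copy of a PipelineRun manifest with every task renamed,
    producing an obviously different second pipeline version."""
    new_run = copy.deepcopy(pipeline_run)
    for task in new_run["spec"]["pipelineSpec"]["tasks"]:
        task["name"] = f"{task['name']}-{suffix}"
    return new_run

# Minimal stand-in for the manifest attached to this issue:
v1 = {"spec": {"pipelineSpec": {"tasks": [{"name": "run-a-file"}]}}}
v2 = rename_tasks(v1, "v2")
print(v2["spec"]["pipelineSpec"]["tasks"][0]["name"])  # run-a-file-v2
```

The renamed manifest can then be uploaded as a new version, e.g. via `upload_pipeline_version` as in step 3.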

Workaround (if any)

None.

Anything else

Sample pipeline:

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: visualize-metrics
  annotations:
    tekton.dev/output_artifacts: '{"run-a-file": [{"key": "artifacts/$PIPELINERUN/run-a-file/mlpipeline-metrics.tgz",
      "name": "mlpipeline-metrics", "path": "/tmp/mlpipeline-metrics.json"}, {"key":
      "artifacts/$PIPELINERUN/run-a-file/mlpipeline-ui-metadata.tgz", "name": "mlpipeline-ui-metadata",
      "path": "/tmp/mlpipeline-ui-metadata.json"}]}'
    tekton.dev/input_artifacts: '{}'
    tekton.dev/artifact_bucket: mlpipeline
    tekton.dev/artifact_endpoint: minio-service.kubeflow:9000
    tekton.dev/artifact_endpoint_scheme: http://
    tekton.dev/artifact_items: '{"run-a-file": [["mlpipeline-metrics", "/tmp/mlpipeline-metrics.json"],
      ["mlpipeline-ui-metadata", "/tmp/mlpipeline-ui-metadata.json"]]}'
    sidecar.istio.io/inject: "false"
    tekton.dev/template: ''
    pipelines.kubeflow.org/big_data_passing_format: $(workspaces.$TASK_NAME.path)/artifacts/$ORIG_PR_NAME/$TASKRUN_NAME/$TASK_PARAM_NAME
    pipelines.kubeflow.org/pipeline_spec: '{"description": "This pipeline illustrates
      how to generate and visualize metrics in Kubeflow Pipelines.", "name": "visualize_metrics"}'
  labels:
    pipelines.kubeflow.org/pipelinename: ''
    pipelines.kubeflow.org/generation: ''
spec:
  pipelineSpec:
    tasks:
    - name: run-a-file
      taskSpec:
        steps:
        - name: main
          args:
          - |
            sh -c "mkdir -p ./jupyter-work-dir && cd ./jupyter-work-dir"
            sh -c "echo 'Downloading https://raw.githubusercontent.com/elyra-ai/elyra/v3.15.0/elyra/kfp/bootstrapper.py' && curl --fail -H 'Cache-Control: no-cache' -L https://raw.githubusercontent.com/elyra-ai/elyra/v3.15.0/elyra/kfp/bootstrapper.py --output bootstrapper.py"
            sh -c "echo 'Downloading https://raw.githubusercontent.com/elyra-ai/elyra/v3.15.0/etc/generic/requirements-elyra.txt' && curl --fail -H 'Cache-Control: no-cache' -L https://raw.githubusercontent.com/elyra-ai/elyra/v3.15.0/etc/generic/requirements-elyra.txt --output requirements-elyra.txt"
            sh -c "python3 -m pip install  packaging && python3 -m pip freeze > requirements-current.txt && python3 bootstrapper.py --pipeline-name 'visualize_metrics' --cos-endpoint 'http://minio-service.minio.svc:9000' --cos-bucket 'user11-pipelines' --cos-directory 'visualize_metrics-1019132923' --cos-dependencies-archive 'metrics-d3665f88-7fb5-411f-bbed-8455a11b608a.tar.gz' --file 'os-mlops/notebooks/kfp-visualization-example/metrics.ipynb' "
          command:
          - sh
          - -c
          env:
          - name: AWS_ACCESS_KEY_ID
            valueFrom:
              secretKeyRef:
                key: AWS_ACCESS_KEY_ID
                name: aws-connection-pipelines
          - name: AWS_SECRET_ACCESS_KEY
            valueFrom:
              secretKeyRef:
                key: AWS_SECRET_ACCESS_KEY
                name: aws-connection-pipelines
          - name: ELYRA_RUNTIME_ENV
            value: kfp
          - name: ELYRA_ENABLE_PIPELINE_INFO
            value: "True"
          - name: ELYRA_WRITABLE_CONTAINER_DIR
            value: /tmp
          - name: ELYRA_RUN_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.annotations['pipelines.kubeflow.org/run_name']
          image: quay.io/mmurakam/elyra-kfp-runtime-base:elyra-kfp-runtime-base-v0.2.0
        stepTemplate:
          volumeMounts:
          - name: mlpipeline-metrics
            mountPath: /tmp
        volumes:
        - name: mlpipeline-metrics
          emptyDir: {}
        metadata:
          labels:
            elyra/node-type: notebook-script
            elyra/pipeline-name: visualize_metrics
            elyra/pipeline-version: ''
            elyra/experiment-name: ''
            elyra/node-name: metrics
            pipelines.kubeflow.org/cache_enabled: "true"
          annotations:
            elyra/node-user-doc: This notebook produces metadata that is visualized
              in the Kubeflow Pipelines Central Dashboard.
            elyra/node-file-name: os-mlops/notebooks/kfp-visualization-example/metrics.ipynb
            elyra/pipeline-source: visualize_metrics.pipeline
            pipelines.kubeflow.org/task_display_name: metrics
            pipelines.kubeflow.org/component_spec_digest: '{"name": "Run a file",
              "outputs": [], "version": "Run a file@sha256=f4bdf608805fc05c6e81a023f1a2c317978a634fa701a9b6c7f9a7cd0419d1c3"}'
gregsheremeta commented 1 year ago

moving priority to blocker per Myriam's request

DharmitD commented 1 year ago

Replicated this scenario on a RHODS 1.33 installed cluster by running this KFP SDK script locally:

"""Test showing a basic connection to kfp server."""
import os

from dotenv import load_dotenv

import kfp_tekton

kubeflow_endpoint = <insert-your-DSPA-route-endpoint-here>
bearer_token = <insert-your-cluster-token-here>
pipeline_file_path="sample-pipeline.yaml"
pipeline_version_file_path="new-sample-pipeline.yaml"
pipeline_version_name = "version-2"

if __name__ == "__main__":
    client = kfp_tekton.TektonClient(
        host=kubeflow_endpoint,
        existing_token=bearer_token,
    )
    print(client.list_experiments())
    pipeline_file = os.path.join(pipeline_file_path)
    pipeline = client.pipeline_uploads.upload_pipeline(pipeline_file, name="version-test")
    pipeline_version_file = os.path.join(pipeline_version_file_path)
    pipeline_version = client.pipeline_uploads.upload_pipeline_version(pipeline_version_file,
                                                                       name=pipeline_version_name,
                                                                       pipelineid=pipeline.id)`

Here sample-pipeline.yaml is the pipeline spec from the issue description above, and new-sample-pipeline.yaml is a new version of that same spec with a visible change.

Running the SDK script created a pipeline with two versions. The pipeline spec in the dashboard points to version 2, but when we create a run, the run's spec still points to version 1. This confirms that the UI renders the new version of the pipeline while runs still execute the old one.
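One way to confirm which version a run actually used is to inspect the resource_references array in the run's REST representation. The helper below is hypothetical; the key.type / key.id field layout follows the KFP v1beta1 REST API, and the uuids are placeholders:

```python
def pipeline_version_of(run_resource_references: list):
    """Return the uuid of the PIPELINE_VERSION entry in a run's
    resource_references, or None if no such entry exists."""
    for ref in run_resource_references:
        if ref.get("key", {}).get("type") == "PIPELINE_VERSION":
            return ref["key"]["id"]
    return None

# Placeholder references as they might appear on a run created via the UI:
refs = [
    {"key": {"type": "EXPERIMENT", "id": "exp-123"}, "relationship": "OWNER"},
    {"key": {"type": "PIPELINE_VERSION", "id": "old-version-uuid"},
     "relationship": "CREATOR"},
]
print(pipeline_version_of(refs))  # old-version-uuid
```

Comparing this uuid against the latest version's id returned by the upload call shows the mismatch described above.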

On further investigation, we suspect that the /apis/v1beta1/runs API call is not picking up the latest pipeline version. The ApiResourceReference block could be used here to pass the latest version. It would look something like:

"version": "latest-version-ID"

[Attached: screenshot of the run-creation request payload]

Note that resource_references in the payload is an array; element 0 contains an object specifying a type of PIPELINE_VERSION and a uuid. The fix would be to grab the correct (latest) uuid. This seems related.
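The proposed fix amounts to building the run-creation body with the latest version's uuid in its PIPELINE_VERSION reference. A sketch of such a payload builder, where the field names follow the KFP v1beta1 REST API and all ids and names are placeholders:

```python
def build_run_payload(run_name: str, version_id: str, experiment_id: str) -> dict:
    """Build a POST /apis/v1beta1/runs body that pins the run to a
    specific pipeline version via resource_references."""
    return {
        "name": run_name,
        "resource_references": [
            {"key": {"type": "EXPERIMENT", "id": experiment_id},
             "relationship": "OWNER"},
            {"key": {"type": "PIPELINE_VERSION", "id": version_id},
             "relationship": "CREATOR"},
        ],
    }

payload = build_run_payload(
    "visualize-metrics-run", "latest-version-uuid", "default-exp")
```

With this shape, the dashboard would only need to look up the latest version's uuid for the selected pipeline and substitute it for version_id before issuing the request.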

cc: @gregsheremeta

DharmitD commented 1 year ago

Closing this issue, created https://github.com/opendatahub-io/odh-dashboard/issues/2014 on the ODH-dashboard repo to cover next steps for this issue.