kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

[frontend] Clone a Run Missing Pipeline Reference #10560

Open · boarder7395 opened this issue 3 months ago

boarder7395 commented 3 months ago

Environment

Steps to reproduce

Clone a pipeline run that was uploaded with kfp 1.8.1 or earlier.
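
For context, the reproduction looks roughly like this with the kfp 1.x SDK (a minimal sketch; the host, file name, and run arguments are placeholders, not taken from this report):

```python
# Hypothetical reproduction sketch using the kfp 1.8.x SDK client.
import kfp

client = kfp.Client(host="http://localhost:8080")

# Upload a pipeline compiled with kfp <= 1.8.1 and start a run from it.
client.upload_pipeline("pipeline.yaml", pipeline_name="end-to-end-of-misty-pipeline")
client.create_run_from_pipeline_package(
    "pipeline.yaml",
    arguments={"s3path": "s3://<bucket>/datasets/enron"},
)
# Then open the run in the UI and choose "Clone run".
```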

Expected result

Pipeline launch page to load.

Materials and Reference

Pipeline run page with "Clone run" selected. [screenshot]
"Clone a Run" page loading. [screenshot]
Parameter selection page; cannot proceed past this page due to the message "A pipeline must be selected". [screenshots]

Pipeline definition. Note: when this pipeline is uploaded in the new UI, a run of it can be cloned as expected.


```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: end-to-end-of-misty-pipeline-
  creationTimestamp: null
  labels:
    pipelines.kubeflow.org/kfp_sdk_version: 1.8.13
  annotations:
    pipelines.kubeflow.org/kfp_sdk_version: 1.8.13
    pipelines.kubeflow.org/pipeline_compilation_time: '2023-09-12T16:26:50.534395'
    pipelines.kubeflow.org/pipeline_spec: >-
      {"description": "Runs the preprocessing of the Misty dataset and then
      trains a model.", "inputs": [{"default":
      "s3://athn-ai-ds-mltp-misty-dev/datasets/enron", "name": "s3path",
      "optional": true, "type": "String"}, {"default":
      "Redacted",
      "name": "teams_webhook_url", "optional": true, "type": "String"}], "name":
      "End to End of Misty Pipeline"}
spec:
  templates:
    - name: end-to-end-of-misty-pipeline
      inputs:
        parameters:
          - name: s3path
      outputs: {}
      metadata: {}
      dag:
        tasks:
          - name: sparkpreprocess
            template: sparkpreprocess
            arguments:
              parameters:
                - name: s3path
                  value: '{{inputs.parameters.s3path}}'
          - name: train
            template: train
            arguments:
              parameters:
                - name: sparkpreprocess-tfrecord
                  value: >-
                    {{tasks.sparkpreprocess.outputs.parameters.sparkpreprocess-tfrecord}}
            dependencies:
              - sparkpreprocess
    - name: sparkpreprocess
      inputs:
        parameters:
          - name: s3path
      outputs:
        parameters:
          - name: sparkpreprocess-tfrecord
            valueFrom:
              path: /tmp/outputs/tfrecord/data
        artifacts:
          - name: sparkpreprocess-labels
            path: /tmp/outputs/labels/data
          - name: sparkpreprocess-tfrecord
            path: /tmp/outputs/tfrecord/data
          - name: sparkpreprocess-vocab
            path: /tmp/outputs/vocab/data
      nodeSelector:
        karpenter.sh/capacity-type: on-demand
        node.kubernetes.io/instance-type: m5.2xlarge
        workload: pipeline
      metadata:
        annotations:
          pipelines.kubeflow.org/arguments.parameters: >-
            {"Raw Data": "{{inputs.parameters.s3path}}", "Worker Instance":
            "m5.2xlarge", "Workers": "5"}
          pipelines.kubeflow.org/component_ref: >-
            {"digest":
            "cdab2b8e0a784f4490fc756f33382efc4c17dc946d85dc0ca909a8663a260996"}
          pipelines.kubeflow.org/component_spec: >-
            {"description": "Run the preprocessing using spark of Enron
            dataset.", "implementation": {"container": {"args": ["--input",
            {"inputValue": "Raw Data"}, "--workers", {"inputValue": "Workers"},
            "--worker-instance", {"inputValue": "Worker Instance"},
            "--spark-image",
            "473391520281.dkr.ecr.us-east-1.amazonaws.com/misty/master/spark_preprocess:6024211",
            "--tf-output", {"outputPath": "tfrecord"}, "--vocab-output",
            {"outputPath": "vocab"}, "--label-output", {"outputPath":
            "labels"}], "command": ["run_preprocess"], "image":
            "473391520281.dkr.ecr.us-east-1.amazonaws.com/misty/master/spark_preprocess:6024211"}},
            "inputs": [{"description": "Path where raw data is stored.", "name":
            "Raw Data", "type": "String"}, {"description": "The number of
            workers to use for this run.", "name": "Workers", "type":
            "Integer"}, {"description": "The instance type used for workers.",
            "name": "Worker Instance", "type": "String"}], "name":
            "SparkPreprocess", "outputs": [{"description": "The Local Path to
            output preprocessed data.", "name": "tfrecord", "type": "String"},
            {"description": "The Local Path to output preprocessed data.",
            "name": "vocab", "type": "String"}, {"description": "The Local Path
            to output preprocessed data.", "name": "labels", "type": "String"}]}
          pipelines.kubeflow.org/max_cache_staleness: P27D
          pipelines.kubeflow.org/task_display_name: Preprocess Dataset.
        labels:
          clean: 'true'
          pipelines.kubeflow.org/enable_caching: 'true'
          pipelines.kubeflow.org/kfp_sdk_version: 1.8.13
          pipelines.kubeflow.org/pipeline-sdk-type: kfp
      container:
        name: ''
        image: >-
          473391520281.dkr.ecr.us-east-1.amazonaws.com/misty/master/spark_preprocess:6024211
        command:
          - run_preprocess
        args:
          - '--input'
          - '{{inputs.parameters.s3path}}'
          - '--workers'
          - '5'
          - '--worker-instance'
          - m5.2xlarge
          - '--spark-image'
          - >-
            473391520281.dkr.ecr.us-east-1.amazonaws.com/misty/master/spark_preprocess:6024211
          - '--tf-output'
          - /tmp/outputs/tfrecord/data
          - '--vocab-output'
          - /tmp/outputs/vocab/data
          - '--label-output'
          - /tmp/outputs/labels/data
        env:
          - name: PODNAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: PODID
            valueFrom:
              fieldRef:
                fieldPath: metadata.uid
          - name: NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: WORKFLOWNAME
            valueFrom:
              fieldRef:
                fieldPath: 'metadata.labels[''workflows.argoproj.io/workflow'']'
          - name: WORKFLOWID
            valueFrom:
              fieldRef:
                fieldPath: 'metadata.labels[''pipeline/runid'']'
          - name: PROJECT_NAME
            value: misty
          - name: AWS_DEFAULT_REGION
            value: us-east-1
          - name: DISTRIBUTED
            value: 'true'
        resources:
          requests:
            cpu: 7510m
            memory: 26536345Ki
      tolerations:
        - key: workload
          operator: Equal
          value: pipeline
          effect: NoSchedule
    - name: train
      inputs:
        parameters:
          - name: sparkpreprocess-tfrecord
      outputs:
        artifacts:
          - name: train-Model-dir
            path: /tmp/outputs/Model_dir/data
          - name: train-Tensorboard-dir
            path: /tmp/outputs/Tensorboard_dir/data
      nodeSelector:
        karpenter.sh/capacity-type: on-demand
        node.kubernetes.io/instance-type: p3.2xlarge
        workload: pipeline
      metadata:
        annotations:
          pipelines.kubeflow.org/arguments.parameters: >-
            {"Dataset": "{{inputs.parameters.sparkpreprocess-tfrecord}}",
            "Epochs": "1", "Learning Rate": "0.001", "Port": "8080"}
          pipelines.kubeflow.org/component_ref: >-
            {"digest":
            "7290e0ae7fbbd7cad94be2ce597c6bef79e380e873a81a5ce76904b671750400"}
          pipelines.kubeflow.org/component_spec: >-
            {"description": "Run the training of Enron Model.",
            "implementation": {"container": {"args": ["--input", {"inputValue":
            "Dataset"}, "--epochs", {"inputValue": "Epochs"}, "--learning-rate",
            {"inputValue": "Learning Rate"}, "--model-dir", {"outputPath":
            "Model dir"}, "--tensorboard-dir", {"outputPath": "Tensorboard
            dir"}, "--port", {"inputValue": "Port"}], "command": ["run_train"],
            "image":
            "473391520281.dkr.ecr.us-east-1.amazonaws.com/misty/master/train:6024211"}},
            "inputs": [{"description": "Path where raw data is stored.", "name":
            "Dataset", "type": "String"}, {"description": "Number of epochs to
            train for.", "name": "Epochs", "type": "Integer"}, {"description":
            "Learning rate to use while training.", "name": "Learning Rate",
            "type": "Float"}, {"description": "The port to start tensorboard
            on.", "name": "Port", "type": "Integer"}], "name": "Train",
            "outputs": [{"description": "Path to save the model.", "name":
            "Model dir", "type": "String"}, {"description": "Path to save the
            tensorboard logs to.", "name": "Tensorboard dir", "type":
            "String"}]}
          pipelines.kubeflow.org/max_cache_staleness: P27D
          pipelines.kubeflow.org/task_display_name: Train Model.
        labels:
          clean: 'true'
          pipelines.kubeflow.org/enable_caching: 'true'
          pipelines.kubeflow.org/kfp_sdk_version: 1.8.13
          pipelines.kubeflow.org/pipeline-sdk-type: kfp
      container:
        name: ''
        image: >-
          473391520281.dkr.ecr.us-east-1.amazonaws.com/misty/master/train:6024211
        command:
          - run_train
        args:
          - '--input'
          - '{{inputs.parameters.sparkpreprocess-tfrecord}}'
          - '--epochs'
          - '1'
          - '--learning-rate'
          - '0.001'
          - '--model-dir'
          - /tmp/outputs/Model_dir/data
          - '--tensorboard-dir'
          - /tmp/outputs/Tensorboard_dir/data
          - '--port'
          - '8080'
        ports:
          - hostPort: 8080
            containerPort: 8080
        env:
          - name: PODNAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: PODID
            valueFrom:
              fieldRef:
                fieldPath: metadata.uid
          - name: NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: WORKFLOWNAME
            valueFrom:
              fieldRef:
                fieldPath: 'metadata.labels[''workflows.argoproj.io/workflow'']'
          - name: WORKFLOWID
            valueFrom:
              fieldRef:
                fieldPath: 'metadata.labels[''pipeline/runid'']'
          - name: PROJECT_NAME
            value: misty
          - name: AWS_DEFAULT_REGION
            value: us-east-1
        resources:
          limits:
            nvidia.com/gpu: '1'
          requests:
            cpu: 7410m
            memory: 50863308Ki
      tolerations:
        - key: workload
          operator: Equal
          value: pipeline
          effect: NoSchedule
        - key: processing
          operator: Equal
          value: gpu
          effect: NoSchedule
  entrypoint: end-to-end-of-misty-pipeline
  arguments:
    parameters:
      - name: s3path
        value: 's3://athn-ai-ds-mltp-misty-dev/datasets/enron'
      - name: teams_webhook_url
        value: >-
          Redacted
  serviceAccountName: pipeline-runner
status:
  startedAt: null
  finishedAt: null
```
Impacted by this bug? Give it a 👍. 
boarder7395 commented 3 months ago

Additional information: in the third screenshot above, the entire pipeline / pipeline version section is missing. Also, when I click "View pipeline", it does pull up the correct pipeline.

boarder7395 commented 3 months ago

Upon additional research I've found that some pipelines are not rendering in the UI, which might be related. [screenshot]

One thing that stands out is that the templates endpoint returns an empty dict: https://anml-6305tk-kf.ml.ai.athena.io/pipeline/apis/v1beta1/pipeline_versions/1e589874-dc98-486d-b2f6-876427eb739f/templates
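
A quick way to confirm this from a script (a minimal sketch; the host and version UUID are taken from the URL above, and a real deployment may require an authenticated session):

```python
# Check what the v1beta1 templates endpoint returns for this pipeline version.
import requests

url = (
    "https://anml-6305tk-kf.ml.ai.athena.io/pipeline/apis/v1beta1/"
    "pipeline_versions/1e589874-dc98-486d-b2f6-876427eb739f/templates"
)
resp = requests.get(url)
resp.raise_for_status()
# A healthy version should return {"template": "<argo workflow yaml>"};
# here the body is an empty dict instead.
print(resp.json())
```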

I'll continue to research the issue to see if I can find the cause.

boarder7395 commented 2 months ago

Figured out the issue: my pipeline runs table has empty values for PipelineId and PipelineVersionId. Figuring out these values and backfilling them resolves the issue.

rimolive commented 2 months ago

@boarder7395 So the workaround for this issue is to manually change the database data?

boarder7395 commented 2 months ago

@rimolive Yeah, I had to update the run_details table using the resource_references table and the pipeline versions, specifically the PipelineId and PipelineVersionId values. Once I did that, the clone-from-pipeline-run functionality worked.
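
For anyone hitting the same problem, a minimal sketch of that backfill (the join through resource_references and pipeline_versions, the 'PipelineVersion' reference type, and the connection details are assumptions on my part; inspect your schema and back up the database before running anything like this):

```python
# Hedged sketch of the manual backfill described above, using PyMySQL against
# the default "mlpipeline" database. Verify table/column names and the
# ReferenceType value against your own KFP schema first, and take a backup.
import pymysql

conn = pymysql.connect(host="mysql.kubeflow", user="root", database="mlpipeline")
try:
    with conn.cursor() as cur:
        cur.execute(
            """
            UPDATE run_details r
            JOIN resource_references rr
              ON rr.ResourceUUID = r.UUID
             AND rr.ReferenceType = 'PipelineVersion'
            JOIN pipeline_versions pv
              ON pv.UUID = rr.ReferenceUUID
            SET r.PipelineVersionId = pv.UUID,
                r.PipelineId = pv.PipelineId
            WHERE r.PipelineVersionId IS NULL OR r.PipelineVersionId = ''
            """
        )
    conn.commit()
finally:
    conn.close()
```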

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

rimolive commented 1 week ago

/lifecycle frozen