kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0
3.6k stars · 1.62k forks

UX does not show the archived logs in the Logs tab even with ARGO_ARCHIVE_LOGS: true #3818

Closed Ark-kun closed 1 year ago

Ark-kun commented 4 years ago

Repro steps:

  1. Set 'ARGO_ARCHIVE_LOGS: "true"' for ml-pipeline-ui
  2. Run the preloaded Data passing pipeline and wait until completion.
  3. Delete one of the pods
  4. Select the corresponding node in the graph and switch to the Logs tab

The pod has archived logs that are successfully shown on the Inputs/Outputs tab.

Log URI: "main-logs minio://mlpipeline/artifacts/file-passing-pipelines-k6k9w/file-passing-pipelines-k6k9w-3191777755/main.log"

However, when I navigate to the Logs tab I get the usual error "Warning: failed to retrieve pod logs. Possible reasons include cluster autoscaling or pod preemption", and when I click "Details" I see "Could not get main container logs: S3Error: The specified key does not exist."

I've tried to set ARGO_ARCHIVE_PREFIX: "", but got the same error.

Version: master.
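
For reference, step 1 of the repro amounts to setting the env var on the ml-pipeline-ui Deployment, roughly like this (a sketch; the container name and surrounding spec may differ in your manifests):

```yaml
# Illustrative fragment of the ml-pipeline-ui Deployment
spec:
  template:
    spec:
      containers:
      - name: ml-pipeline-ui
        env:
        - name: ARGO_ARCHIVE_LOGS
          value: "true"
```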

eterna2 commented 4 years ago

Is archive also set to true in the Argo configmap?

Did the run succeed? If the run did not succeed, the workflow status sometimes will not report the artifact location, and in that case the server cannot retrieve the logs unless you set up the default bucket location (there are a few more config options where you can declare where the Argo artifacts are located).

eterna2 commented 4 years ago

But let me test on my machine and see; it might be a regression.

Ark-kun commented 4 years ago

Is archive also set to true in the Argo configmap?

Yes. (archiveLogs is set to true by default for the last several releases).

Did the run succeed? If the run did not succeed, the workflow status sometimes will not report the artifact location, and in that case the server cannot retrieve the logs unless you set up the default bucket location (there are a few more config options where you can declare where the Argo artifacts are located).

Yes. As I posted, the logs are uploaded and I see them on the Inputs/Outputs tab: Log URI: "main-logs minio://mlpipeline/artifacts/file-passing-pipelines-k6k9w/file-passing-pipelines-k6k9w-3191777755/main.log".

BTW, your great artifact preview feature successfully previews the logs =)

eterna2 commented 4 years ago

This is what I found running the data passing pipeline.

  1. for ops that have logs (e.g. Print Text), Argo did not output the logs as artifacts; i.e., if you go to the UI and click on Input/Output, you will not see any main-logs artifact

    time="2020-05-23T09:11:59Z" level=info msg="Creating a docker executor"
    time="2020-05-23T09:11:59Z" level=info msg="Executor (version: v2.3.0, build_date: 2019-05-20T22:10:54Z) initialized (pod: kubeflow/file-passing-pipelines-j4dxj-2063807832) with template:\n{\"name\":\"print-text-4\",\"inputs\":{\"artifacts\":[{\"name\":\"write-numbers-numbers\",\"path\":\"/tmp/inputs/text/data\",\"s3\":{\"endpoint\":\"minio-service.kubeflow:9000\",\"bucket\":\"mlpipeline\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"secretkey\"},\"key\":\"artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3568125204/write-numbers-numbers.tgz\"}}]},\"outputs\":{},\"metadata\":{\"annotations\":{\"pipelines.kubeflow.org/component_spec\":\"{\\\"description\\\": \\\"Print text\\\", \\\"inputs\\\": [{\\\"name\\\": \\\"text\\\"}], \\\"name\\\": \\\"Print text\\\"}\",\"sidecar.istio.io/inject\":\"false\"},\"labels\":{\"pipelines.kubeflow.org/cache_enabled\":\"true\"}},\"container\":{\"name\":\"\",\"image\":\"tensorflow/tensorflow:1.13.2-py3\",\"command\":[\"python3\",\"-u\",\"-c\",\"def print_text(text_path ): # The \\\"text\\\" input is untyped so that any data can be printed\\n    '''Print text'''\\n    with open(text_path, 'r') as reader:\\n        for line in reader:\\n            print(line, end = '')\\n\\nimport argparse\\n_parser = argparse.ArgumentParser(prog='Print text', description='Print text')\\n_parser.add_argument(\\\"--text\\\", dest=\\\"text_path\\\", type=str, required=True, default=argparse.SUPPRESS)\\n_parsed_args = vars(_parser.parse_args())\\n_output_files = _parsed_args.pop(\\\"_output_paths\\\", [])\\n\\n_outputs = print_text(**_parsed_args)\\n\\n_output_serializers = [\\n\\n]\\n\\nimport os\\nfor idx, output_file in enumerate(_output_files):\\n    try:\\n        os.makedirs(os.path.dirname(output_file))\\n    except OSError:\\n        pass\\n    with open(output_file, 'w') as f:\\n        
f.write(_output_serializers[idx](_outputs[idx]))\\n\"],\"args\":[\"--text\",\"/tmp/inputs/text/data\"],\"resources\":{}}}"
    time="2020-05-23T09:11:59Z" level=info msg="Waiting on main container"
    time="2020-05-23T09:12:00Z" level=info msg="main container started with container ID: 28e5f817c3f1b4bd9e6ff81761a414d3cb489487efe2235e7d1970541a4df48a"
    time="2020-05-23T09:12:00Z" level=info msg="Starting annotations monitor"
    time="2020-05-23T09:12:00Z" level=info msg="docker wait 28e5f817c3f1b4bd9e6ff81761a414d3cb489487efe2235e7d1970541a4df48a"
    time="2020-05-23T09:12:00Z" level=info msg="Starting deadline monitor"
    time="2020-05-23T09:12:00Z" level=info msg="Main container completed"
    time="2020-05-23T09:12:00Z" level=info msg="No sidecars"
    time="2020-05-23T09:12:00Z" level=info msg="No output parameters"
    time="2020-05-23T09:12:00Z" level=info msg="No output artifacts"
    time="2020-05-23T09:12:00Z" level=info msg="Alloc=4293 TotalAlloc=11000 Sys=70590 NumGC=4 Goroutines=9"
  2. for ops that actually have no logs (e.g. write numbers), you can see the main-logs artifact

    time="2020-05-23T09:11:56Z" level=info msg="Saving from /argo/outputs/logs/main.log to s3 (endpoint: minio-service.kubeflow:9000, bucket: mlpipeline, key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3568125204/main.log)"
    time="2020-05-23T09:11:56Z" level=info msg="No output parameters"
    time="2020-05-23T09:11:56Z" level=info msg="Saving output artifacts"
    time="2020-05-23T09:11:56Z" level=info msg="Staging artifact: write-numbers-numbers"
    time="2020-05-23T09:11:56Z" level=info msg="Copying /tmp/outputs/numbers/data from container base image layer to /argo/outputs/artifacts/write-numbers-numbers.tgz"
    time="2020-05-23T09:11:56Z" level=info msg="Archiving e337b7e3942c63be4cdae2d008d2621f25fed978542511a7eab3ef600a212112:/tmp/outputs/numbers/data to /argo/outputs/artifacts/write-numbers-numbers.tgz"
    time="2020-05-23T09:11:56Z" level=info msg="sh -c docker cp -a e337b7e3942c63be4cdae2d008d2621f25fed978542511a7eab3ef600a212112:/tmp/outputs/numbers/data - | gzip > /argo/outputs/artifacts/write-numbers-numbers.tgz"
    time="2020-05-23T09:11:56Z" level=info msg="Archiving completed"
    time="2020-05-23T09:11:56Z" level=info msg="S3 Save path: /argo/outputs/artifacts/write-numbers-numbers.tgz, key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3568125204/write-numbers-numbers.tgz"
    time="2020-05-23T09:11:56Z" level=info msg="Creating minio client minio-service.kubeflow:9000 using static credentials"
    time="2020-05-23T09:11:56Z" level=info msg="Saving from /argo/outputs/artifacts/write-numbers-numbers.tgz to s3 (endpoint: minio-service.kubeflow:9000, bucket: mlpipeline, key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3568125204/write-numbers-numbers.tgz)"
    time="2020-05-23T09:11:56Z" level=info msg="Successfully saved file: /argo/outputs/artifacts/write-numbers-numbers.tgz"
    time="2020-05-23T09:11:56Z" level=info msg="Annotating pod with output"
    time="2020-05-23T09:11:56Z" level=info msg="Alloc=4362 TotalAlloc=13055 Sys=70590 NumGC=5 Goroutines=11"
eterna2 commented 4 years ago

This is the workflow status:

  status:
    finishedAt: "2020-05-23T09:12:07Z"
    nodes:
      file-passing-pipelines-j4dxj:
        children:
        - file-passing-pipelines-j4dxj-3552447998
        - file-passing-pipelines-j4dxj-3568125204
        - file-passing-pipelines-j4dxj-3378179304
        displayName: file-passing-pipelines-j4dxj
        finishedAt: "2020-05-23T09:12:07Z"
        id: file-passing-pipelines-j4dxj
        name: file-passing-pipelines-j4dxj
        outboundNodes:
        - file-passing-pipelines-j4dxj-3516751673
        - file-passing-pipelines-j4dxj-2181251165
        - file-passing-pipelines-j4dxj-2164473546
        - file-passing-pipelines-j4dxj-2063807832
        - file-passing-pipelines-j4dxj-2080585451
        phase: Succeeded
        startedAt: "2020-05-23T09:11:53Z"
        templateName: file-passing-pipelines
        type: DAG
      file-passing-pipelines-j4dxj-190982672:
        boundaryID: file-passing-pipelines-j4dxj
        children:
        - file-passing-pipelines-j4dxj-2080585451
        displayName: sum-numbers
        finishedAt: "2020-05-23T09:12:01Z"
        id: file-passing-pipelines-j4dxj-190982672
        inputs:
          artifacts:
          - name: write-numbers-numbers
            path: /tmp/inputs/numbers/data
            s3:
              accessKeySecret:
                key: accesskey
                name: mlpipeline-minio-artifact
              bucket: mlpipeline
              endpoint: minio-service.kubeflow:9000
              insecure: true
              key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3568125204/write-numbers-numbers.tgz
              secretKeySecret:
                key: secretkey
                name: mlpipeline-minio-artifact
        name: file-passing-pipelines-j4dxj.sum-numbers
        outputs:
          artifacts:
          - name: sum-numbers-output
            path: /tmp/outputs/Output/data
            s3:
              accessKeySecret:
                key: accesskey
                name: mlpipeline-minio-artifact
              bucket: mlpipeline
              endpoint: minio-service.kubeflow:9000
              insecure: true
              key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-190982672/sum-numbers-output.tgz
              secretKeySecret:
                key: secretkey
                name: mlpipeline-minio-artifact
          - archiveLogs: true
            name: main-logs
            s3:
              accessKeySecret:
                key: accesskey
                name: mlpipeline-minio-artifact
              bucket: mlpipeline
              endpoint: minio-service.kubeflow:9000
              insecure: true
              key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-190982672/main.log
              secretKeySecret:
                key: secretkey
                name: mlpipeline-minio-artifact
        phase: Succeeded
        startedAt: "2020-05-23T09:11:57Z"
        templateName: sum-numbers
        type: Pod
      file-passing-pipelines-j4dxj-2063807832:
        boundaryID: file-passing-pipelines-j4dxj
        displayName: print-text-4
        finishedAt: "2020-05-23T09:12:00Z"
        id: file-passing-pipelines-j4dxj-2063807832
        inputs:
          artifacts:
          - name: write-numbers-numbers
            path: /tmp/inputs/text/data
            s3:
              accessKeySecret:
                key: accesskey
                name: mlpipeline-minio-artifact
              bucket: mlpipeline
              endpoint: minio-service.kubeflow:9000
              insecure: true
              key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3568125204/write-numbers-numbers.tgz
              secretKeySecret:
                key: secretkey
                name: mlpipeline-minio-artifact
        name: file-passing-pipelines-j4dxj.print-text-4
        phase: Succeeded
        startedAt: "2020-05-23T09:11:57Z"
        templateName: print-text-4
        type: Pod
      file-passing-pipelines-j4dxj-2080585451:
        boundaryID: file-passing-pipelines-j4dxj
        displayName: print-text-5
        finishedAt: "2020-05-23T09:12:06Z"
        id: file-passing-pipelines-j4dxj-2080585451
        inputs:
          artifacts:
          - name: sum-numbers-output
            path: /tmp/inputs/text/data
            s3:
              accessKeySecret:
                key: accesskey
                name: mlpipeline-minio-artifact
              bucket: mlpipeline
              endpoint: minio-service.kubeflow:9000
              insecure: true
              key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-190982672/sum-numbers-output.tgz
              secretKeySecret:
                key: secretkey
                name: mlpipeline-minio-artifact
        name: file-passing-pipelines-j4dxj.print-text-5
        phase: Succeeded
        startedAt: "2020-05-23T09:12:03Z"
        templateName: print-text-5
        type: Pod
      file-passing-pipelines-j4dxj-2164473546:
        boundaryID: file-passing-pipelines-j4dxj
        displayName: print-text-2
        finishedAt: "2020-05-23T09:12:03Z"
        id: file-passing-pipelines-j4dxj-2164473546
        inputs:
          artifacts:
          - name: split-text-lines-odd_lines
            path: /tmp/inputs/text/data
            s3:
              accessKeySecret:
                key: accesskey
                name: mlpipeline-minio-artifact
              bucket: mlpipeline
              endpoint: minio-service.kubeflow:9000
              insecure: true
              key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3552447998/split-text-lines-odd_lines.tgz
              secretKeySecret:
                key: secretkey
                name: mlpipeline-minio-artifact
        name: file-passing-pipelines-j4dxj.print-text-2
        phase: Succeeded
        startedAt: "2020-05-23T09:12:00Z"
        templateName: print-text-2
        type: Pod
      file-passing-pipelines-j4dxj-2181251165:
        boundaryID: file-passing-pipelines-j4dxj
        displayName: print-text-3
        finishedAt: "2020-05-23T09:12:05Z"
        id: file-passing-pipelines-j4dxj-2181251165
        inputs:
          artifacts:
          - name: split-text-lines-even_lines
            path: /tmp/inputs/text/data
            s3:
              accessKeySecret:
                key: accesskey
                name: mlpipeline-minio-artifact
              bucket: mlpipeline
              endpoint: minio-service.kubeflow:9000
              insecure: true
              key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3552447998/split-text-lines-even_lines.tgz
              secretKeySecret:
                key: secretkey
                name: mlpipeline-minio-artifact
        name: file-passing-pipelines-j4dxj.print-text-3
        phase: Succeeded
        startedAt: "2020-05-23T09:12:00Z"
        templateName: print-text-3
        type: Pod
      file-passing-pipelines-j4dxj-3378179304:
        boundaryID: file-passing-pipelines-j4dxj
        children:
        - file-passing-pipelines-j4dxj-3516751673
        displayName: repeat-line
        finishedAt: "2020-05-23T09:11:56Z"
        id: file-passing-pipelines-j4dxj-3378179304
        name: file-passing-pipelines-j4dxj.repeat-line
        outputs:
          artifacts:
          - name: repeat-line-output_text
            path: /tmp/outputs/output_text/data
            s3:
              accessKeySecret:
                key: accesskey
                name: mlpipeline-minio-artifact
              bucket: mlpipeline
              endpoint: minio-service.kubeflow:9000
              insecure: true
              key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3378179304/repeat-line-output_text.tgz
              secretKeySecret:
                key: secretkey
                name: mlpipeline-minio-artifact
          - archiveLogs: true
            name: main-logs
            s3:
              accessKeySecret:
                key: accesskey
                name: mlpipeline-minio-artifact
              bucket: mlpipeline
              endpoint: minio-service.kubeflow:9000
              insecure: true
              key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3378179304/main.log
              secretKeySecret:
                key: secretkey
                name: mlpipeline-minio-artifact
        phase: Succeeded
        startedAt: "2020-05-23T09:11:53Z"
        templateName: repeat-line
        type: Pod
      file-passing-pipelines-j4dxj-3516751673:
        boundaryID: file-passing-pipelines-j4dxj
        displayName: print-text
        finishedAt: "2020-05-23T09:12:02Z"
        id: file-passing-pipelines-j4dxj-3516751673
        inputs:
          artifacts:
          - name: repeat-line-output_text
            path: /tmp/inputs/text/data
            s3:
              accessKeySecret:
                key: accesskey
                name: mlpipeline-minio-artifact
              bucket: mlpipeline
              endpoint: minio-service.kubeflow:9000
              insecure: true
              key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3378179304/repeat-line-output_text.tgz
              secretKeySecret:
                key: secretkey
                name: mlpipeline-minio-artifact
        name: file-passing-pipelines-j4dxj.print-text
        phase: Succeeded
        startedAt: "2020-05-23T09:11:57Z"
        templateName: print-text
        type: Pod
      file-passing-pipelines-j4dxj-3552447998:
        boundaryID: file-passing-pipelines-j4dxj
        children:
        - file-passing-pipelines-j4dxj-2164473546
        - file-passing-pipelines-j4dxj-2181251165
        displayName: split-text-lines
        finishedAt: "2020-05-23T09:11:58Z"
        id: file-passing-pipelines-j4dxj-3552447998
        inputs:
          artifacts:
          - name: source
            path: /tmp/inputs/source/data
            raw:
              data: |-
                one
                two
                three
                four
                five
                six
                seven
                eight
                nine
                ten
        name: file-passing-pipelines-j4dxj.split-text-lines
        outputs:
          artifacts:
          - name: split-text-lines-even_lines
            path: /tmp/outputs/even_lines/data
            s3:
              accessKeySecret:
                key: accesskey
                name: mlpipeline-minio-artifact
              bucket: mlpipeline
              endpoint: minio-service.kubeflow:9000
              insecure: true
              key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3552447998/split-text-lines-even_lines.tgz
              secretKeySecret:
                key: secretkey
                name: mlpipeline-minio-artifact
          - name: split-text-lines-odd_lines
            path: /tmp/outputs/odd_lines/data
            s3:
              accessKeySecret:
                key: accesskey
                name: mlpipeline-minio-artifact
              bucket: mlpipeline
              endpoint: minio-service.kubeflow:9000
              insecure: true
              key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3552447998/split-text-lines-odd_lines.tgz
              secretKeySecret:
                key: secretkey
                name: mlpipeline-minio-artifact
          - archiveLogs: true
            name: main-logs
            s3:
              accessKeySecret:
                key: accesskey
                name: mlpipeline-minio-artifact
              bucket: mlpipeline
              endpoint: minio-service.kubeflow:9000
              insecure: true
              key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3552447998/main.log
              secretKeySecret:
                key: secretkey
                name: mlpipeline-minio-artifact
        phase: Succeeded
        startedAt: "2020-05-23T09:11:53Z"
        templateName: split-text-lines
        type: Pod
      file-passing-pipelines-j4dxj-3568125204:
        boundaryID: file-passing-pipelines-j4dxj
        children:
        - file-passing-pipelines-j4dxj-2063807832
        - file-passing-pipelines-j4dxj-190982672
        displayName: write-numbers
        finishedAt: "2020-05-23T09:11:56Z"
        id: file-passing-pipelines-j4dxj-3568125204
        name: file-passing-pipelines-j4dxj.write-numbers
        outputs:
          artifacts:
          - name: write-numbers-numbers
            path: /tmp/outputs/numbers/data
            s3:
              accessKeySecret:
                key: accesskey
                name: mlpipeline-minio-artifact
              bucket: mlpipeline
              endpoint: minio-service.kubeflow:9000
              insecure: true
              key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3568125204/write-numbers-numbers.tgz
              secretKeySecret:
                key: secretkey
                name: mlpipeline-minio-artifact
          - archiveLogs: true 
            name: main-logs
            s3:
              accessKeySecret:
                key: accesskey
                name: mlpipeline-minio-artifact
              bucket: mlpipeline
              endpoint: minio-service.kubeflow:9000
              insecure: true
              key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3568125204/main.log
              secretKeySecret:
                key: secretkey
                name: mlpipeline-minio-artifact
        phase: Succeeded
        startedAt: "2020-05-23T09:11:53Z"
        templateName: write-numbers
        type: Pod
    phase: Succeeded
    startedAt: "2020-05-23T09:11:53Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
eterna2 commented 4 years ago

For example, this op (print-text-5) does not have any output artifacts at all:

      file-passing-pipelines-j4dxj-2080585451:
        boundaryID: file-passing-pipelines-j4dxj
        displayName: print-text-5
        finishedAt: "2020-05-23T09:12:06Z"
        id: file-passing-pipelines-j4dxj-2080585451
        inputs:
          artifacts:
          - name: sum-numbers-output
            path: /tmp/inputs/text/data
            s3:
              accessKeySecret:
                key: accesskey
                name: mlpipeline-minio-artifact
              bucket: mlpipeline
              endpoint: minio-service.kubeflow:9000
              insecure: true
              key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-190982672/sum-numbers-output.tgz
              secretKeySecret:
                key: secretkey
                name: mlpipeline-minio-artifact
        name: file-passing-pipelines-j4dxj.print-text-5
        phase: Succeeded
        startedAt: "2020-05-23T09:12:03Z"
        templateName: print-text-5
        type: Pod
eterna2 commented 4 years ago

I tested by adding some print lines to an op that has outputs, and it works.

So I think what happened is: if the op has no outputs at all, the logs will not be archived. Not sure if this is Argo's default behavior; might need to have a look.

Ark-kun commented 4 years ago

So I think what happened is: if the op has no outputs at all, the logs will not be archived. Not sure if this is Argo's default behavior; might need to have a look.

Sorry, I should have warned you. Yes, there was a bug in Argo. It's already fixed in the 2.7.5 version that we've recently upgraded to (no formal release yet).

for ops that actually have no logs (e.g. write numbers), you can see the main-logs artifact

Just to clarify: I'm talking about the case where the main-logs artifact exists. I can see it in the Input/Output list, and your artifact preview feature shows the beginning of the log. This tells me that the status object is fine.

What is not working for me is the Logs tab.

eterna2 commented 4 years ago

Hmmm, I was unable to reproduce the problem. This is exactly what I did.

Strange.

eterna2 commented 4 years ago

Can you help me check the ml-pipeline-ui logs?

If there are errors, they should show up there. This is what it looks like when there are no errors and the logs are retrieved from the archive:

GET /k8s/pod/logs?podname=file-passing-pipelines-4kh8p-400479928&podnamespace=kubeflow
Getting logs for pod:file-passing-pipelines-4kh8p-400479928 from mlpipeline/artifacts/file-passing-pipelines-4kh8p/file-passing-pipelines-4kh8p-400479928/main.log.
GET /k8s/pod/logs?podname=file-passing-pipelines-4kh8p-417257547&podnamespace=kubeflow
Getting logs for pod:file-passing-pipelines-4kh8p-417257547 from mlpipeline/artifacts/file-passing-pipelines-4kh8p/file-passing-pipelines-4kh8p-417257547/main.log.
GET /k8s/pod/logs?podname=file-passing-pipelines-4kh8p-400479928&podnamespace=kubeflow
Getting logs for pod:file-passing-pipelines-4kh8p-400479928 from mlpipeline/artifacts/file-passing-pipelines-4kh8p/file-passing-pipelines-4kh8p-400479928/main.log.
GET /k8s/pod/logs?podname=file-passing-pipelines-4kh8p-417257547&podnamespace=kubeflow
Getting logs for pod:file-passing-pipelines-4kh8p-417257547 from mlpipeline/artifacts/file-passing-pipelines-4kh8p/file-passing-pipelines-4kh8p-417257547/main.log.
anupash147 commented 4 years ago

@Ark-kun: I cannot find ARGO_ARCHIVE_PREFIX. I am using Kubeflow version 1.0. (screenshot attached)

eterna2 commented 4 years ago

@anupash147 ARGO_ARCHIVE_PREFIX is an env var for the ml-pipeline-ui pod (https://github.com/kubeflow/pipelines/blob/c0074463eef8416fd3345b8e926d847323c6ae87/frontend/server/configs.ts#L88).

You need to set it to the same prefix as the one you set in the workflow-controller configmap.

Additionally, ARGO_ARCHIVE_LOGS must be set to true or 1 (https://github.com/kubeflow/pipelines/blob/c0074463eef8416fd3345b8e926d847323c6ae87/frontend/server/configs.ts#L82).
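
Putting the two settings together: the frontend server looks up archived logs under a key of the form [prefix]/[workflow name]/[pod name]/main.log. A minimal sketch of that key composition (illustrative only; the function name is made up and this is not the actual frontend code):

```python
def archive_log_key(prefix: str, workflow_name: str, pod_name: str) -> str:
    """Compose the MinIO/S3 object key the UI looks up for archived pod logs.

    Mirrors the path format described above:
    [prefix]/[workflow name]/[pod name]/main.log
    """
    # Drop empty components and stray slashes so an empty prefix still works.
    parts = [p.strip("/") for p in (prefix, workflow_name, pod_name) if p]
    return "/".join(parts) + "/main.log"

print(archive_log_key(
    "artifacts",
    "file-passing-pipelines-k6k9w",
    "file-passing-pipelines-k6k9w-3191777755",
))
# artifacts/file-passing-pipelines-k6k9w/file-passing-pipelines-k6k9w-3191777755/main.log
```

If the key computed this way does not match where Argo actually stored main.log (e.g. because ARGO_ARCHIVE_PREFIX differs from the prefix in the workflow-controller configmap), the UI ends up with the "The specified key does not exist" error reported above.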

anupash147 commented 4 years ago

@eterna2: I think I am lost. I did try passing the environment variable ARGO_ARCHIVE_LOGS='true' to the ml-pipeline-ui pod, but did not find anything that looks like this: (screenshot attached)

eterna2 commented 4 years ago

Sorry, can you give me a bit of context? I don't think I was in the initial conversation.

What were you trying to achieve, and what was the problem?

I don't quite understand your screenshot.

anupash147 commented 4 years ago

@eterna2: My original problem is that I have to migrate from cluster A to cluster B. To do that, I back up the MySQL and MinIO servers, which retains the runs and the artifacts. But when I go to check the logs of the runs in cluster B (in the UI), they are blank. From the ticket referenced via the Slack channel, I learnt that there is a ticket that talks about how to archive the logs so that they are visible in cluster B. Hence I am here.

As for the screenshot I attached, it is your reply here from May 24 on how to verify that the logs are getting archived. Let me know if it is clear now.

eterna2 commented 4 years ago

There are a few things you need to consider.

Which manifest version was cluster A on? I.e., was archival of the pod logs enabled? This is not done by ml-pipeline-ui (the UI just reads from the archive); it should be enabled in the workflow-controller configmap.

If it is not enabled in cluster A, then the logs will not be archived in MinIO.

The env var flags you passed to ml-pipeline-ui just tell the UI where the archive is. Hence, they should be exactly the same as what you specified in your workflow-controller configmap.

So you should first check the contents of MinIO. Is there a main.log file? The path should be something like [prefix you provided in the configmap] / workflow name / pod name / main.log.

If there isn't, the pod logs were never archived in the first place.

Next, check that the artifact-repository env vars you passed to ml-pipeline-ui are exactly the same as the ones in the workflow-controller configmap.

And lastly, look at the ml-pipeline-ui logs. What do you see when you open the Logs tab?

You should share the logs so at least I can get an idea of what happened.

anupash147 commented 4 years ago

Both of the clusters are on Kubeflow version 1.0.

With your help, I was able to figure it out and make the changes below:

  1. Added archiveLogs = true in the workflow-controller-configmap and restarted the workflow-controller. This enabled me to start getting the main.log files.
  2. Added two variables to deployment/ml-pipeline-ui (my UI version is gcr.io/ml-pipeline/frontend:0.2.0): i. ARGO_ARCHIVE_LOGS=true, ii. ARGO_ARCHIVE_PREFIX='artifacts'. With this, the logs point to the main.log location on the MinIO server. Here is a gist of the pipeline logs: https://gist.github.com/anupash147/747922700ca8caad84681288b34db8e7 (see example-metrics-pipeline-ttvcx-2745010860).

Thank you @eterna2 for the above.
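
For reference, the archiveLogs change above corresponds to a workflow-controller configmap shaped roughly like this (field layout follows Argo's artifact repository config; the s3 values echo the ones visible in the workflow status earlier in this thread, so treat them as illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: kubeflow
data:
  config: |
    artifactRepository:
      archiveLogs: true
      s3:
        bucket: mlpipeline
        keyPrefix: artifacts
        endpoint: minio-service.kubeflow:9000
        insecure: true
        accessKeySecret:
          name: mlpipeline-minio-artifact
          key: accesskey
        secretKeySecret:
          name: mlpipeline-minio-artifact
          key: secretkey
```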

But the problem still persists. After deleting the pod, with everything configured, I should have been able to see the logs in the UI, but I am not. I think the exceptions I am getting in the logs might be responsible.

eterna2 commented 4 years ago

Are you on Argo 2.7.5? There is a bug in older versions of Argo. (Argo is the workflow controller.)

If you are on KF 1.0, you are probably still on the old Argo. The bug is: if a task does not have an output artifact, no artifacts are emitted at all, i.e. KFP will not know there is a main.log if the task does not have an output artifact by default.

You can try changing the Argo version to 2.7.5 (by editing the image tag, as well as the executor arg), or you can try producing an output in your task.

eterna2 commented 4 years ago

Can you also tell me which version of KFP you are on? You can probably try upgrading it to 0.5.1 (the image tag).

anupash147 commented 4 years ago

Are you on Argo 2.7.5? There is a bug in older versions of Argo. (Argo is the workflow controller.)

If you are on KF 1.0, you are probably still on the old Argo. The bug is: if a task does not have an output artifact, no artifacts are emitted at all, i.e. KFP will not know there is a main.log if the task does not have an output artifact by default.

You can try changing the Argo version to 2.7.5 (by editing the image tag, as well as the executor arg), or you can try producing an output in your task.

By Argo, do you mean argoproj/workflow-controller:v2.3.0? And by KFP, do you mean gcr.io/ml-pipeline/frontend:0.2.0?

eterna2 commented 4 years ago

Yeah. You are on 2.3.0 for Argo and 0.2.0 for KFP. I think 0.2.0 should still be fine.

I suspect the bug is probably because of Argo 2.3.0.

You can either intentionally add an output to your KFP task, or upgrade to 2.7.5.

But for the workflow controller, you need to take care. I forgot where exactly (I think it should be in the configmap or the args), but you need to specify which Argo executor to use, and you have to update the executor image tag for that too.
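
Concretely, the controller upgrade involves two image tags, roughly (a sketch; verify the exact args against the Argo 2.7.5 manifests):

```yaml
# Illustrative fragment of the workflow-controller Deployment
containers:
- name: workflow-controller
  image: argoproj/workflow-controller:v2.7.5
  args:
  - --configmap
  - workflow-controller-configmap
  - --executor-image
  - argoproj/argoexec:v2.7.5
```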

anupash147 commented 4 years ago

I did upgrade the executor to argoproj/argoexec:v2.7.5, which went fine, but when I upgraded the workflow-controller I was hit with heavy errors. See below:

E0624 01:56:36.982059       1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20191225075139-73fd2ddc9180/tools/cache/reflector.go:98: Failed to list *v1alpha1.WorkflowTemplate: the server could not find the requested resource (get workflowtemplates.argoproj.io)
E0624 01:56:36.983106       1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20191225075139-73fd2ddc9180/tools/cache/reflector.go:98: Failed to list *v1alpha1.CronWorkflow: the server could not find the requested resource (get cronworkflows.argoproj.io)
E0624 01:56:37.983407       1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20191225075139-73fd2ddc9180/tools/cache/reflector.go:98: Failed to list *v1alpha1.WorkflowTemplate: the server could not find the requested resource (get workflowtemplates.argoproj.io)
E0624 01:56:37.984291       1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20191225075139-73fd2ddc9180/tools/cache/reflector.go:98: Failed to list *v1alpha1.CronWorkflow: the server could not find the requested resource (get cronworkflows.argoproj.io)

I am going to look into https://github.com/kubeflow/pipelines/tree/0.5.1/manifests/kustomize; I guess this is what you had mentioned. Thanks, I will keep you updated @eterna2.

anupash147 commented 4 years ago

@eterna2: thank you, I got it working now. I just had to set ARGO_ARCHIVE_PREFIX='artifacts'.

Ark-kun commented 4 years ago

@eterna2: thank you, I got it working now. I just had to set ARGO_ARCHIVE_PREFIX='artifacts'.

Thank you @eterna2 and @anupash147 . This fixed it for me. I remember trying some prefix, but maybe it was wrong (e.g. maybe I used the full minio://mlpipeline/artifacts/ prefix).

@eterna2 Why is the prefix needed? The full log artifact URI seems to be available.

Ark-kun commented 4 years ago

Actually, no, it still does not seem to work for me. Here is the log:

2020-06-25T08:33:32.387751225Z GET /artifacts/get?source=minio&namespace=default&peek=256&bucket=mlpipeline&key=artifacts%2Fcatboost-pipeline-4dcnx%2Fcatboost-pipeline-4dcnx-1140917078%2Fchicago-taxi-trips-dataset-table.tgz
2020-06-25T08:33:32.387800632Z Getting storage artifact at: minio: mlpipeline/artifacts/catboost-pipeline-4dcnx/catboost-pipeline-4dcnx-1140917078/chicago-taxi-trips-dataset-table.tgz
2020-06-25T08:33:32.418881030Z GET /artifacts/get?source=minio&namespace=default&peek=256&bucket=mlpipeline&key=artifacts%2Fcatboost-pipeline-4dcnx%2Fcatboost-pipeline-4dcnx-1140917078%2Fmain.log
2020-06-25T08:33:32.419033086Z Getting storage artifact at: minio: mlpipeline/artifacts/catboost-pipeline-4dcnx/catboost-pipeline-4dcnx-1140917078/main.log
2020-06-25T08:33:32.579217321Z GET /artifacts/get?source=minio&namespace=default&peek=256&bucket=mlpipeline&key=artifacts%2Fcatboost-pipeline-4dcnx%2Fcatboost-pipeline-4dcnx-1140917078%2Fchicago-taxi-trips-dataset-table.tgz
2020-06-25T08:33:32.579388399Z Getting storage artifact at: minio: mlpipeline/artifacts/catboost-pipeline-4dcnx/catboost-pipeline-4dcnx-1140917078/chicago-taxi-trips-dataset-table.tgz
2020-06-25T08:33:32.650139513Z GET /artifacts/get?source=minio&namespace=default&peek=256&bucket=mlpipeline&key=artifacts%2Fcatboost-pipeline-4dcnx%2Fcatboost-pipeline-4dcnx-1140917078%2Fmain.log
2020-06-25T08:33:32.650354588Z Getting storage artifact at: minio: mlpipeline/artifacts/catboost-pipeline-4dcnx/catboost-pipeline-4dcnx-1140917078/main.log
2020-06-25T08:33:34.973389758Z GET /k8s/pod/logs?podname=parquet-pipeline-pgmz2-684574782&podnamespace=default
2020-06-25T08:33:34.986899370Z Getting logs for pod:parquet-pipeline-pgmz2-684574782 from mlpipeline/artifacts/parquet-pipeline-pgmz2/parquet-pipeline-pgmz2-684574782/main.log.

2020-06-25T08:33:44.775815452Z GET /k8s/pod/logs?podname=parquet-pipeline-pgmz2-684574782&podnamespace=default
2020-06-25T08:33:44.818858182Z Getting logs for pod:parquet-pipeline-pgmz2-684574782 from mlpipeline/artifacts/parquet-pipeline-pgmz2/parquet-pipeline-pgmz2-684574782/main.log.

I think I know what's happening: the frontend constructs the log artifact URI from the pod name, instead of taking the actual artifact location from the workflow status (the same place used to populate the Inputs/Outputs tab). The constructed URI does not exist, and the log artifact we want to show lives at a different key.

The correct URI is

mlpipeline/artifacts/catboost-pipeline-4dcnx/catboost-pipeline-4dcnx-1140917078/main.log

but the UX accesses

mlpipeline/artifacts/parquet-pipeline-pgmz2/parquet-pipeline-pgmz2-684574782/main.log
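The mismatch above can be sketched in a few lines. This is a hypothetical reconstruction of the suspected fallback logic (the real frontend is TypeScript, and `fallback_log_key` and its workflow-name heuristic are my assumptions, not the actual code): building the key from the pod name fails whenever the real artifact was recorded under a different workflow, e.g. a cached execution.

```python
def fallback_log_key(prefix: str, pod_name: str) -> str:
    # Hypothetical heuristic: derive the workflow name by stripping the
    # trailing "-<id>" segment from the pod name.
    workflow_name = pod_name.rsplit("-", 1)[0]
    return f"{prefix}/{workflow_name}/{pod_name}/main.log"

pod = "parquet-pipeline-pgmz2-684574782"
constructed = fallback_log_key("artifacts", pod)
# Key actually recorded in the workflow status (a cached/reused execution):
recorded = "artifacts/catboost-pipeline-4dcnx/catboost-pipeline-4dcnx-1140917078/main.log"

print(constructed)
# artifacts/parquet-pipeline-pgmz2/parquet-pipeline-pgmz2-684574782/main.log
print(constructed == recorded)  # False: the constructed key does not exist
```

This illustrates why reading the artifact location out of the workflow status, rather than reconstructing it, is the robust path.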
eterna2 commented 4 years ago

This means there is some error parsing the workflow status. Let me investigate.

Logs are retrieved in three ways, tried in order, and it seems to have fallen back to the third method: somehow it either cannot query the workflow status, cannot retrieve the secrets, or cannot parse the status to get the archived-log artifact.

This is the CatBoost pipeline? Let me try.

eterna2 commented 4 years ago

@eterna2: thank you, I got it working now. I just had to set ARGO_ARCHIVE_PREFIX='artifacts'.

Thank you @eterna2 and @anupash147 . This fixed it for me. I remember trying some prefix, but maybe it was wrong (e.g. maybe I used the full minio://mlpipeline/artifacts/ prefix).

@eterna2 Why is the prefix needed? The full log artifact URI seems to be available.

The prefix is for the final fallback, used when the frontend cannot get or parse the workflow status.

It is usually not required if the workflow status is parsed correctly. Did the schema change after the upgrade to 2.7.5?

Let me check.

anupash147 commented 4 years ago

Although my log archiving seems to be working, I still fail to get adequate logging (screenshot attached).

Even ml-pipeline-ui is looking for it (screenshot attached).

Bobgy commented 4 years ago

I suspect we didn't turn on the ml-pipeline-ui configuration that allows fetching archived logs.

Bobgy commented 4 years ago

Right, we don't turn on showing archived Argo logs in the UI by default, because it needs some configuration: https://github.com/kubeflow/pipelines/blob/cc78bd1a4fda560454ec036c01cc0c9262517431/frontend/server/configs.ts#L81-L88
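For reference, these settings are environment variables on the ml-pipeline-ui deployment. A sketch of what such a configuration might look like; only `ARGO_ARCHIVE_LOGS` and `ARGO_ARCHIVE_PREFIX` are confirmed in this thread, while the other two variable names and all default values are assumptions to be verified against the linked configs.ts for your KFP version:

```
- name: ARGO_ARCHIVE_LOGS
  value: "true"
- name: ARGO_ARCHIVE_ARTIFACTORY   # assumed name; verify in configs.ts
  value: "minio"
- name: ARGO_ARCHIVE_BUCKET        # assumed name; verify in configs.ts
  value: "mlpipeline"
- name: ARGO_ARCHIVE_PREFIX
  value: "artifacts"
```

The prefix must match the `keyPrefix` configured in Argo's artifact repository, which is why setting it to `artifacts` fixed the fallback for commenters above.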

anupash147 commented 4 years ago

Although my log archiving seems to be working, I still fail to get adequate logging (screenshot attached).

Even ml-pipeline-ui is looking for it (screenshot attached).

In this case, I have manually turned the logging on, but observed that not all steps of a pipeline are logged.

eterna2 commented 4 years ago

Are you using Argo 2.3? There is a bug in Argo 2.3 where logs will not be archived if there are no other artifacts (besides the logs).

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

kd303 commented 3 years ago

Hi @eterna2 @Ark-kun @anupash147, any help or pointers are appreciated. With the settings below I am not able to see a main.log file created. I tried this with:

  1. The Argo executor upgraded to 2.7.5
  2. archiveLogs set to true

I cannot find a main.log file being created anywhere. I am using the example pipeline: [Tutorial] Data passing in python

"executorImage":"xxxx/argoexec:v2.7.5",
   "containerRuntimeExecutor":"docker",
     "artifactRepository":{
      "archiveLogs": true,
      "s3":{
         "bucket":"mlpipeline",
         "keyPrefix":"artifacts",
         "endpoint":"minio-service.kubeflow":9000,
         "insecure":true,
         "accessKeySecret":{
            "name":"mlpipeline-minio-artifact",
            "key":"accesskey"
         },
         "secretKeySecret":{
            "name":"mlpipeline-minio-artifact",
            "key":"secretkey"
         }
      }
   }
}
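The `s3.endpoint` value must be a single string such as `"minio-service.kubeflow:9000"`; a port spliced outside the quotes breaks the document. One quick way to catch such slips is to run the formatter output through a JSON parser. A minimal sketch, embedding only a subset of the config above:

```python
import json

config = '''
{
  "artifactRepository": {
    "archiveLogs": true,
    "s3": {
      "bucket": "mlpipeline",
      "keyPrefix": "artifacts",
      "endpoint": "minio-service.kubeflow:9000",
      "insecure": true
    }
  }
}
'''
# json.loads raises ValueError on malformed input,
# e.g. "endpoint": "host": 9000
parsed = json.loads(config)
print(parsed["artifactRepository"]["s3"]["endpoint"])
# minio-service.kubeflow:9000
```

(The live configmap data is YAML; the same check applies with a YAML parser.)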
kd303 commented 3 years ago

Please note the quotes were added by a JSON formatter; the original in the configmap is without quotes.

Another piece of information: whenever I do a run, the service account selected is "default-editor", while this particular role does not exist. I am doing this run with a user/namespace profile of "training".

What should this role be? From what I understand, if the user does not have a watch role, it may not be able to pull the logs into the archive. Please help, I am at my wits' end :)

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

eshaingle commented 3 years ago

Hello folks. As we know, the pod has archived logs that are successfully shown on the Inputs/Outputs tab in the form of a URI.

Log URI: "main-logs minio://mlpipeline/artifacts/file-passing-pipelines-k6k9w/file-passing-pipelines-k6k9w-3191777755/main.log"

Is there any way to get this URI via an API call? In case of failure, we could send this URL to check the logs. Any way to get this log URL would be helpful.
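One way to do this, sketched under assumptions: the KFP REST API's v1beta1 run endpoint returns the run's Argo workflow manifest, whose status records each node's output artifacts, including `main-logs`. The host URL is a placeholder, and the endpoint path and field names should be verified against your KFP version:

```python
import json
import urllib.request

def main_log_uris(workflow_manifest: str) -> dict:
    """Map Argo node id -> S3 key of its 'main-logs' artifact."""
    wf = json.loads(workflow_manifest)
    uris = {}
    for node_id, node in wf.get("status", {}).get("nodes", {}).items():
        for art in (node.get("outputs") or {}).get("artifacts", []):
            if art.get("name") == "main-logs" and "s3" in art:
                uris[node_id] = art["s3"]["key"]
    return uris

def fetch_run_log_uris(host: str, run_id: str) -> dict:
    # e.g. host = "http://ml-pipeline-ui.kubeflow:80" (assumption)
    with urllib.request.urlopen(f"{host}/apis/v1beta1/runs/{run_id}") as resp:
        run = json.load(resp)
    # The run detail embeds the Argo workflow as a JSON string.
    return main_log_uris(run["pipeline_runtime"]["workflow_manifest"])
```

The returned keys can then be prefixed with the bucket (e.g. `minio://mlpipeline/`) to form the full log URI.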

anupash147 commented 3 years ago

The logs output broke again with the latest pipeline version 1.3.*; the Argo executor used is argoexec:v3.1.6-license-compliance. Is this image not compatible with frontend 1.7-rc3?

Bobgy commented 3 years ago

I believe the feature was initially implemented, but not fully working.

Welcome contributions to fix it.

Ark-kun commented 3 years ago

Hmm. This feature worked in KFP 1.0.

It's pretty useful, especially when the execution is reused from cache - the users can see the original logs.

the logs output again broke with latest pipeline version 1.3*,

What are the symptoms? No main-logs artifact? Can you please check that "archiveLogs": true is set in the workflow-controller-configmap? I'm not sure this is a frontend issue; the frontend just shows all produced Argo artifacts.

I think we should fix the issue.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.