Does the Argo configmap also have archive set to true?
Did the run succeed? If the run did not succeed, the workflow status sometimes will not report the artifact location, and in that case the server cannot retrieve the logs unless you set up the default bucket location (there are a few more configs where you can declare where the Argo artifacts are located).
But let me test on my machine and see; this might be a regression.
Does the Argo configmap also have archive set to true?
Yes (archiveLogs is set to true by default for the last several releases).
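A quick way to double-check this (a sketch; assumes the default kubeflow namespace and the standard workflow-controller-configmap name used in this thread):
# print the workflow controller config and look for archiveLogs
kubectl -n kubeflow get configmap workflow-controller-configmap \
  -o jsonpath='{.data.config}' | grep archiveLogs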
Did the run succeed? If the run did not succeed, the workflow status sometimes will not report the artifact location, and in that case the server cannot retrieve the logs unless you set up the default bucket location (there are a few more configs where you can declare where the Argo artifacts are located).
Yes. As I posted, the logs are uploaded and I see them on the Inputs/Outputs tab: Log URI: "main-logs minio://mlpipeline/artifacts/file-passing-pipelines-k6k9w/file-passing-pipelines-k6k9w-3191777755/main.log".
BTW, your great artifact preview feature successfully previews the logs =)
This is what I found running the data passing pipeline.
For ops that have logs (e.g. Print Text), Argo did not output the logs as artifacts. That is, if you go to the UI and click on Input/Output, you will not see any main-logs artifact:
time="2020-05-23T09:11:59Z" level=info msg="Creating a docker executor"
time="2020-05-23T09:11:59Z" level=info msg="Executor (version: v2.3.0, build_date: 2019-05-20T22:10:54Z) initialized (pod: kubeflow/file-passing-pipelines-j4dxj-2063807832) with template:\n{\"name\":\"print-text-4\",\"inputs\":{\"artifacts\":[{\"name\":\"write-numbers-numbers\",\"path\":\"/tmp/inputs/text/data\",\"s3\":{\"endpoint\":\"minio-service.kubeflow:9000\",\"bucket\":\"mlpipeline\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"secretkey\"},\"key\":\"artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3568125204/write-numbers-numbers.tgz\"}}]},\"outputs\":{},\"metadata\":{\"annotations\":{\"pipelines.kubeflow.org/component_spec\":\"{\\\"description\\\": \\\"Print text\\\", \\\"inputs\\\": [{\\\"name\\\": \\\"text\\\"}], \\\"name\\\": \\\"Print text\\\"}\",\"sidecar.istio.io/inject\":\"false\"},\"labels\":{\"pipelines.kubeflow.org/cache_enabled\":\"true\"}},\"container\":{\"name\":\"\",\"image\":\"tensorflow/tensorflow:1.13.2-py3\",\"command\":[\"python3\",\"-u\",\"-c\",\"def print_text(text_path ): # The \\\"text\\\" input is untyped so that any data can be printed\\n '''Print text'''\\n with open(text_path, 'r') as reader:\\n for line in reader:\\n print(line, end = '')\\n\\nimport argparse\\n_parser = argparse.ArgumentParser(prog='Print text', description='Print text')\\n_parser.add_argument(\\\"--text\\\", dest=\\\"text_path\\\", type=str, required=True, default=argparse.SUPPRESS)\\n_parsed_args = vars(_parser.parse_args())\\n_output_files = _parsed_args.pop(\\\"_output_paths\\\", [])\\n\\n_outputs = print_text(**_parsed_args)\\n\\n_output_serializers = [\\n\\n]\\n\\nimport os\\nfor idx, output_file in enumerate(_output_files):\\n try:\\n os.makedirs(os.path.dirname(output_file))\\n except OSError:\\n pass\\n with open(output_file, 'w') as f:\\n f.write(_output_serializers[idx](_outputs[idx]))\\n\"],\"args\":[\"--text\",\"/tmp/inputs/text/data\"],\"resources\":{}}}"
time="2020-05-23T09:11:59Z" level=info msg="Waiting on main container"
time="2020-05-23T09:12:00Z" level=info msg="main container started with container ID: 28e5f817c3f1b4bd9e6ff81761a414d3cb489487efe2235e7d1970541a4df48a"
time="2020-05-23T09:12:00Z" level=info msg="Starting annotations monitor"
time="2020-05-23T09:12:00Z" level=info msg="docker wait 28e5f817c3f1b4bd9e6ff81761a414d3cb489487efe2235e7d1970541a4df48a"
time="2020-05-23T09:12:00Z" level=info msg="Starting deadline monitor"
time="2020-05-23T09:12:00Z" level=info msg="Main container completed"
time="2020-05-23T09:12:00Z" level=info msg="No sidecars"
time="2020-05-23T09:12:00Z" level=info msg="No output parameters"
time="2020-05-23T09:12:00Z" level=info msg="No output artifacts"
time="2020-05-23T09:12:00Z" level=info msg="Alloc=4293 TotalAlloc=11000 Sys=70590 NumGC=4 Goroutines=9"
For an op that actually has no logs (e.g. write-numbers), you can see the main-logs artifact:
time="2020-05-23T09:11:56Z" level=info msg="Saving from /argo/outputs/logs/main.log to s3 (endpoint: minio-service.kubeflow:9000, bucket: mlpipeline, key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3568125204/main.log)"
time="2020-05-23T09:11:56Z" level=info msg="No output parameters"
time="2020-05-23T09:11:56Z" level=info msg="Saving output artifacts"
time="2020-05-23T09:11:56Z" level=info msg="Staging artifact: write-numbers-numbers"
time="2020-05-23T09:11:56Z" level=info msg="Copying /tmp/outputs/numbers/data from container base image layer to /argo/outputs/artifacts/write-numbers-numbers.tgz"
time="2020-05-23T09:11:56Z" level=info msg="Archiving e337b7e3942c63be4cdae2d008d2621f25fed978542511a7eab3ef600a212112:/tmp/outputs/numbers/data to /argo/outputs/artifacts/write-numbers-numbers.tgz"
time="2020-05-23T09:11:56Z" level=info msg="sh -c docker cp -a e337b7e3942c63be4cdae2d008d2621f25fed978542511a7eab3ef600a212112:/tmp/outputs/numbers/data - | gzip > /argo/outputs/artifacts/write-numbers-numbers.tgz"
time="2020-05-23T09:11:56Z" level=info msg="Archiving completed"
time="2020-05-23T09:11:56Z" level=info msg="S3 Save path: /argo/outputs/artifacts/write-numbers-numbers.tgz, key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3568125204/write-numbers-numbers.tgz"
time="2020-05-23T09:11:56Z" level=info msg="Creating minio client minio-service.kubeflow:9000 using static credentials"
time="2020-05-23T09:11:56Z" level=info msg="Saving from /argo/outputs/artifacts/write-numbers-numbers.tgz to s3 (endpoint: minio-service.kubeflow:9000, bucket: mlpipeline, key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3568125204/write-numbers-numbers.tgz)"
time="2020-05-23T09:11:56Z" level=info msg="Successfully saved file: /argo/outputs/artifacts/write-numbers-numbers.tgz"
time="2020-05-23T09:11:56Z" level=info msg="Annotating pod with output"
time="2020-05-23T09:11:56Z" level=info msg="Alloc=4362 TotalAlloc=13055 Sys=70590 NumGC=5 Goroutines=11"
This is the workflow status:
status:
finishedAt: "2020-05-23T09:12:07Z"
nodes:
file-passing-pipelines-j4dxj:
children:
- file-passing-pipelines-j4dxj-3552447998
- file-passing-pipelines-j4dxj-3568125204
- file-passing-pipelines-j4dxj-3378179304
displayName: file-passing-pipelines-j4dxj
finishedAt: "2020-05-23T09:12:07Z"
id: file-passing-pipelines-j4dxj
name: file-passing-pipelines-j4dxj
outboundNodes:
- file-passing-pipelines-j4dxj-3516751673
- file-passing-pipelines-j4dxj-2181251165
- file-passing-pipelines-j4dxj-2164473546
- file-passing-pipelines-j4dxj-2063807832
- file-passing-pipelines-j4dxj-2080585451
phase: Succeeded
startedAt: "2020-05-23T09:11:53Z"
templateName: file-passing-pipelines
type: DAG
file-passing-pipelines-j4dxj-190982672:
boundaryID: file-passing-pipelines-j4dxj
children:
- file-passing-pipelines-j4dxj-2080585451
displayName: sum-numbers
finishedAt: "2020-05-23T09:12:01Z"
id: file-passing-pipelines-j4dxj-190982672
inputs:
artifacts:
- name: write-numbers-numbers
path: /tmp/inputs/numbers/data
s3:
accessKeySecret:
key: accesskey
name: mlpipeline-minio-artifact
bucket: mlpipeline
endpoint: minio-service.kubeflow:9000
insecure: true
key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3568125204/write-numbers-numbers.tgz
secretKeySecret:
key: secretkey
name: mlpipeline-minio-artifact
name: file-passing-pipelines-j4dxj.sum-numbers
outputs:
artifacts:
- name: sum-numbers-output
path: /tmp/outputs/Output/data
s3:
accessKeySecret:
key: accesskey
name: mlpipeline-minio-artifact
bucket: mlpipeline
endpoint: minio-service.kubeflow:9000
insecure: true
key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-190982672/sum-numbers-output.tgz
secretKeySecret:
key: secretkey
name: mlpipeline-minio-artifact
- archiveLogs: true
name: main-logs
s3:
accessKeySecret:
key: accesskey
name: mlpipeline-minio-artifact
bucket: mlpipeline
endpoint: minio-service.kubeflow:9000
insecure: true
key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-190982672/main.log
secretKeySecret:
key: secretkey
name: mlpipeline-minio-artifact
phase: Succeeded
startedAt: "2020-05-23T09:11:57Z"
templateName: sum-numbers
type: Pod
file-passing-pipelines-j4dxj-2063807832:
boundaryID: file-passing-pipelines-j4dxj
displayName: print-text-4
finishedAt: "2020-05-23T09:12:00Z"
id: file-passing-pipelines-j4dxj-2063807832
inputs:
artifacts:
- name: write-numbers-numbers
path: /tmp/inputs/text/data
s3:
accessKeySecret:
key: accesskey
name: mlpipeline-minio-artifact
bucket: mlpipeline
endpoint: minio-service.kubeflow:9000
insecure: true
key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3568125204/write-numbers-numbers.tgz
secretKeySecret:
key: secretkey
name: mlpipeline-minio-artifact
name: file-passing-pipelines-j4dxj.print-text-4
phase: Succeeded
startedAt: "2020-05-23T09:11:57Z"
templateName: print-text-4
type: Pod
file-passing-pipelines-j4dxj-2080585451:
boundaryID: file-passing-pipelines-j4dxj
displayName: print-text-5
finishedAt: "2020-05-23T09:12:06Z"
id: file-passing-pipelines-j4dxj-2080585451
inputs:
artifacts:
- name: sum-numbers-output
path: /tmp/inputs/text/data
s3:
accessKeySecret:
key: accesskey
name: mlpipeline-minio-artifact
bucket: mlpipeline
endpoint: minio-service.kubeflow:9000
insecure: true
key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-190982672/sum-numbers-output.tgz
secretKeySecret:
key: secretkey
name: mlpipeline-minio-artifact
name: file-passing-pipelines-j4dxj.print-text-5
phase: Succeeded
startedAt: "2020-05-23T09:12:03Z"
templateName: print-text-5
type: Pod
file-passing-pipelines-j4dxj-2164473546:
boundaryID: file-passing-pipelines-j4dxj
displayName: print-text-2
finishedAt: "2020-05-23T09:12:03Z"
id: file-passing-pipelines-j4dxj-2164473546
inputs:
artifacts:
- name: split-text-lines-odd_lines
path: /tmp/inputs/text/data
s3:
accessKeySecret:
key: accesskey
name: mlpipeline-minio-artifact
bucket: mlpipeline
endpoint: minio-service.kubeflow:9000
insecure: true
key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3552447998/split-text-lines-odd_lines.tgz
secretKeySecret:
key: secretkey
name: mlpipeline-minio-artifact
name: file-passing-pipelines-j4dxj.print-text-2
phase: Succeeded
startedAt: "2020-05-23T09:12:00Z"
templateName: print-text-2
type: Pod
file-passing-pipelines-j4dxj-2181251165:
boundaryID: file-passing-pipelines-j4dxj
displayName: print-text-3
finishedAt: "2020-05-23T09:12:05Z"
id: file-passing-pipelines-j4dxj-2181251165
inputs:
artifacts:
- name: split-text-lines-even_lines
path: /tmp/inputs/text/data
s3:
accessKeySecret:
key: accesskey
name: mlpipeline-minio-artifact
bucket: mlpipeline
endpoint: minio-service.kubeflow:9000
insecure: true
key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3552447998/split-text-lines-even_lines.tgz
secretKeySecret:
key: secretkey
name: mlpipeline-minio-artifact
name: file-passing-pipelines-j4dxj.print-text-3
phase: Succeeded
startedAt: "2020-05-23T09:12:00Z"
templateName: print-text-3
type: Pod
file-passing-pipelines-j4dxj-3378179304:
boundaryID: file-passing-pipelines-j4dxj
children:
- file-passing-pipelines-j4dxj-3516751673
displayName: repeat-line
finishedAt: "2020-05-23T09:11:56Z"
id: file-passing-pipelines-j4dxj-3378179304
name: file-passing-pipelines-j4dxj.repeat-line
outputs:
artifacts:
- name: repeat-line-output_text
path: /tmp/outputs/output_text/data
s3:
accessKeySecret:
key: accesskey
name: mlpipeline-minio-artifact
bucket: mlpipeline
endpoint: minio-service.kubeflow:9000
insecure: true
key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3378179304/repeat-line-output_text.tgz
secretKeySecret:
key: secretkey
name: mlpipeline-minio-artifact
- archiveLogs: true
name: main-logs
s3:
accessKeySecret:
key: accesskey
name: mlpipeline-minio-artifact
bucket: mlpipeline
endpoint: minio-service.kubeflow:9000
insecure: true
key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3378179304/main.log
secretKeySecret:
key: secretkey
name: mlpipeline-minio-artifact
phase: Succeeded
startedAt: "2020-05-23T09:11:53Z"
templateName: repeat-line
type: Pod
file-passing-pipelines-j4dxj-3516751673:
boundaryID: file-passing-pipelines-j4dxj
displayName: print-text
finishedAt: "2020-05-23T09:12:02Z"
id: file-passing-pipelines-j4dxj-3516751673
inputs:
artifacts:
- name: repeat-line-output_text
path: /tmp/inputs/text/data
s3:
accessKeySecret:
key: accesskey
name: mlpipeline-minio-artifact
bucket: mlpipeline
endpoint: minio-service.kubeflow:9000
insecure: true
key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3378179304/repeat-line-output_text.tgz
secretKeySecret:
key: secretkey
name: mlpipeline-minio-artifact
name: file-passing-pipelines-j4dxj.print-text
phase: Succeeded
startedAt: "2020-05-23T09:11:57Z"
templateName: print-text
type: Pod
file-passing-pipelines-j4dxj-3552447998:
boundaryID: file-passing-pipelines-j4dxj
children:
- file-passing-pipelines-j4dxj-2164473546
- file-passing-pipelines-j4dxj-2181251165
displayName: split-text-lines
finishedAt: "2020-05-23T09:11:58Z"
id: file-passing-pipelines-j4dxj-3552447998
inputs:
artifacts:
- name: source
path: /tmp/inputs/source/data
raw:
data: |-
one
two
three
four
five
six
seven
eight
nine
ten
name: file-passing-pipelines-j4dxj.split-text-lines
outputs:
artifacts:
- name: split-text-lines-even_lines
path: /tmp/outputs/even_lines/data
s3:
accessKeySecret:
key: accesskey
name: mlpipeline-minio-artifact
bucket: mlpipeline
endpoint: minio-service.kubeflow:9000
insecure: true
key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3552447998/split-text-lines-even_lines.tgz
secretKeySecret:
key: secretkey
name: mlpipeline-minio-artifact
- name: split-text-lines-odd_lines
path: /tmp/outputs/odd_lines/data
s3:
accessKeySecret:
key: accesskey
name: mlpipeline-minio-artifact
bucket: mlpipeline
endpoint: minio-service.kubeflow:9000
insecure: true
key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3552447998/split-text-lines-odd_lines.tgz
secretKeySecret:
key: secretkey
name: mlpipeline-minio-artifact
- archiveLogs: true
name: main-logs
s3:
accessKeySecret:
key: accesskey
name: mlpipeline-minio-artifact
bucket: mlpipeline
endpoint: minio-service.kubeflow:9000
insecure: true
key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3552447998/main.log
secretKeySecret:
key: secretkey
name: mlpipeline-minio-artifact
phase: Succeeded
startedAt: "2020-05-23T09:11:53Z"
templateName: split-text-lines
type: Pod
file-passing-pipelines-j4dxj-3568125204:
boundaryID: file-passing-pipelines-j4dxj
children:
- file-passing-pipelines-j4dxj-2063807832
- file-passing-pipelines-j4dxj-190982672
displayName: write-numbers
finishedAt: "2020-05-23T09:11:56Z"
id: file-passing-pipelines-j4dxj-3568125204
name: file-passing-pipelines-j4dxj.write-numbers
outputs:
artifacts:
- name: write-numbers-numbers
path: /tmp/outputs/numbers/data
s3:
accessKeySecret:
key: accesskey
name: mlpipeline-minio-artifact
bucket: mlpipeline
endpoint: minio-service.kubeflow:9000
insecure: true
key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3568125204/write-numbers-numbers.tgz
secretKeySecret:
key: secretkey
name: mlpipeline-minio-artifact
- archiveLogs: true
name: main-logs
s3:
accessKeySecret:
key: accesskey
name: mlpipeline-minio-artifact
bucket: mlpipeline
endpoint: minio-service.kubeflow:9000
insecure: true
key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-3568125204/main.log
secretKeySecret:
key: secretkey
name: mlpipeline-minio-artifact
phase: Succeeded
startedAt: "2020-05-23T09:11:53Z"
templateName: write-numbers
type: Pod
phase: Succeeded
startedAt: "2020-05-23T09:11:53Z"
kind: List
metadata:
resourceVersion: ""
selfLink: ""
E.g. this op (print-text-5) does not have any output artifacts at all:
file-passing-pipelines-j4dxj-2080585451:
boundaryID: file-passing-pipelines-j4dxj
displayName: print-text-5
finishedAt: "2020-05-23T09:12:06Z"
id: file-passing-pipelines-j4dxj-2080585451
inputs:
artifacts:
- name: sum-numbers-output
path: /tmp/inputs/text/data
s3:
accessKeySecret:
key: accesskey
name: mlpipeline-minio-artifact
bucket: mlpipeline
endpoint: minio-service.kubeflow:9000
insecure: true
key: artifacts/file-passing-pipelines-j4dxj/file-passing-pipelines-j4dxj-190982672/sum-numbers-output.tgz
secretKeySecret:
key: secretkey
name: mlpipeline-minio-artifact
name: file-passing-pipelines-j4dxj.print-text-5
phase: Succeeded
startedAt: "2020-05-23T09:12:03Z"
templateName: print-text-5
type: Pod
I tested by adding some print lines to an op with outputs, and it works.
So I think what happened is: if the op has no outputs at all, the logs will not be archived. I'm not sure if this is Argo's default behavior; might need to have a look.
So I think what happened is: if the op has no outputs at all, the logs will not be archived. I'm not sure if this is Argo's default behavior; might need to have a look.
Sorry, I should have warned you. Yes, there was a bug in Argo. It's already fixed in version 2.7.5, which we've recently upgraded to (no formal release yet).
For an op that actually has no logs (e.g. write-numbers), you can see the main-logs artifact
Just to clarify: I'm talking about the case where the main-logs artifact exists. I can see it in the Input/Output list, and your artifact preview feature shows the beginning of the log. This tells me that the status object is fine.
What is not working for me is the Logs tab.
Hmmm, I was unable to reproduce the problem. This is exactly what I did:
ARGO_ARCHIVE_LOGS="true"
Strange.
Can you help me check the ml-pipeline-ui logs? If there are errors, they should show up there. This is what it looks like when there are no errors and the logs are retrieved from the archive:
GET /k8s/pod/logs?podname=file-passing-pipelines-4kh8p-400479928&podnamespace=kubeflow
Getting logs for pod:file-passing-pipelines-4kh8p-400479928 from mlpipeline/artifacts/file-passing-pipelines-4kh8p/file-passing-pipelines-4kh8p-400479928/main.log.
GET /k8s/pod/logs?podname=file-passing-pipelines-4kh8p-417257547&podnamespace=kubeflow
Getting logs for pod:file-passing-pipelines-4kh8p-417257547 from mlpipeline/artifacts/file-passing-pipelines-4kh8p/file-passing-pipelines-4kh8p-417257547/main.log.
GET /k8s/pod/logs?podname=file-passing-pipelines-4kh8p-400479928&podnamespace=kubeflow
Getting logs for pod:file-passing-pipelines-4kh8p-400479928 from mlpipeline/artifacts/file-passing-pipelines-4kh8p/file-passing-pipelines-4kh8p-400479928/main.log.
GET /k8s/pod/logs?podname=file-passing-pipelines-4kh8p-417257547&podnamespace=kubeflow
Getting logs for pod:file-passing-pipelines-4kh8p-417257547 from mlpipeline/artifacts/file-passing-pipelines-4kh8p/file-passing-pipelines-4kh8p-417257547/main.log.
@Ark-kun: I cannot find the ARGO_ARCHIVE_PREFIX; I am using the Kubeflow 1.0 version. This is from the channel.
@anupash147 ARGO_ARCHIVE_PREFIX is the env var for the ml-pipeline-ui pod (https://github.com/kubeflow/pipelines/blob/c0074463eef8416fd3345b8e926d847323c6ae87/frontend/server/configs.ts#L88). You need to set it to the same prefix as the one you set in the workflow-controller configmap.
Additionally, ARGO_ARCHIVE_LOGS must be set to true or 1 (https://github.com/kubeflow/pipelines/blob/c0074463eef8416fd3345b8e926d847323c6ae87/frontend/server/configs.ts#L82).
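For example, both can be set on the UI deployment like this (a sketch; assumes the default kubeflow namespace and the standard deployment name):
# set the archive flags on the ml-pipeline-ui deployment
kubectl -n kubeflow set env deployment/ml-pipeline-ui \
  ARGO_ARCHIVE_LOGS=true \
  ARGO_ARCHIVE_PREFIX=artifacts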
@eterna2: I think I am lost. I did try passing the environment variable ARGO_ARCHIVE_LOGS='true' to the ml-pipeline-ui pod, but did not find anything that looks like this.
Sorry, can you give me a bit of context? I don't think I was in the initial conversation.
What were you trying to achieve, and what was the problem?
I don't quite understand your screenshot.
@eterna2: My original problem is as follows: I have to migrate from cluster A to cluster B. To do that, I back up the MySQL and minio servers, which retains the runs and the artifacts. But when I go to check the logs of the runs in cluster B (UI), they are blank. From the context of the ticket referenced via the Slack channel, I learnt that there is a ticket that talks about how we can archive the logs so that it is possible to see them in cluster B. Hence I am here.
As for the screenshot I attached, it is your reply from May 24 here, on how we can verify that the logs are getting archived. Let me know if it is clear now.
There are a few things you need to consider.
Which manifest version was cluster A on? I.e., was archival of the pod logs enabled? This is not done by ml-pipeline-ui (the UI just reads from the archive); it should be enabled in the workflow-controller configmap.
If it is not enabled in cluster A, then the logs will not be archived in minio.
The env var flags you passed to ml-pipeline-ui only tell the UI where the archive is. Hence, they should be exactly the same as what you specified in your workflow-controller configmap.
So you should first check the contents of minio (see the sketch below). Is there a main.log file? The path should be something like [prefix you provided in the configmap]/[workflow name]/[pod name]/main.log.
If there isn't, the pod logs were not archived in the first place.
Next, check that the artifactory env vars you passed to ml-pipeline-ui are exactly the same as the ones in the workflow-controller configmap.
And lastly, look at the ml-pipeline-ui logs. What do you see when you open the Logs tab? You should share the logs so at least I can have an idea of what happened.
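One way to peek into minio (a sketch; assumes the in-cluster minio-service from this thread, logging in with your own credentials):
# expose minio locally, then browse http://localhost:9000
kubectl -n kubeflow port-forward svc/minio-service 9000:9000
# look for: mlpipeline/<prefix>/<workflow name>/<pod name>/main.log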
Both of the clusters are on Kubeflow version 1.0.
With your help, I was able to figure it out and make the changes below.
I set archiveLogs = true in the workflow-controller-configmap and restarted the workflow-controller. This enabled me to start getting the main.log files.
On deployment/ml-pipeline-ui (my UI version is gcr.io/ml-pipeline/frontend:0.2.0) I set:
i. ARGO_ARCHIVE_LOGS=true
ii. ARGO_ARCHIVE_PREFIX='artifacts'
With this, the logs point to the main.log location in the minio server.
Here is the gist of the pipeline logs : https://gist.github.com/anupash147/747922700ca8caad84681288b34db8e7 --> example-metrics-pipeline-ttvcx-2745010860
Thank you @eterna2 for the above.
But the problem still persists. After deleting the pod and having everything configured, I should have been able to see the logs in the UI, but I am not. I think the exceptions I am getting in the logs might be responsible.
Are you on Argo 2.7.5? There is a bug in the older version of Argo. (Argo is the workflow controller.)
If you are on KF 1.0, you should still be on the old Argo. The bug is: if you do not have an output artifact, no artifacts will be emitted at all, i.e. KFP will not know there is a main.log if the task does not have an output artifact by default.
You can try changing the Argo version to 2.7.5 (by editing the image tag, as well as the executor arg), or you can try producing an output in your task.
Can you also tell me which version of KFP you are on? You can probably try upgrading it to 0.5.1 (the image tag).
Are you on Argo 2.7.5? There is a bug in the older version of Argo. (Argo is the workflow controller.)
If you are on KF 1.0, you should still be on the old Argo. The bug is: if you do not have an output artifact, no artifacts will be emitted at all, i.e. KFP will not know there is a main.log if the task does not have an output artifact by default.
You can try changing the Argo version to 2.7.5 (by editing the image tag, as well as the executor arg), or you can try producing an output in your task.
By Argo do you mean argoproj/workflow-controller:v2.3.0? By KFP do you mean gcr.io/ml-pipeline/frontend:0.2.0?
Yeah, you are on 2.3.0 for Argo and 0.2.0 for KFP. I think 0.2.0 should still be fine.
I suspect the bug is probably because of Argo 2.3.0.
You can either intentionally add an output to your KFP task, or upgrade to 2.7.5.
But for the workflow controller, you need to take care. I forgot where (I think it should be in the configmap or the args), but you need to tell Argo which executor to use; you have to update the image tag for that too.
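Roughly, the upgrade can look like this (a sketch; assumes the standard deployment and configmap names, so double-check your manifests):
# bump the workflow controller image
kubectl -n kubeflow set image deployment/workflow-controller \
  workflow-controller=argoproj/workflow-controller:v2.7.5
# then point the controller at the matching executor image, e.g. set
# executorImage to argoproj/argoexec:v2.7.5 in the controller config
kubectl -n kubeflow edit configmap workflow-controller-configmap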
I did upgrade the executor to argoproj/argoexec:v2.7.5, and that part was fine. But when I upgraded the workflow-controller, I was doomed with heavy errors, see below:
E0624 01:56:36.982059 1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20191225075139-73fd2ddc9180/tools/cache/reflector.go:98: Failed to list *v1alpha1.WorkflowTemplate: the server could not find the requested resource (get workflowtemplates.argoproj.io)
E0624 01:56:36.983106 1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20191225075139-73fd2ddc9180/tools/cache/reflector.go:98: Failed to list *v1alpha1.CronWorkflow: the server could not find the requested resource (get cronworkflows.argoproj.io)
E0624 01:56:37.983407 1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20191225075139-73fd2ddc9180/tools/cache/reflector.go:98: Failed to list *v1alpha1.WorkflowTemplate: the server could not find the requested resource (get workflowtemplates.argoproj.io)
E0624 01:56:37.984291 1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20191225075139-73fd2ddc9180/tools/cache/reflector.go:98: Failed to list *v1alpha1.CronWorkflow: the server could not find the requested resource (get cronworkflows.argoproj.io)
I am going to look into https://github.com/kubeflow/pipelines/tree/0.5.1/manifests/kustomize; I guess this is what you had mentioned. Thanks, will keep you updated @eterna2
@eterna2: thank you, I got it working now. I just had to put ARGO_ARCHIVE_PREFIX='artifacts'.
@eterna2: thank you, I got it working now. I just had to put ARGO_ARCHIVE_PREFIX='artifacts'.
Thank you @eterna2 and @anupash147. This fixed it for me. I remember trying some prefix, but maybe it was wrong (e.g. maybe I used the full minio://mlpipeline/artifacts/ prefix).
@eterna2 Why is the prefix needed? The full log artifact URI seems to be available.
Actually, no, it still does not seem to work for me. Here is the log:
2020-06-25T08:33:32.387751225Z GET /artifacts/get?source=minio&namespace=default&peek=256&bucket=mlpipeline&key=artifacts%2Fcatboost-pipeline-4dcnx%2Fcatboost-pipeline-4dcnx-1140917078%2Fchicago-taxi-trips-dataset-table.tgz
2020-06-25T08:33:32.387800632Z Getting storage artifact at: minio: mlpipeline/artifacts/catboost-pipeline-4dcnx/catboost-pipeline-4dcnx-1140917078/chicago-taxi-trips-dataset-table.tgz
2020-06-25T08:33:32.418881030Z GET /artifacts/get?source=minio&namespace=default&peek=256&bucket=mlpipeline&key=artifacts%2Fcatboost-pipeline-4dcnx%2Fcatboost-pipeline-4dcnx-1140917078%2Fmain.log
2020-06-25T08:33:32.419033086Z Getting storage artifact at: minio: mlpipeline/artifacts/catboost-pipeline-4dcnx/catboost-pipeline-4dcnx-1140917078/main.log
2020-06-25T08:33:32.579217321Z GET /artifacts/get?source=minio&namespace=default&peek=256&bucket=mlpipeline&key=artifacts%2Fcatboost-pipeline-4dcnx%2Fcatboost-pipeline-4dcnx-1140917078%2Fchicago-taxi-trips-dataset-table.tgz
2020-06-25T08:33:32.579388399Z Getting storage artifact at: minio: mlpipeline/artifacts/catboost-pipeline-4dcnx/catboost-pipeline-4dcnx-1140917078/chicago-taxi-trips-dataset-table.tgz
2020-06-25T08:33:32.650139513Z GET /artifacts/get?source=minio&namespace=default&peek=256&bucket=mlpipeline&key=artifacts%2Fcatboost-pipeline-4dcnx%2Fcatboost-pipeline-4dcnx-1140917078%2Fmain.log
2020-06-25T08:33:32.650354588Z Getting storage artifact at: minio: mlpipeline/artifacts/catboost-pipeline-4dcnx/catboost-pipeline-4dcnx-1140917078/main.log
2020-06-25T08:33:34.973389758Z GET /k8s/pod/logs?podname=parquet-pipeline-pgmz2-684574782&podnamespace=default
2020-06-25T08:33:34.986899370Z Getting logs for pod:parquet-pipeline-pgmz2-684574782 from mlpipeline/artifacts/parquet-pipeline-pgmz2/parquet-pipeline-pgmz2-684574782/main.log.
2020-06-25T08:33:44.775815452Z GET /k8s/pod/logs?podname=parquet-pipeline-pgmz2-684574782&podnamespace=default
2020-06-25T08:33:44.818858182Z Getting logs for pod:parquet-pipeline-pgmz2-684574782 from mlpipeline/artifacts/parquet-pipeline-pgmz2/parquet-pipeline-pgmz2-684574782/main.log.
I think I know what's happening. The frontend constructs the logs artifact URI based on the pod name instead of taking the actual artifact location from the workflow status (the same place used to populate the Inputs/Outputs tab). Unfortunately, the constructed URI does not exist, and the logs artifact we want to show is different.
The correct URI is mlpipeline/artifacts/catboost-pipeline-4dcnx/catboost-pipeline-4dcnx-1140917078/main.log, but the UI accesses mlpipeline/artifacts/parquet-pipeline-pgmz2/parquet-pipeline-pgmz2-684574782/main.log.
This means there is some error parsing the workflow status. Let me investigate.
Because logs are retrieved in 3 ways, in order: from the archived log artifact location parsed out of the workflow status, from the pod itself via the Kubernetes API, and finally from a path constructed from the configured archive prefix and the pod name.
It seems like it fell back to the 3rd method. Somehow it either cannot query the workflow status, retrieve the secrets, or parse the status to get the archived log artifact.
This is the catboost pipeline? Let me try.
@eterna2: thank you, I got it working now. I just had to put ARGO_ARCHIVE_PREFIX='artifacts'.
Thank you @eterna2 and @anupash147. This fixed it for me. I remember trying some prefix, but maybe it was wrong (e.g. maybe I used the full minio://mlpipeline/artifacts/ prefix). @eterna2 Why is the prefix needed? The full log artifact URI seems to be available.
The prefix is for the final fallback, when the frontend cannot parse or get the workflow status.
It is usually not required if the workflow is parsed correctly. Is there a change in the schema after the upgrade to 2.7.5?
Let me check.
Although my log archive seems to be working, I still fail to get adequate logging.
Even ml-pipeline-ui is looking for it.
I'm suspecting we didn't turn on the ml-pipeline-ui configuration that allows fetching from archived logs.
Right, we don't turn on reading from the Argo archive in the UI by default, because some configuration is needed there: https://github.com/kubeflow/pipelines/blob/cc78bd1a4fda560454ec036c01cc0c9262517431/frontend/server/configs.ts#L81-L88
Although my log archive seems to be working, I still fail to get adequate logging.
Even ml-pipeline-ui is looking for it.
In this case, I have manually turned on the logging, but observed that not all steps of a pipeline are logged.
Are you using Argo 2.3? There is a bug in Argo 2.3 where logs will not be archived if there are no other artifacts (besides the logs).
Hi @eterna2 @Ark-kun @anupash147, any help or pointers are appreciated. With the settings below, I am not able to see the main.log file being created anywhere. I am using the example pipeline "[Tutorial] Data passing in python". I tried this with:
"executorImage":"xxxx/argoexec:v2.7.5",
"containerRuntimeExecutor":"docker",
"artifactRepository":{
"archiveLogs": true,
"s3":{
"bucket":"mlpipeline",
"keyPrefix":"artifacts",
"endpoint":"minio-service.kubeflow":9000,
"insecure":true,
"accessKeySecret":{
"name":"mlpipeline-minio-artifact",
"key":"accesskey"
},
"secretKeySecret":{
"name":"mlpipeline-minio-artifact",
"key":"secretkey"
}
}
}
}
Please note the quotes were added by a JSON formatter; the original JSON in the configmap is without quotes.
Another piece of information: whenever I do a run, the service account selected is "default-editor", while this particular role does not exist. I am doing this run with a user/namespace profile of "training".
What should this role be? From what I understand, if the user does not have a watch role, it may not be able to pull the logs into the archive. Please help, I am at my wits' end here :)
Hello folks. As we know, the pod has archived logs that are successfully shown on the Inputs/Outputs tab in the form of a URI:
Log URI: "main-logs minio://mlpipeline/artifacts/file-passing-pipelines-k6k9w/file-passing-pipelines-k6k9w-3191777755/main.log"
Is there any way to get this URI via an API call? In case of failure, we could send this URL to check the logs. Any way to get this log URL would be helpful.
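One possible approach (a sketch; assumes the v1beta1 REST API, with KFP_HOST and RUN_ID as placeholders): the run detail response embeds the workflow manifest as a JSON string, and the main-logs locations can be pulled out of its status, e.g.:
# fetch the run detail and extract the archived log keys
curl -s "$KFP_HOST/apis/v1beta1/runs/$RUN_ID" |
  jq -r '.pipeline_runtime.workflow_manifest | fromjson
         | .status.nodes[] | .outputs.artifacts[]?
         | select(.name == "main-logs") | .s3.key'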
The logs output broke again with the latest pipeline version 1.3.*; the Argo executor used is argoexec:v3.1.6-license-compliance. Is this image not compatible with frontend 1.7-rc3?
I believe the feature was initially implemented, but it is not fully working.
Contributions to fix it are welcome.
Hmm. This feature worked in KFP 1.0.
It's pretty useful, especially when an execution is reused from the cache: the users can still see the original logs.
The logs output broke again with the latest pipeline version 1.3.*,
What are the symptoms? No main-logs artifact? Can you please check that "archiveLogs": true is set in the workflow-controller-configmap? I'm not sure this is a frontend issue; the frontend just shows all the Argo artifacts that were produced.
I think we should fix the issue.
Repro steps (ml-pipeline-ui):
The pod has archived logs that are successfully shown on the Inputs/Outputs tab.
Log URI: "main-logs minio://mlpipeline/artifacts/file-passing-pipelines-k6k9w/file-passing-pipelines-k6k9w-3191777755/main.log"
However, when I navigate to the Logs tab, I get the usual error "Warning: failed to retrieve pod logs. Possible reasons include cluster autoscaling or pod preemption", and when I click "Details" I see "Could not get main container logs: S3Error: The specified key does not exist."
I've tried to set ARGO_ARCHIVE_PREFIX: "", but got the same error.
Version: master.