Open tomaszstachera opened 5 months ago
Any update here?
Which component performs the resolve operation mentioned in the UI error?
The root cause may lie in Argo, as the workflow-controller logs contain the same message that is shown in the UI:
workflow-controller time="2024-08-27T12:38:03.175Z" level=info msg="Processing workflow" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.176Z" level=info msg="Task-result reconciliation" namespace=tomasz numObjs=0 workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.176Z" level=info msg="All of node ppln-from-vsc-xkhhr.sample-op dependencies [] completed" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.177Z" level=info msg="Pod node ppln-from-vsc-xkhhr-1425665423 initialized Pending" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=warning msg="Non-transient error: failed to resolve {{`ppln-from-vsc-xkhhr`}}"
workflow-controller time="2024-08-27T12:38:03.178Z" level=error msg="Mark error node" error="failed to resolve {{`ppln-from-vsc-xkhhr`}}" namespace=tomasz nodeName=ppln-from-vsc-xkhhr.sample-op workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="node ppln-from-vsc-xkhhr-1425665423 phase Pending -> Error" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="node ppln-from-vsc-xkhhr-1425665423 message: failed to resolve {{`ppln-from-vsc-xkhhr`}}" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="node ppln-from-vsc-xkhhr-1425665423 finished: 2024-08-27 12:38:03.178501556 +0000 UTC" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=error msg="Mark error node" error="task 'ppln-from-vsc-xkhhr.sample-op' errored: failed to resolve {{`ppln-from-vsc-xkhhr`}}" namespace=tomasz nodeName=ppln-from-vsc-xkhhr.sample-op workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="node ppln-from-vsc-xkhhr-1425665423 message: task 'ppln-from-vsc-xkhhr.sample-op' errored: failed to resolve {{`ppln-from-vsc-xkhhr`}}" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="Skipped node ppln-from-vsc-xkhhr-184939484 initialized Omitted (message: omitted: depends condition not met)" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="Outbound nodes of ppln-from-vsc-xkhhr set to [ppln-from-vsc-xkhhr-184939484]" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="node ppln-from-vsc-xkhhr phase Running -> Error" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="node ppln-from-vsc-xkhhr finished: 2024-08-27 12:38:03.178728796 +0000 UTC" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="Checking daemoned children of ppln-from-vsc-xkhhr" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="TaskSet Reconciliation" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg=reconcileAgentPod namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="Updated phase Running -> Error" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="Marking workflow completed" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="Checking daemoned children of " namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="Workflow to be dehydrated" Workflow Size=9953
workflow-controller time="2024-08-27T12:38:03.184Z" level=info msg="cleaning up pod" action=deletePod key=tomasz/ppln-from-vsc-xkhhr-1340600742-agent/deletePod
workflow-controller time="2024-08-27T12:38:03.188Z" level=info msg="Update workflows 200"
workflow-controller time="2024-08-27T12:38:03.189Z" level=info msg="Workflow update successful" namespace=tomasz phase=Error resourceVersion=724918529 workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.189Z" level=info msg="Queueing Error workflow tomasz/ppln-from-vsc-xkhhr for delete in 168h0m0s due to TTL"
workflow-controller time="2024-08-27T12:38:03.195Z" level=info msg="Delete pods 404"
workflow-controller time="2024-08-27T12:38:03.196Z" level=info msg="DeleteCollection workflowtaskresults 200"
workflow-controller time="2024-08-27T12:38:03.197Z" level=info msg="Patch events 200"
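For anyone trying to narrow this down in their own cluster, here is a minimal sketch for pulling only the warning and error entries out of a workflow-controller log dump like the one above. It assumes the lines are key=value pairs with double-quoted values, as in the paste; the helper names `parse_logfmt` and `problem_entries` are made up for this example and are not part of Argo or kfp.

```python
import shlex


def parse_logfmt(line):
    """Parse one key=value log line into a dict.

    Tokens without '=' (such as the leading 'workflow-controller' prefix)
    are ignored. Assumes values containing spaces are double-quoted.
    """
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def problem_entries(lines, levels=("warning", "error")):
    """Return the parsed entries whose 'level' field is in `levels`."""
    entries = []
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("level") in levels:
            entries.append(fields)
    return entries


# Two lines copied from the log above.
sample = [
    'workflow-controller time="2024-08-27T12:38:03.177Z" level=info '
    'msg="Pod node ppln-from-vsc-xkhhr-1425665423 initialized Pending" '
    'namespace=tomasz workflow=ppln-from-vsc-xkhhr',
    'workflow-controller time="2024-08-27T12:38:03.178Z" level=warning '
    'msg="Non-transient error: failed to resolve {{`ppln-from-vsc-xkhhr`}}"',
]

for entry in problem_entries(sample):
    print(entry["time"], entry["level"], entry["msg"])
```

In the log above, the first entry this would surface is the "Non-transient error: failed to resolve" warning, which is the same message the UI reports.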
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Environment
Steps to reproduce
Run the pipeline below (kfp==1.8.21)
Expected result
The pipeline should run and succeed. This started happening suddenly; the same pipeline worked before.
Materials and Reference
Logs:
ml-pipeline
ml-pipeline-ui
Impacted by this bug? Give it a 👍.