Open tomaszstachera opened 3 months ago
Any update here?
Which component performs the resolve operation that produces the error shown in the UI?
The root cause may lie in Argo, since the workflow-controller logs contain the same message as the UI:
workflow-controller time="2024-08-27T12:38:03.175Z" level=info msg="Processing workflow" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.176Z" level=info msg="Task-result reconciliation" namespace=tomasz numObjs=0 workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.176Z" level=info msg="All of node ppln-from-vsc-xkhhr.sample-op dependencies [] completed" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.177Z" level=info msg="Pod node ppln-from-vsc-xkhhr-1425665423 initialized Pending" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=warning msg="Non-transient error: failed to resolve {{`ppln-from-vsc-xkhhr`}}"
workflow-controller time="2024-08-27T12:38:03.178Z" level=error msg="Mark error node" error="failed to resolve {{`ppln-from-vsc-xkhhr`}}" namespace=tomasz nodeName=ppln-from-vsc-xkhhr.sample-op workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="node ppln-from-vsc-xkhhr-1425665423 phase Pending -> Error" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="node ppln-from-vsc-xkhhr-1425665423 message: failed to resolve {{`ppln-from-vsc-xkhhr`}}" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="node ppln-from-vsc-xkhhr-1425665423 finished: 2024-08-27 12:38:03.178501556 +0000 UTC" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=error msg="Mark error node" error="task 'ppln-from-vsc-xkhhr.sample-op' errored: failed to resolve {{`ppln-from-vsc-xkhhr`}}" namespace=tomasz nodeName=ppln-from-vsc-xkhhr.sample-op workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="node ppln-from-vsc-xkhhr-1425665423 message: task 'ppln-from-vsc-xkhhr.sample-op' errored: failed to resolve {{`ppln-from-vsc-xkhhr`}}" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="Skipped node ppln-from-vsc-xkhhr-184939484 initialized Omitted (message: omitted: depends condition not met)" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="Outbound nodes of ppln-from-vsc-xkhhr set to [ppln-from-vsc-xkhhr-184939484]" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="node ppln-from-vsc-xkhhr phase Running -> Error" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="node ppln-from-vsc-xkhhr finished: 2024-08-27 12:38:03.178728796 +0000 UTC" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="Checking daemoned children of ppln-from-vsc-xkhhr" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="TaskSet Reconciliation" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg=reconcileAgentPod namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="Updated phase Running -> Error" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="Marking workflow completed" namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="Checking daemoned children of " namespace=tomasz workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.178Z" level=info msg="Workflow to be dehydrated" Workflow Size=9953
workflow-controller time="2024-08-27T12:38:03.184Z" level=info msg="cleaning up pod" action=deletePod key=tomasz/ppln-from-vsc-xkhhr-1340600742-agent/deletePod
workflow-controller time="2024-08-27T12:38:03.188Z" level=info msg="Update workflows 200"
workflow-controller time="2024-08-27T12:38:03.189Z" level=info msg="Workflow update successful" namespace=tomasz phase=Error resourceVersion=724918529 workflow=ppln-from-vsc-xkhhr
workflow-controller time="2024-08-27T12:38:03.189Z" level=info msg="Queueing Error workflow tomasz/ppln-from-vsc-xkhhr for delete in 168h0m0s due to TTL"
workflow-controller time="2024-08-27T12:38:03.195Z" level=info msg="Delete pods 404"
workflow-controller time="2024-08-27T12:38:03.196Z" level=info msg="DeleteCollection workflowtaskresults 200"
workflow-controller time="2024-08-27T12:38:03.197Z" level=info msg="Patch events 200"
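For anyone triaging a similar dump, here is a minimal sketch that keeps only the warning and error lines. It assumes the `key=value` / `key="quoted value"` text layout seen in the log above; the names `PAIR`, `parse_line`, `problems`, and `sample` are made up for this illustration and are not part of Argo or kfp. Escaped characters inside quoted values are left as-is.

```python
import re

# One key=value pair; the value is either a double-quoted string
# (backslash escapes allowed) or a run of non-space characters.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Return the key/value fields of one log line as a dict."""
    fields = {}
    for key, value in PAIR.findall(line):
        # Strip the surrounding quotes from quoted values.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def problems(lines):
    """Yield (time, level, text) for every warning or error line."""
    for line in lines:
        fields = parse_line(line)
        if fields.get("level") in ("warning", "error"):
            # Prefer the error= field when present, otherwise fall back to msg=.
            text = fields.get("error") or fields.get("msg")
            yield fields.get("time"), fields["level"], text


# Two lines copied from the log above (hypothetical variable name).
sample = [
    'workflow-controller time="2024-08-27T12:38:03.177Z" level=info msg="Pod node ppln-from-vsc-xkhhr-1425665423 initialized Pending" namespace=tomasz workflow=ppln-from-vsc-xkhhr',
    'workflow-controller time="2024-08-27T12:38:03.178Z" level=warning msg="Non-transient error: failed to resolve {{`ppln-from-vsc-xkhhr`}}"',
]

for row in problems(sample):
    print(row)
```

Feeding the full log through `problems` should surface the `failed to resolve` entries without the surrounding info-level noise. Lines that were hard-wrapped by the terminal need to be rejoined first, otherwise fields split across the break are lost.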
Environment
Steps to reproduce
Run the pipeline below (kfp==1.8.21)
Expected result
The pipeline should run and succeed. The failure started happening suddenly; the same pipeline worked before.
Materials and Reference
Logs:
ml-pipeline
ml-pipeline-ui
Impacted by this bug? Give it a 👍.