kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

Is it possible for an ExitHandler to retrieve details about the exiting run? #1900

Closed by logicbomb421 4 years ago

logicbomb421 commented 5 years ago

I'm currently writing a pipeline that takes a while to complete. I'd like to create an exit handler that sends a notification once done. I'd like this notification to include details specific to the run being exited (e.g. the URL to the UI, total run time, output from steps, etc). Additionally, I'd like to be able to run cleanup logic in an exit handler, which would potentially need to know about resources created within the handler's OpsGroup.

I can't seem to make this happen since an ExitHandler can have no dependencies. I'm wondering if I'm missing something here, as for all intents and purposes, it seems ExitHandlers are completely out-of-band of the executing pipeline?

Please let me know if clarification is needed.

Ark-kun commented 5 years ago

> I can't seem to make this happen since an ExitHandler can have no dependencies. I'm wondering if I'm missing something here, as for all intents and purposes, it seems ExitHandlers are completely out-of-band of the executing pipeline?

That's a limitation of the underlying Argo orchestrator. An exit handler only gets the pipeline parameters and Argo's global variables (the latter are not supported by KFP).

gaoning777 commented 4 years ago

/unassign @gaoning777

Ark-kun commented 4 years ago

The ExitHandler can only use the pipeline arguments. You can pass it kfp.dsl.RUN_ID_PLACEHOLDER to get the unique run name that can be used to get information from the backend.
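As a rough sketch of what the exit step's container could do with that value: the helpers below take a run ID (which the pipeline would pass in as an argument set to kfp.dsl.RUN_ID_PLACEHOLDER) plus a status string and build a notification message with a link to the run. The function names, the UI host, and the `/#/runs/details/<run_id>` URL layout are assumptions for illustration; check them against your own deployment.

```python
def build_run_url(ui_host: str, run_id: str) -> str:
    """Build a link to the run details page.

    The '/#/runs/details/<run_id>' layout is an assumption about the KFP UI.
    """
    return f"{ui_host.rstrip('/')}/#/runs/details/{run_id}"


def build_notification(run_id: str, status: str, ui_host: str) -> str:
    """Format a one-line message describing the finished run."""
    url = build_run_url(ui_host, run_id)
    return f"Pipeline run {run_id} finished with status {status}: {url}"


# Hypothetical values; in a real exit step these would come from the
# container's command-line arguments.
print(build_notification("abc-123", "Failed", "http://kfp.example.com"))
```

The same run ID could also be used to query the backend for run details, as suggested above; that part is left out here because it depends on how the API is exposed in your cluster.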

aaaaahaaaaa commented 4 years ago

@Ark-kun So there's no way to handle failure of a specific task? For example, if I want to do some garbage collection based on some values generated during the pipeline execution.

sinban04 commented 2 years ago

@Ark-kun So there's still no way to do this? Or is there any plan to support it?

sinban04 commented 2 years ago

Hi all, I'm not sure whether you already have a solution for this, but I want to share mine.

Since we can use Argo variables such as {{workflow.status}} in KFP, I looked for others and found {{workflow.failures}}, which lists brief information about the pods involved in the failure.

For example, it gives me output like this:

[{
"displayName":"pyspark3job",
"message":"Error (exit code 1)",
"templateName":"pyspark3job",
"phase":"Failed",
"podName":"featurestore-automation-4bm29-3640083072",
"finishedAt":"2022-10-17T09:05:39Z"
},{
"displayName":"pyspark3job(0)",
"message":"Error (exit code 1)",
"templateName":"pyspark3job",
"phase":"Failed",
"podName":"featurestore-automation-4bm29-4286919395",
"finishedAt":"2022-10-17T09:05:29Z"
},{
"displayName":"exit-handler-1(0)",
"message":"",
"templateName":"exit-handler-1",
"phase":"Failed",
"podName":"featurestore-automation-4bm29-504514750",
"finishedAt":"2022-10-17T09:05:39Z"
},{
"displayName":"featurestore-automation-4bm29",
"message":"",
"templateName":"featurestore-automation",
"phase":"Failed",
"podName":"featurestore-automation-4bm29",
"finishedAt":"2022-10-17T09:05:49Z"
},{
"displayName":"featurestore-automation-4bm29(0)",
"message":"",
"templateName":"featurestore-automation",
"phase":"Failed",
"podName":"featurestore-automation-4bm29-1109336338",
"finishedAt":"2022-10-17T09:05:44Z"
},{
"displayName":"exit-handler-1",
"message":"",
"templateName":"exit-handler-1",
"phase":"Failed",
"podName":"featurestore-automation-4bm29-1514408627",
"finishedAt":"2022-10-17T09:05:44Z"
}]

It doesn't give us full debugging information, but at least it gives us the error message of each pod that failed.
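If the exit step receives that value as an argument, it could filter it down to the entries that actually carry an error message. This is a minimal sketch, assuming the value arrives as a plain JSON string shaped like the sample above; the function name is made up for illustration.

```python
import json


def summarize_failures(failures_json: str) -> list:
    """Return (displayName, message) pairs for failed nodes with a non-empty message."""
    nodes = json.loads(failures_json)
    return [
        (node.get("displayName", ""), node["message"])
        for node in nodes
        if node.get("phase") == "Failed" and node.get("message")
    ]


# Two entries copied from the sample output above.
sample = '''[
  {"displayName": "pyspark3job",
   "message": "Error (exit code 1)",
   "templateName": "pyspark3job",
   "phase": "Failed",
   "podName": "featurestore-automation-4bm29-3640083072",
   "finishedAt": "2022-10-17T09:05:39Z"},
  {"displayName": "exit-handler-1",
   "message": "",
   "templateName": "exit-handler-1",
   "phase": "Failed",
   "podName": "featurestore-automation-4bm29-1514408627",
   "finishedAt": "2022-10-17T09:05:44Z"}
]'''

for name, message in summarize_failures(sample):
    print(f"{name}: {message}")
```

Entries with an empty message (the exit-handler group and the workflow itself in the sample) are skipped, so only the steps that reported an error remain.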

Related Issue

https://github.com/kubeflow/pipelines/issues/3322