treasure-data / digdag

Workload Automation System
https://www.digdag.io/
Apache License 2.0

[feature request] Global Error handling ${error.name} #669

Open toyama0919 opened 7 years ago

toyama0919 commented 7 years ago

We define only a single error handler. The details are checked on the monitoring side, but I want to know at least where the error occurred.

Could you support something like the following? Defining an error handler for every task is tiring.

_error:
  sh>: echo error => ${error.name} # echo error => +task1

+task1:
  sh>: not_exist_command

frsyuki commented 7 years ago

_error is, by design, not reliable enough for notification, so I don't recommend using it for that purpose. Instead, digdag has a configuration key named notification.type for reliable error notification. It can receive the task name.

The notification mechanism is not documented yet. The code is here: https://github.com/treasure-data/digdag/tree/master/digdag-core/src/main/java/io/digdag/core/notification

The configuration is probably one of these:

notification.type = shell
notification.shell.command = ...
# this command receives JSON structure of io.digdag.spi.Notification through STDIN

notification.type = http
notification.http.method = ...
notification.http.url = ...
notification.http.headers.... = ...

notification.type = mail
notification.mail.from = ...
notification.mail.to = ...
notification.mail.cc = ...
notification.mail.bcc = ...
notification.mail.subject = ...
notification.mail.body_template_file = ...
notification.mail.html = ...

The file named by body_template_file can contain ${...} placeholders to format the email body. The default is:

Digdag Notification

Message: ${message}
Date: ${timestamp}

Site Id: ${site_id}
Project Name: ${project_name}
Project Id: ${project_id}
Workflow Name: ${workflow_name}
Revision: ${revision}
Attempt Id: ${attempt_id}
Session Id: ${session_id}
Task Name: ${task_name}
Time Zone: ${timezone}
Session Uuid: ${session_uuid}
Session Time: ${session_time}

toyama0919 commented 7 years ago

Thanks. I tried it.

workflow.dig (fails with "command not found: ech")

+a:
  sh>: ech hoge

~/.config/digdag/config

notification.type = shell
notification.shell.command = notification.rb

Ruby code (notification.rb)

#! /usr/bin/env ruby
# coding: UTF-8

# Digdag pipes the notification JSON to STDIN; dump it to a file for inspection.
unless STDIN.tty?
  File.write("error.log", STDIN.read)
end

error.log

{
  "timestamp": "2017-10-13T00:33:32Z",
  "message": "Workflow session attempt failed",
  "site_id": 0,
  "project_id": 1,
  "project_name": "default",
  "workflow_name": "workflow",
  "revision": "2017-10-13T00:33:30.093Z",
  "attempt_id": 1,
  "session_id": 1,
  "task_name": "+workflow^failure-alert",
  "timezone": "UTC",
  "session_uuid": "08254924-bf80-4bb3-946a-7179b7b038d7",
  "session_time": "2017-09-19T00:00:00+00:00"
}
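
As a rough illustration of what a shell notification command could do with this payload beyond dumping it to a file, here is a minimal Ruby sketch that parses the JSON and builds a one-line summary. The helper name summarize_notification and the summary layout are hypothetical; the field names are taken only from the error.log sample above, and nothing here is confirmed as Digdag's documented behavior.

```ruby
require "json"

# Hypothetical helper (not part of Digdag): turn the notification JSON
# into a one-line summary. The keys are the ones visible in error.log.
def summarize_notification(raw_json)
  n = JSON.parse(raw_json)
  format(
    "[%s] %s: project=%s workflow=%s task=%s attempt=%s",
    n.fetch("timestamp"),
    n.fetch("message"),
    n.fetch("project_name"),
    n.fetch("workflow_name"),
    n.fetch("task_name"),
    n.fetch("attempt_id")
  )
end

# Demo with a trimmed copy of the sample payload.
sample = <<~PAYLOAD
  {
    "timestamp": "2017-10-13T00:33:32Z",
    "message": "Workflow session attempt failed",
    "project_name": "default",
    "workflow_name": "workflow",
    "attempt_id": 1,
    "task_name": "+workflow^failure-alert"
  }
PAYLOAD

puts summarize_notification(sample)
```

In an actual notification script, the demo part would presumably be replaced by `puts summarize_notification(STDIN.read)`. The summary reports whatever task_name contains, so with the payload above it still shows +workflow^failure-alert rather than the task that failed.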

In my case, I want ${task_name} to be +workflow+a (not +workflow^failure-alert). Is there any way to do that?

frsyuki commented 7 years ago

That's bad... The only way is to improve the code.