totem / totem.github.io

Totem documentation, issues, guides.
http://totem.github.io

Alerting based on CoreOS Kernel events #32

Closed: sukrit007 closed this issue 9 years ago

sukrit007 commented 9 years ago

Currently, when a container exceeds the memory limit set through its docker arguments, the kernel kills the process; however, no notification is sent when this happens.

We need a strategy for sending notifications on OOM-kill events. These events can be detected in the kernel messages of the systemd journal, e.g.: "Memory cgroup out of memory: Kill process 7232 (java) score 1014 or sacrifice child"
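As a rough illustration of the detection step, the Python sketch below matches the kernel message quoted above and extracts the killed process's PID, command name and OOM score. The regex is modelled only on that one example line (an assumption; wording may differ between kernel versions), and the names `OOM_RE`, `OomKill` and `parse_oom_kill` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Pattern modelled on the example kernel message in this issue.
# Assumption: other kernel versions may phrase the OOM-kill line differently.
OOM_RE = re.compile(
    r"Memory cgroup out of memory: Kill process (?P<pid>\d+) "
    r"\((?P<comm>[^)]+)\) score (?P<score>\d+)"
)


class OomKill(NamedTuple):
    pid: int      # PID of the process the kernel killed
    comm: str     # command name shown in parentheses, e.g. "java"
    score: int    # OOM score the kernel reported


def parse_oom_kill(line: str) -> Optional[OomKill]:
    """Return details of the killed process if `line` is an OOM-kill message, else None."""
    m = OOM_RE.search(line)
    if m is None:
        return None
    return OomKill(int(m["pid"]), m["comm"], int(m["score"]))


if __name__ == "__main__":
    sample = ("Memory cgroup out of memory: Kill process 7232 "
              "(java) score 1014 or sacrifice child")
    print(parse_oom_kill(sample))
```

On a host, the input lines could come from `journalctl -k -f`, which follows kernel messages in the journal; wiring that up is left out of the sketch.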

sukrit007 commented 9 years ago

Options:

I can think of 3 options:

I am inclined towards option 2, as it can easily be managed and configured through a RESTful API. The only downside is that it requires installing a plugin.

sukrit007 commented 9 years ago

For now, we have decided to use the Logstash hipchat output plugin:

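output {
  # Alert HipChat when the kernel logs a memory-cgroup OOM kill
  if [syslog_program] == "kernel" and [short_message] =~ /Memory cgroup out of memory/ {
    hipchat {
      room_id        => "<room>"
      token          => "<token>"
      color          => "red"
      trigger_notify => true   # notify room members instead of posting silently
    }
  }
}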
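To check the matching logic outside Logstash (for example in a unit test), the conditional can be mirrored in a small Python function. This is a sketch only: the field names `syslog_program` and `short_message` come from the config above, while the function name `oom_alert` and the shape of the returned dict are hypothetical and are not the HipChat API payload.

```python
import re
from typing import Any, Dict, Optional

# Same substring test as the Logstash conditional above.
OOM_PATTERN = re.compile(r"Memory cgroup out of memory")


def oom_alert(event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return a HipChat-style alert dict for an OOM kernel event, or None to ignore it.

    The returned layout is hypothetical; it only mirrors the hipchat settings.
    """
    if event.get("syslog_program") != "kernel":
        return None
    message = event.get("short_message") or ""
    if not OOM_PATTERN.search(message):
        return None
    return {
        "color": "red",    # same colour as the hipchat output
        "notify": True,    # analogous to trigger_notify => true
        "message": message,
    }
```

Keeping the pattern a plain unanchored substring match means the leading and trailing `.*` from the original conditional are unnecessary.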