ndelitski / rancher-alarms

Will kick your ass if found unhealthy service in Rancher environment

False positives #30

Open iDiogenes opened 7 years ago

iDiogenes commented 7 years ago

Hello,

I pointed this at my Rancher 1.4.3 server and it reported about half my stacks as UNHEALTHY and fired off a bunch of emails. However, the Rancher UI says everything is green.

Is 1.4 supported?

VAdamec commented 7 years ago

I tried a clean v1.4.3 with some sample apps from the catalog, and it seems to pick up state changes without any problem. Can you provide the logs from the container? We can add some debug output later to see more.

My sample log:

[INFO]   2017-4-11 5:48:22:643     start polling rancher-eventer/rancher-eventer
[INFO]   2017-4-11 5:48:22:644     start polling pxc/pxc
[INFO]   2017-4-11 5:48:22:644     start polling concrete5/cmsmysql
[INFO]   2017-4-11 5:48:22:645     start polling concrete5/concrete5app
[INFO]   2017-4-11 5:48:22:645     start polling dokuwiki2/dokuwiki-server
[INFO]   2017-4-11 6:4:9:301       service concrete5/cmsmysql active -> degraded
[INFO]   2017-4-11 6:4:24:325      service concrete5/cmsmysql degraded -> active
[INFO]   2017-4-11 6:5:24:460      service concrete5/concrete5app active -> upgraded
[INFO]   2017-4-11 6:5:39:492      service concrete5/concrete5app upgraded -> active
[INFO]   2017-4-11 6:5:54:526      service concrete5/concrete5app active -> degraded
[INFO]   2017-4-11 6:6:9:549       service concrete5/concrete5app degraded -> active
[INFO]   2017-4-11 6:6:23:287      stopping pxc due to rolling-back state
[INFO]   2017-4-11 6:6:23:287      stop polling pxc/pxc
[INFO]   2017-4-11 6:6:24:625      service pxc/pxc         upgrading -> degraded
[INFO]   2017-4-11 6:7:23:321      discovered new running service, creating monitor for: pxc/pxc
[INFO]   2017-4-11 6:7:23:322      new monitor up pxc/pxc:
  targets: "(HipchatTarget {\"notify\":\"true\"})"
  healthcheck: {
    "pollInterval": 15000,
    "healthyThreshold": 3,
    "unhealthyThreshold": 4
}

[INFO]   2017-4-11 6:7:23:322      start polling pxc/pxc
[INFO]   2017-4-11 6:7:38:364      service pxc/pxc         active -> degraded
[INFO]   2017-4-11 6:8:23:475      service pxc/pxc         became UNHEALTHY with threshold 4
[INFO]   2017-4-11 6:8:24:64       sent event to Hipchat service pxc in stack pxc became degraded (active) link: http://xxx.xxx.xxx.xxx:8080/env/1a5/apps/stacks/1e7/services/1s8/containers
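
The timing in this log matches a consecutive-poll threshold: with "pollInterval": 15000 and "unhealthyThreshold": 4, pxc/pxc is first seen degraded at 6:7:38 and reported UNHEALTHY at 6:8:23, i.e. on the fourth poll in a row. A minimal sketch of that kind of counter follows. It is not the rancher-alarms source; createHealthTracker and observe are hypothetical names, and treating only the "active" state as healthy is an assumption made for illustration.

```javascript
// Hypothetical sketch of a consecutive-poll threshold counter.
// NOT the rancher-alarms implementation; all names are illustrative.
// Assumption: only the "active" state counts as healthy.
function createHealthTracker({ healthyThreshold, unhealthyThreshold }) {
  let healthy = true; // status most recently reported
  let streak = 0;     // consecutive polls that contradict that status

  // Feed one poll result. Returns 'UNHEALTHY' or 'HEALTHY' when the
  // reported status flips, otherwise null.
  return function observe(state) {
    const ok = state === 'active';
    if (ok === healthy) {
      streak = 0; // a single contradicting poll resets the count
      return null;
    }
    streak += 1;
    const needed = healthy ? unhealthyThreshold : healthyThreshold;
    if (streak < needed) return null;
    healthy = ok;
    streak = 0;
    return healthy ? 'HEALTHY' : 'UNHEALTHY';
  };
}
```

With thresholds like these, a brief degraded blip (as seen for concrete5/cmsmysql above) does not trigger a notification; only a state that persists across the configured number of polls does.
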

iDiogenes commented 7 years ago

I am not sure if you initiated an issue, but it looks like your service pxc became UNHEALTHY after rancher-alarms started.

Here are my logs. As you can see, 6 of the 10 services were marked as degraded and then UNHEALTHY pretty much right on startup. This resulted in 6 emails being triggered. However, Rancher is showing every service as active/green.

4/11/2017 10:42:51 AM> rancher-alarms@0.1.7 start /usr/src/app
4/11/2017 10:42:51 AM> node bin/rancher-alarms.js
4/11/2017 10:42:51 AM
4/11/2017 10:42:55 AM[INFO] 2017-4-11 17:42:55:112 composing config from env variables
4/11/2017 10:42:55 AM[INFO] 2017-4-11 17:42:55:125 started with config:
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:709 monitors inited:
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:710 mystack3/db:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:710 mystack3/passenger:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 mystack3/lb:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 letsencrypt/letsencrypt:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 pa/db:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 pa/shoryuken:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 edge/lb:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 edge/redirect:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 mystack5/db:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 mystack5/passenger:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 mystack5/lb:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 mystack1/db:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 mystack1/passenger:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 mystack4/db:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 mystack4/passenger:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 api/db:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 api/passenger:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 api/lb:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 mystack6/db:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 mystack6/passenger:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 mystack2/db:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 mystack2/passenger:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 mystack2/lb:
4/11/2017 10:42:56 AM targets: "email:\n recipients: myuser@mydomain.com"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:714 start polling mystack3/db
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:718 start polling mystack3/passenger
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:719 start polling mystack3/lb
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:719 start polling letsencrypt/letsencrypt
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:719 start polling pa/db
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:719 start polling pa/shoryuken
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:720 start polling edge/lb
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:720 start polling edge/redirect
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:722 start polling mystack5/db
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:722 start polling mystack5/passenger
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:722 start polling mystack5/lb
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:722 start polling mystack1/db
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:722 start polling mystack1/passenger
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:722 start polling mystack4/db
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:722 start polling mystack4/passenger
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:722 start polling api/db
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:723 start polling api/passenger
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:723 start polling api/lb
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:723 start polling mystack6/db
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:723 start polling mystack6/passenger
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:723 start polling mystack2/db
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:723 start polling mystack2/passenger
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:723 start polling mystack2/lb
4/11/2017 10:43:12 AM[INFO] 2017-4-11 17:43:12:420 service mystack4/passenger active -> degraded
4/11/2017 10:43:12 AM[INFO] 2017-4-11 17:43:12:436 service mystack1/passenger active -> degraded
4/11/2017 10:43:12 AM[INFO] 2017-4-11 17:43:12:442 service mystack6/passenger active -> degraded
4/11/2017 10:43:12 AM[INFO] 2017-4-11 17:43:12:502 service mystack3/passenger active -> degraded
4/11/2017 10:43:12 AM[INFO] 2017-4-11 17:43:12:511 service mystack5/passenger active -> degraded
4/11/2017 10:43:12 AM[INFO] 2017-4-11 17:43:12:532 service mystack2/passenger active -> degraded
4/11/2017 10:43:43 AM[INFO] 2017-4-11 17:43:43:694 service mystack4/passenger became UNHEALTHY with threshold 3
4/11/2017 10:43:43 AM[INFO] 2017-4-11 17:43:43:856 service mystack1/passenger became UNHEALTHY with threshold 3
4/11/2017 10:43:43 AM[INFO] 2017-4-11 17:43:43:899 service mystack3/passenger became UNHEALTHY with threshold 3
4/11/2017 10:43:43 AM[INFO] 2017-4-11 17:43:43:906 service mystack5/passenger became UNHEALTHY with threshold 3
4/11/2017 10:43:43 AM[INFO] 2017-4-11 17:43:43:909 service mystack6/passenger became UNHEALTHY with threshold 3
4/11/2017 10:43:43 AM[INFO] 2017-4-11 17:43:43:963 service mystack2/passenger became UNHEALTHY with threshold 3
4/11/2017 10:43:44 AM[INFO] 2017-4-11 17:43:44:214 sending email notification to myuser@mydomain.com
4/11/2017 10:43:44 AM[INFO] 2017-4-11 17:43:44:384 sending email notification to myuser@mydomain.com
4/11/2017 10:43:44 AM[INFO] 2017-4-11 17:43:44:406 sending email notification to myuser@mydomain.com
4/11/2017 10:43:44 AM[INFO] 2017-4-11 17:43:44:419 sending email notification to myuser@mydomain.com
4/11/2017 10:43:44 AM[INFO] 2017-4-11 17:43:44:463 sending email notification to myuser@mydomain.com
4/11/2017 10:43:44 AM[INFO] 2017-4-11 17:43:44:485 sending email notification to myuser@mydomain.com
4/11/2017 10:43:45 AM[INFO] 2017-4-11 17:43:45:484 sent email notification to myuser@mydomain.com {
4/11/2017 10:43:45 AM "accepted": [
4/11/2017 10:43:45 AM "myuser@mydomain.com"
4/11/2017 10:43:45 AM ],
4/11/2017 10:43:45 AM "rejected": [],
4/11/2017 10:43:45 AM "response": "250 2.0.0 OK 1491932625 n7sm31840855pfn.0 - gsmtp",
4/11/2017 10:43:45 AM "envelope": {
4/11/2017 10:43:45 AM "from": "myuser@mydomain.com",
4/11/2017 10:43:45 AM "to": [
4/11/2017 10:43:45 AM "myuser@mydomain.com"
4/11/2017 10:43:45 AM ]
4/11/2017 10:43:45 AM },
4/11/2017 10:43:45 AM "messageId": "1491932624641-8e3b5400-57777895-4643747f@mydomain.com"
4/11/2017 10:43:45 AM}
4/11/2017 10:43:45 AM[INFO] 2017-4-11 17:43:45:746 sent email notification to myuser@mydomain.com {
4/11/2017 10:43:45 AM "accepted": [
4/11/2017 10:43:45 AM "myuser@mydomain.com"
4/11/2017 10:43:45 AM ],
4/11/2017 10:43:45 AM "rejected": [],
4/11/2017 10:43:45 AM "response": "250 2.0.0 OK 1491932625 t5sm31763246pgb.58 - gsmtp",
4/11/2017 10:43:45 AM "envelope": {
4/11/2017 10:43:45 AM "from": "myuser@mydomain.com",
4/11/2017 10:43:45 AM "to": [
4/11/2017 10:43:45 AM "myuser@mydomain.com"
4/11/2017 10:43:45 AM ]
4/11/2017 10:43:45 AM },
4/11/2017 10:43:45 AM "messageId": "1491932624502-9f9b38ef-fc021404-00c495c2@mydomain.com"
4/11/2017 10:43:45 AM}
4/11/2017 10:43:46 AM[INFO] 2017-4-11 17:43:46:77 sent email notification to myuser@mydomain.com {
4/11/2017 10:43:46 AM "accepted": [
4/11/2017 10:43:46 AM "myuser@mydomain.com"
4/11/2017 10:43:46 AM ],
4/11/2017 10:43:46 AM "rejected": [],
4/11/2017 10:43:46 AM "response": "250 2.0.0 OK 1491932626 r17sm31801969pfa.13 - gsmtp",
4/11/2017 10:43:46 AM "envelope": {
4/11/2017 10:43:46 AM "from": "myuser@mydomain.com",
4/11/2017 10:43:46 AM "to": [
4/11/2017 10:43:46 AM "myuser@mydomain.com"
4/11/2017 10:43:46 AM ]
4/11/2017 10:43:46 AM },
4/11/2017 10:43:46 AM "messageId": "1491932624634-318d97dc-d8cc5db6-fd492550@mydomain.com"
4/11/2017 10:43:46 AM}
4/11/2017 10:43:46 AM[INFO] 2017-4-11 17:43:46:426 sent email notification to myuser@mydomain.com {
4/11/2017 10:43:46 AM "accepted": [
4/11/2017 10:43:46 AM "myuser@mydomain.com"
4/11/2017 10:43:46 AM ],
4/11/2017 10:43:46 AM "rejected": [],
4/11/2017 10:43:46 AM "response": "250 2.0.0 OK 1491932626 o194sm31854886pfg.66 - gsmtp",
4/11/2017 10:43:46 AM "envelope": {
4/11/2017 10:43:46 AM "from": "myuser@mydomain.com",
4/11/2017 10:43:46 AM "to": [
4/11/2017 10:43:46 AM "myuser@mydomain.com"
4/11/2017 10:43:46 AM ]
4/11/2017 10:43:46 AM },
4/11/2017 10:43:46 AM "messageId": "1491932624637-070282ad-d48bfe01-57a7081f@mydomain.com"
4/11/2017 10:43:46 AM}
4/11/2017 10:43:46 AM[INFO] 2017-4-11 17:43:46:738 sent email notification to myuser@mydomain.com {
4/11/2017 10:43:46 AM "accepted": [
4/11/2017 10:43:46 AM "myuser@mydomain.com"
4/11/2017 10:43:46 AM ],
4/11/2017 10:43:46 AM "rejected": [],
4/11/2017 10:43:46 AM "response": "250 2.0.0 OK 1491932626 m19sm5561930pfg.115 - gsmtp",
4/11/2017 10:43:46 AM "envelope": {
4/11/2017 10:43:46 AM "from": "myuser@mydomain.com",
4/11/2017 10:43:46 AM "to": [
4/11/2017 10:43:46 AM "myuser@mydomain.com"
4/11/2017 10:43:46 AM ]
4/11/2017 10:43:46 AM },
4/11/2017 10:43:46 AM "messageId": "1491932624694-9ada7c1c-3900e933-a89a9bed@mydomain.com"
4/11/2017 10:43:46 AM}
4/11/2017 10:43:47 AM[INFO] 2017-4-11 17:43:47:66 sent email notification to myuser@mydomain.com {
4/11/2017 10:43:47 AM "accepted": [
4/11/2017 10:43:47 AM "myuser@mydomain.com"
4/11/2017 10:43:47 AM ],
4/11/2017 10:43:47 AM "rejected": [],
4/11/2017 10:43:47 AM "response": "250 2.0.0 OK 1491932627 2sm8215793pfs.85 - gsmtp",
4/11/2017 10:43:47 AM "envelope": {
4/11/2017 10:43:47 AM "from": "myuser@mydomain.com",
4/11/2017 10:43:47 AM "to": [
4/11/2017 10:43:47 AM "myuser@mydomain.com"
4/11/2017 10:43:47 AM ]
4/11/2017 10:43:47 AM },
4/11/2017 10:43:47 AM "messageId": "1491932624699-9ddac7c1-9e0ed61e-e97122fc@mydomain.com"
4/11/2017 10:43:47 AM}

VAdamec commented 7 years ago

Well, a green service in the Rancher UI doesn't necessarily mean it's healthy; it depends on how you set up healthchecks in the affected services. If you look at the services in the API, are they really healthy?

iDiogenes commented 7 years ago

Running a ps against the environment with the Rancher CLI shows every service and sidekick as healthy. What is rancher-alarms querying to check for a healthy state?

iDiogenes commented 7 years ago

Also, using the "view in API" from the UI is showing the same results - active and healthy.

VAdamec commented 7 years ago

That's strange. It gets its results from the API (services; see server.es6 and rancher.es6). Do you have more than one environment?

iDiogenes commented 7 years ago

There is only a single cattle environment on the server that is being queried.

VAdamec commented 7 years ago

OK, so please run it with debug output; I need to see more than the standard log.

VAdamec commented 7 years ago

I'm not familiar with trace() which is used here, but you can easily change it to info() in src/server.es6, line 32

  trace(`loaded services from API\n${JSON.stringify(services, null, 4)}`)
  // just change trace to info:
  info(`loaded services from API\n${JSON.stringify(services, null, 4)}`)

It will show you the complete API response received from Rancher. Then run it from a shell:

export RANCHER_ACCESS_KEY=..
...
npm start
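
Once the full API response is in the log, it can be easier to compare against the UI after reducing it to a few fields per service. The helper below is a hypothetical sketch, not part of rancher-alarms: summarizeServices is an invented name, and it assumes the logged payload has a data array whose entries carry name, state and healthState. Check those field names against your own output before relying on it.

```javascript
// Hypothetical helper: reduce a logged services response to one line per
// service. Assumes a payload shaped like
//   { data: [{ name, state, healthState }, ...] }
// which should be verified against the actual API output.
function summarizeServices(response) {
  const services = Array.isArray(response && response.data) ? response.data : [];
  return services.map((service) => {
    // Services without a healthcheck may not report a health state.
    const health = service.healthState || 'unknown';
    return `${service.name}: state=${service.state}, healthState=${health}`;
  });
}
```

Printing the result with console.log(summarizeServices(parsed).join('\n')) gives a short list to check against what the UI and CLI report.
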

iDiogenes commented 7 years ago

@VAdamec - I found the issue and it does have to do with the version of Rancher. The _withoutSidekicks function does a split on the container name using an underscore. In Rancher 1.2 (I believe, could be 1.3) they changed the sidekicks to be separated by a hyphen. I updated my code locally to use the hyphen and it solved my problem. Not sure how you want to address the issue, but a fix that supports both formats would be recommended.

https://github.com/ndelitski/rancher-alarms/blob/master/src/monitor.es6#L195
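
One possible shape for a fix that accepts both separators is sketched below. It is not the project's _withoutSidekicks implementation, and the helper names are hypothetical. It assumes primary containers are named stack, service and a numeric index joined by either "_" or "-", and that sidekick containers carry an extra name segment before the index; confirm both against your Rancher version.

```javascript
// Hypothetical sketch, NOT the rancher-alarms _withoutSidekicks code.
// Assumed naming: <stack><sep><service><sep><index> for primary containers,
// <stack><sep><service><sep><sidekick><sep><index> for sidekicks,
// where <sep> is "_" or "-" depending on the Rancher version.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function isPrimaryContainer(containerName, stackName, serviceName) {
  // Anchor on the known stack and service names instead of splitting,
  // so hyphens inside those names do not confuse the match.
  const pattern = new RegExp(
    `^${escapeRegExp(stackName)}[-_]${escapeRegExp(serviceName)}[-_]\\d+$`
  );
  return pattern.test(containerName);
}

function withoutSidekicks(containers, stackName, serviceName) {
  return containers.filter((c) => isPrimaryContainer(c.name, stackName, serviceName));
}
```

Matching against the known stack and service names, rather than splitting on a single character, sidesteps the ambiguity that arises when a stack or service name itself contains a hyphen.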

VAdamec commented 7 years ago

OK, it seems to be an easy fix. Can you create a PR? We'll see if and when @ndelitski accepts it.