ndelitski / rancher-alarms

Will kick your ass if found unhealthy service in Rancher environment

Not picking up new or flapping services #37

Closed: bluemalkin closed this 6 years ago

bluemalkin commented 6 years ago

Hi,

I'm having trouble getting rancher-alarms to work despite following the example compose file.

Issue 1: After creating the Rancher Alarms service, if I create a new service/stack/container and then restart the Rancher Alarms service, the newly created service does not show up in the logs. All other existing services do. Could this require use of the ALARM_POLL_INTERVAL and/or ALARM_MONITOR_INTERVAL settings? If so, please clarify their usage.

Issue 2: I have a stack with many services that become degraded for a couple of seconds and then return to healthy. Rancher Alarms does not pick this up.

flaccid commented 6 years ago

It is strange: I create a new stack and service, say alpine/alpine, then restart the rancher-alarms container, and it simply does not pick up this new healthy service, @ndelitski.

ndelitski commented 6 years ago

ALARM_POLL_INTERVAL is used for fetching the services list, so by default the alarms service handles new and removed services after 60 sec. ALARM_MONITOR_INTERVAL is only used for healthchecks: every 15s by default, alarms checks the state (degraded or active) of every service in its internal service list. Could you try playing with these timings? I need to come up with clearer naming, btw...
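
To make the two timers concrete, here is a rough Python sketch of the timing model described above. It is not rancher-alarms code: the function names are made up for illustration, it assumes both loops tick at fixed multiples of their interval starting at t=0, and it works in seconds only because the comment quotes the defaults that way (the units the environment variables actually expect are not stated in this thread). Under those assumptions, it shows why a service created just after a poll waits for the next poll, and why a service that is degraded for only a couple of seconds can fall between two healthchecks.

```python
def first_discovery_time(created_at, poll_interval):
    """First poll tick at or after created_at (hypothetical model:
    polls happen at 0, poll_interval, 2 * poll_interval, ...)."""
    ticks = -(-created_at // poll_interval)  # ceiling division on integers
    return ticks * poll_interval


def observed_degraded(degraded_windows, monitor_interval, duration):
    """Healthcheck ticks that land inside a degraded window.

    degraded_windows is a list of (start, end) pairs in seconds; a service
    counts as degraded for start <= t < end. Checks are assumed to run at
    0, monitor_interval, 2 * monitor_interval, ... up to duration.
    """
    seen = []
    for t in range(0, duration + 1, monitor_interval):
        if any(start <= t < end for start, end in degraded_windows):
            seen.append(t)
    return seen


if __name__ == "__main__":
    # A service created at t=70 with a 60 s poll is first listed at t=120.
    print(first_discovery_time(70, 60))             # 120

    # Degraded from t=17 to t=20: checks at 0, 15, 30, ... never see it.
    print(observed_degraded([(17, 20)], 15, 60))    # []

    # The same flap with a 1 s check interval is observed three times.
    print(observed_degraded([(17, 20)], 1, 60))     # [17, 18, 19]
```

If this model is roughly right, a smaller ALARM_MONITOR_INTERVAL would be the setting to try for Issue 2, and ALARM_POLL_INTERVAL for Issue 1, at the cost of more frequent API calls.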

flaccid commented 6 years ago

@tombmurphy let's play with this next week. The polling may be missing discovery due to defaults.

flaccid commented 6 years ago

Testing with 0.1.8-rc suggests these issues may have been addressed. @bluemalkin, see if you can do a couple of manual tests to verify. At least on startup, rancher-alarms seems to be properly detecting all services.

flaccid commented 6 years ago

Closing this one out, as it does seem resolved. @bluemalkin, did you want to close this one out now as resolved?