Log/push consul service events by consul server leader

Feature Description

The server currently holding the Consul leader role should be able to log or push service events. At a minimum, each event should include the following fields:

service_id
healthcheck_id
status (critical/passing/warning/maintenance)
health_check_output (or the reason for maintenance if it is enabled)
tags

If a node-level health check changes state (critical, passing, etc.), one line should be logged for each service registered on that node.

In other words, every event that affects a service must be logged or sent to the configured endpoint (if enabled), with service_id as a mandatory field. The log file format should be JSON.
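To make the requested format concrete, here is a rough sketch of how one check could be expanded into one JSON line per affected service. The input dictionaries are modeled on the field names returned by Consul's health and catalog HTTP endpoints (`CheckID`, `Status`, `Output`, `ServiceID`, `ID`, `Tags`); the function name `events_for_check` and the output record layout are only assumptions for illustration, not an existing Consul feature.

```python
import json

def events_for_check(check, services_on_node):
    """Expand one health check into one JSON log line per affected service.

    Hypothetical helper: a service-level check affects only its own service,
    while a node-level check (empty ServiceID) affects every service on the node.
    """
    if check.get("ServiceID"):
        affected = [s for s in services_on_node if s["ID"] == check["ServiceID"]]
    else:
        affected = list(services_on_node)

    lines = []
    for svc in affected:
        record = {
            "service_id": svc["ID"],  # mandatory field
            "healthcheck_id": check["CheckID"],
            "status": check["Status"],
            "health_check_output": check.get("Output", ""),
            "tags": svc.get("Tags", []),
        }
        lines.append(json.dumps(record, sort_keys=True))
    return lines
```

With two services on a node and a failing node-level check, this yields two lines, each carrying its own service_id.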

Use Case(s)

We have been using Consul for almost two years across multiple projects and environments, and one of the main challenges has been configuring alerting and/or events in an external system. Our basic use case is this: a Grafana dashboard shows basic traffic metrics (requests per second, response times, number of errors, etc.) filtered by Consul service name via Grafana templates, while Consul events are applied to the dashboard as annotations. This lets us see the direct impact of a Consul service issue on overall application performance and reliability without having to look into application logs.

With all due respect to the power and usability of the Consul event subscription system, I have to say that using it is hard for a use case as simple as the one above. All of the alerting tools available today have their own bugs and issues, and require some expertise to use and support. I have personally tried the following tools:

consul-alerts (too complicated, requires at least 3 instances, some issues with Consul service maintenance mode)
prometheus consul exporter (doesn't use blocking queries? can lose events? no health check outputs)
telegraf consul input (doesn't use blocking queries? can lose events? no health check outputs)
custom python script (Blocking Queries)
custom GoLang script
consul-template + some script
consul watch + script

All of the above need support, take time to implement and troubleshoot, and have to be deployed somehow. For example, my Python script uses 100% of one CPU core (I'm a bad developer :) ), and so on and so forth.
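For reference, this is roughly what the "custom script with blocking queries" approach looks like. It is only a minimal sketch, not the script mentioned above: it assumes a local agent on the default address `http://127.0.0.1:8500`, and `fetch_checks`, `diff_statuses` and `watch` are made-up names. The `index`/`wait` query parameters and the `X-Consul-Index` response header are the standard blocking-query mechanism of the Consul HTTP API, so the request waits on the server side instead of polling in a tight loop.

```python
import json
import urllib.request

CONSUL = "http://127.0.0.1:8500"  # assumption: local agent on the default port

def fetch_checks(index, wait="5m"):
    """Run one blocking query against /v1/health/state/any.

    Returns (new_index, checks). The call blocks until something changes
    after `index` or until `wait` elapses.
    """
    url = f"{CONSUL}/v1/health/state/any?index={index}&wait={wait}"
    with urllib.request.urlopen(url, timeout=330) as resp:
        return int(resp.headers["X-Consul-Index"]), json.load(resp)

def diff_statuses(previous, checks):
    """Return (checks whose status differs from `previous`, new status map)."""
    current = {(c["Node"], c["CheckID"]): c["Status"] for c in checks}
    changed = [c for c in checks
               if previous.get((c["Node"], c["CheckID"])) != c["Status"]]
    return changed, current

def watch(fetch=fetch_checks, on_change=print, rounds=None):
    """Loop over blocking queries and report status changes.

    `rounds=None` runs forever; the first round reports every check once.
    """
    index, seen = 0, {}
    while rounds is None or rounds > 0:
        new_index, checks = fetch(index)
        # If the index ever goes backwards, start over from 0.
        index = new_index if new_index >= index else 0
        changed, seen = diff_statuses(seen, checks)
        for check in changed:
            on_change(check)
        if rounds is not None:
            rounds -= 1
```

Even this small version still has to be packaged, deployed, monitored and restarted somewhere, which is the maintenance burden described above.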

So once again: things should not be this complicated in the Consul world, at least not for a case as simple as alerting and event logging. Since the Consul leader knows about all events at all times, it may be the best place to log and/or push events somewhere (HTTP POST with a JSON payload). If the leader role is transferred to another server, that server must take over this task. This way, if I'm collecting logs from all Consul servers (and I think everyone else does the same), I will simply get what I need in ELK, Splunk, or whatever.
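As an illustration of the push side, the behaviour I have in mind could look something like the sketch below. This is not how Consul works today. `build_push_request` and `push_if_leader` are hypothetical names, the endpoint URL would come from configuration, and `is_leader` stands in for whatever internal check the server would use to know that it currently holds the leader role.

```python
import json
import urllib.request

def build_push_request(endpoint, event):
    """Build an HTTP POST request carrying one event as a JSON payload."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )

def push_if_leader(event, endpoint, is_leader, send=urllib.request.urlopen):
    """Push the event only when this server is the leader.

    `is_leader` is a caller-supplied callable (an assumption of this sketch).
    Returns True if the event was sent and the endpoint answered with 2xx,
    False if this server is not the leader and nothing was sent.
    """
    if not is_leader():
        return False
    with send(build_push_request(endpoint, event)) as resp:
        return 200 <= resp.status < 300
```

Because every server runs the same code and only the leader passes the check, a leadership change would move the pushing to the new leader without any external coordination.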

Thanks, Roman

hashicorp / consul

Log/push consul service events by consul server leader #4431
