linkedin / Burrow

Kafka Consumer Lag Checking
Apache License 2.0

Allow notifiers to receive metrics regardless of consumer group status #341

Open gustavosoares opened 6 years ago

gustavosoares commented 6 years ago

We currently have a working fork where we implemented two additional notifiers: one to push metrics to AWS CloudWatch [1] and another to push metrics to dogstatsd [2].

Long story short, here at Vend we run a container-based architecture on AWS ECS, and we wanted a way to autoscale our consumers based on consumer lag. AWS Application Auto Scaling does not work well with sparse metrics, which means we need to push every metric to CloudWatch regardless of the consumer status. As for [2], we push all metrics to dogstatsd so we can plot consumer lag graphs and create monitors that can trigger PagerDuty if a consumer's lag goes over a defined threshold.

That said, at the moment there is no elegant way to enable ShowAll per notifier. Since we don't care about any other notifier, we simply added ShowAll: true at https://github.com/linkedin/Burrow/blob/master/core/internal/notifier/coordinator.go#L375

                    go func(sendCluster string, sendConsumer string) {
                        nc.App.EvaluatorChannel <- &protocol.EvaluatorRequest{
                            Reply:   nc.evaluatorResponse,
                            Cluster: sendCluster,
                            Group:   sendConsumer,
                            ShowAll: true,
                        }
                    }(cluster, consumer)

This approach is not ideal if we want to push those two custom notifiers upstream, and it also makes merging from upstream cumbersome at times.

Please advise on what the best approach would be to address the above scenario.
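
One possible shape for a per-notifier setting, as a minimal sketch only: each notifier module carries its own opt-in flag, and the coordinator sets ShowAll on the evaluator request when at least one module has asked for full results. The `module` type, its `showAll` field, and the `anyShowAll` and `buildRequest` helpers are hypothetical names used for illustration; they are not existing Burrow settings or APIs. `evaluatorRequest` is a local stand-in that copies only the fields visible in the snippet above.

```go
package main

import "fmt"

// module is a hypothetical stand-in for a configured notifier module.
type module struct {
	name    string
	showAll bool // hypothetical per-notifier option, not an existing Burrow setting
}

// evaluatorRequest is a local stand-in that mirrors only the fields of
// protocol.EvaluatorRequest shown in the snippet above (Reply is omitted).
type evaluatorRequest struct {
	Cluster string
	Group   string
	ShowAll bool
}

// anyShowAll reports whether at least one module asked for full results.
func anyShowAll(mods []module) bool {
	for _, m := range mods {
		if m.showAll {
			return true
		}
	}
	return false
}

// buildRequest sets ShowAll from the modules instead of hard-coding it.
func buildRequest(cluster, group string, mods []module) evaluatorRequest {
	return evaluatorRequest{
		Cluster: cluster,
		Group:   group,
		ShowAll: anyShowAll(mods),
	}
}

func main() {
	mods := []module{
		{name: "email"},
		{name: "cloudwatch", showAll: true},
	}
	fmt.Printf("%+v\n", buildRequest("local", "my-group", mods))
}
```

With this shape a single evaluation per group still serves every notifier, so modules that did not opt in would have to ignore the extra detail themselves.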

StevenACoffman commented 6 years ago

In Kubernetes, using the Custom Metrics Adapter for Prometheus means we can horizontally autoscale consumers on arbitrary metrics that we already collect with Prometheus, such as consumer lag using Burrow. I'm not sure if anyone has already done this, but I'm looking for examples, and this is the closest mention I've been able to find.

daodennis-zz commented 6 years ago

Yeah, I'd like to instrument the app itself with Prometheus. It would likely overlap with the HTTP API, but it would also provide operational metrics.

nickdevp commented 6 years ago

This seems like a bug rather than an enhancement. The notifier configuration has a 'threshold' property, which already indicates whether StatusOK messages should be sent out. Shouldn't the threshold property be used to set the 'ShowAll' flag in the evaluation request? I.e.: ShowAll = (threshold == StatusOK)
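
The suggested rule can be sketched in a few lines of Go. This is an illustration under stated assumptions, not Burrow's code: the `status` type and its constants are local stand-ins for Burrow's status values, their numeric ordering is assumed, and `showAllFor` and `shouldNotify` are hypothetical helper names. `shouldNotify` is an extra assumption about how a notifier would filter results against its own threshold once the request returns everything.

```go
package main

import "fmt"

// status is a local stand-in for Burrow's status constants.
// The ordering (OK below Warning below Error) is an assumption.
type status int

const (
	statusNotFound status = iota
	statusOK
	statusWarning
	statusError
)

// showAllFor applies the rule from the comment above:
// ShowAll = (threshold == StatusOK).
func showAllFor(threshold status) bool {
	return threshold == statusOK
}

// shouldNotify is a hypothetical per-notifier filter: send only when the
// group's status is at or above the notifier's threshold.
func shouldNotify(groupStatus, threshold status) bool {
	return groupStatus >= threshold
}

func main() {
	for _, threshold := range []status{statusOK, statusWarning} {
		fmt.Println(threshold, showAllFor(threshold), shouldNotify(statusOK, threshold))
	}
}
```

Under these assumptions, a notifier with its threshold at OK gets every group on every cycle, while one with its threshold at warning keeps the current behaviour.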