Feature Idea - Monitor the synchronizer with something like StatsD

rmelick-vida commented 3 years ago

The admin dashboard and the available metrics APIs (like /admin/metrics) help us investigate the current status of the synchronizer, but they are not very useful for monitoring the system over time. They don't provide a historical view of things like queue size or lambda values, and you have to remember to check the dashboard.

It would be good to have a way to pipe these metrics out of the synchronizer, and into a tool designed for monitoring and alerting, such as Datadog or Prometheus. We would like to monitor the queue size and trigger alerts based on queue size or bad lambda values.

I can think of a few good options for this. What do others think?

- Adding a StatsD client metric exporter to the synchronizer (like the ones built by Etsy or Datadog). I know it's easy to pass StatsD into Datadog; I'm not sure about Prometheus.
- Creating a custom Datadog integration, which would probably call the existing admin REST API and convert the results to Datadog metrics. The downside is that it requires separate maintenance and is Datadog-specific, but it might provide a better experience for customers using Datadog, like us.
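To make the first option concrete, here is a minimal sketch of what a StatsD gauge looks like on the wire, written in Python for illustration only. The metric name, tag names, and the `127.0.0.1:8125` agent address are assumptions, not anything the synchronizer provides today. The `|#key:value` tag suffix is the DogStatsD extension, so a plain StatsD server may not accept it.

```python
import socket


def format_gauge(name, value, tags=None):
    """Render one gauge in the StatsD line format: <name>:<value>|g.

    Tags use the DogStatsD extension (|#key:value,...); they are
    optional and sorted so the output is deterministic.
    """
    line = f"{name}:{value}|g"
    if tags:
        line += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return line


def send_gauge(name, value, tags=None, host="127.0.0.1", port=8125):
    """Send a gauge over UDP to a StatsD-compatible agent (fire and forget).

    host/port are assumptions: 8125 is the conventional StatsD port.
    """
    payload = format_gauge(name, value, tags).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

With something like this inside the synchronizer, a call such as `send_gauge("split_sync.events.queue_size", 42, {"env": "staging"})` (a hypothetical metric name) would be enough for an agent to pick up the value and keep its history.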

chillaq commented 3 years ago

Hi Russel, you can use /admin/events/queueSize and /admin/impressions/queueSize. If the size keeps growing, the synchronizer is not keeping up with the incoming impressions/events. These queue sizes are what the lambda value in the admin dashboard is calculated from. You can send this value to Datadog.
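A rough sketch of that polling approach, in Python, might look like the following. Several things here are assumptions: the admin base URL (`http://localhost:3010`), that the endpoints need no authentication, and the response shape (the parser accepts either a bare integer or a JSON object with a `queueSize` key, since the actual format is not stated here). The "growing" check is one simple interpretation: the last few readings are strictly increasing.

```python
import json
import urllib.request

ADMIN_BASE = "http://localhost:3010"  # assumption: your admin host and port
ENDPOINTS = {
    "events": "/admin/events/queueSize",
    "impressions": "/admin/impressions/queueSize",
}


def parse_queue_size(body):
    """Extract the queue size from a response body.

    Assumption: the body is a bare integer or a JSON object with a
    "queueSize" key; adjust this to the real response format.
    """
    data = json.loads(body)
    if isinstance(data, dict):
        return int(data["queueSize"])
    return int(data)


def fetch_queue_size(kind, base=ADMIN_BASE, timeout=5):
    """GET one of the queueSize endpoints and return the size as an int."""
    with urllib.request.urlopen(base + ENDPOINTS[kind], timeout=timeout) as resp:
        return parse_queue_size(resp.read().decode("utf-8"))


def is_growing(samples, min_samples=3):
    """Return True when the last min_samples readings strictly increase."""
    recent = samples[-min_samples:]
    if len(recent) < min_samples:
        return False
    return all(a < b for a, b in zip(recent, recent[1:]))
```

A scheduled job could call `fetch_queue_size("events")` every minute, append the result to a list, forward it to Datadog, and alert when `is_growing(...)` returns True.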

Thanks Bilal

rmelick-vida commented 3 years ago

Hi @chillaq, I know about those APIs. I just don't want to write a custom script or process to pull from them and then send the results to Datadog.

splitio / split-synchronizer

Feature Idea - Monitor the synchronizer with something like StatsD #149