Monitoring/metrics/instrumentation

dmacvicar commented 6 years ago

Right now there are three use cases where we need some kind of monitoring and metric tracking:

- That the application is up at all.
- That, if we have a problem, we have the right data and evidence (e.g. the current memory consumption, whether OBS is unresponsive, etc.) to draw conclusions about it.
- Some statistics that were previously written to the database (download counter, etc.).

Therefore, I suggest we look into enabling the application to be scraped by Prometheus, which is a popular solution nowadays and integrates easily with Grafana or other dashboards.

This means enabling a /metrics endpoint in the application. Initially, we could use one of our internal Prometheus installations.

https://prometheus.io/docs/prometheus/latest/getting_started/
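For reference, a Prometheus scrape is a plain HTTP GET of /metrics that returns metrics in the text exposition format. The sketch below renders one counter in that format using only the Ruby standard library. The helper `render_counters` and the metric `software_downloads_total` are hypothetical names chosen for illustration; in practice the client library generates this output for us.

```ruby
# Hypothetical helper: renders one counter family in the Prometheus
# text exposition format (the body a GET /metrics would return).
def render_counters(name, help, samples)
  lines = ["# HELP #{name} #{help}", "# TYPE #{name} counter"]
  samples.each do |labels, value|
    # Label values must escape backslash, double quote and newline.
    label_str = labels.map do |k, v|
      escaped = v.to_s.gsub(/[\\"\n]/) { |c| c == "\n" ? '\n' : "\\#{c}" }
      %(#{k}="#{escaped}")
    end.join(',')
    lines << (label_str.empty? ? "#{name} #{value}" : "#{name}{#{label_str}} #{value}")
  end
  lines.join("\n") + "\n"
end

# Illustrative values only, not real download numbers.
samples = {
  { distro: 'tumbleweed' } => 42,
  { distro: 'leap' } => 7
}
body = render_counters('software_downloads_total', 'Number of package downloads served.', samples)
puts body
# # HELP software_downloads_total Number of package downloads served.
# # TYPE software_downloads_total counter
# software_downloads_total{distro="tumbleweed"} 42
# software_downloads_total{distro="leap"} 7
```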

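For Rails apps, enabling it could be as simple as adding the Rack middleware from the prometheus-client gem to config.ru:

```ruby
# config.ru
# This file is used by Rack-based servers to start the application.
require ::File.expand_path('../config/environment', __FILE__)
require 'rack'
require 'prometheus/middleware/collector'
require 'prometheus/middleware/exporter'

# Compress only responses whose first body chunk is larger than 512 bytes.
use Rack::Deflater, if: ->(_, _, _, body) { body.any? && body[0].length > 512 }
# Records request counts and durations for every request passing through.
use Prometheus::Middleware::Collector
# Serves the collected metrics at /metrics (the exporter's default path).
use Prometheus::Middleware::Exporter
run SoftwareOO::Application
```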
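To make clear what these two middlewares do in principle, here is a dependency-free sketch of the same pattern: a Rack-compatible object (anything responding to `call(env)`) that counts responses by status code and answers `/metrics` itself. `RequestCounter` is a hypothetical name, and this is not the prometheus-client implementation.

```ruby
# Hypothetical, dependency-free illustration of the Collector/Exporter
# pattern; NOT the prometheus-client gem's implementation.
class RequestCounter
  def initialize(app, path: '/metrics')
    @app = app
    @path = path
    @counts = Hash.new(0) # status code => count, held in THIS process only
    @mutex = Mutex.new    # threads within one worker share this instance
  end

  def call(env)
    return exposition if env['PATH_INFO'] == @path

    status, headers, body = @app.call(env)
    @mutex.synchronize { @counts[status.to_i] += 1 }
    [status, headers, body]
  end

  private

  # Renders the current counts in the Prometheus text format.
  def exposition
    samples = @mutex.synchronize do
      @counts.sort.map { |code, n| %(http_requests_total{code="#{code}"} #{n}) }
    end
    payload = (['# TYPE http_requests_total counter'] + samples).join("\n") + "\n"
    [200, { 'content-type' => 'text/plain; version=0.0.4' }, [payload]]
  end
end

# Any object responding to #call(env) is a valid Rack app.
inner = ->(_env) { [200, { 'content-type' => 'text/plain' }, ['OK']] }
app = RequestCounter.new(inner)
3.times { app.call('PATH_INFO' => '/') }
status, _headers, body = app.call('PATH_INFO' => '/metrics')
puts body.join
# # TYPE http_requests_total counter
# http_requests_total{code="200"} 3
```

Because `@counts` lives in the memory of a single process, a clustered Puma setup would report a different subset of requests depending on which worker happens to answer the scrape.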

However, there are some showstoppers with Puma and other multi-process servers that need to be investigated: not all client implementations record metrics correctly in these setups, typically because each worker process keeps its own in-memory copy of the metrics. There may be alternative solutions for these cases.
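One possible direction for the multi-process problem, sketched under the assumption that all workers share a writable directory: each worker persists its own counter to a file, and whichever worker answers the scrape sums all files. `PerWorkerCounter` is hypothetical, and the workers are simulated in a single process here. The prometheus-client gem offers a file-backed data store (`DirectFileStore`) built on a similar idea, though its on-disk format differs.

```ruby
require 'tmpdir'

# Hypothetical sketch: one counter file per worker, summed at scrape time.
class PerWorkerCounter
  def initialize(dir)
    @dir = dir
  end

  # In a real Puma cluster each worker would pass its own id (e.g. Process.pid).
  def increment(worker_id, by = 1)
    path = File.join(@dir, "worker-#{worker_id}.count")
    File.open(path, File::RDWR | File::CREAT) do |f|
      f.flock(File::LOCK_EX) # serialize concurrent writers (e.g. threads in one worker)
      current = f.read.to_i  # an empty new file reads as 0
      f.rewind
      f.truncate(0)
      f.write((current + by).to_s)
    end
  end

  # Called by whichever worker happens to serve GET /metrics.
  def total
    Dir.glob(File.join(@dir, 'worker-*.count')).sum { |p| File.read(p).to_i }
  end
end

result = Dir.mktmpdir do |dir|
  counter = PerWorkerCounter.new(dir)
  # Simulated: worker 1 serves 3 requests, worker 2 serves 2.
  3.times { counter.increment(1) }
  2.times { counter.increment(2) }
  counter.total
end
puts result # => 5
```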

hennevogel commented 6 years ago

What about https://metrics.opensuse.org/ ? :-)

dmacvicar commented 6 years ago

That would be perfect. We would still need a Prometheus instance to gather the metrics. We can use metrics.opensuse.org to display them.

hennevogel commented 6 years ago

> We would still need a Prometheus instance to gather the metrics.

If we only want Rails middleware stats, there is also influxdb-rails.

For the other data, we can publish it to rabbit.opensuse.org, consume it with Telegraf and write it to InfluxDB (which also makes the data available to others for scripts and the like). Alternatively, we can send it directly with influxdb-ruby.
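For the push route, data written to InfluxDB (directly or through Telegraf) uses the InfluxDB line protocol: `measurement,tag=value field=value timestamp`. The stdlib-only sketch below builds one such line. `to_line_protocol`, the `downloads` measurement and its tags are hypothetical, and a client such as influxdb-ruby would normally do this formatting for us.

```ruby
# Escapes commas, equals signs and spaces in tag keys, tag values and field keys.
def escape_key(s)
  s.to_s.gsub(/[,= ]/) { |c| "\\#{c}" }
end

# Hypothetical helper: formats one point in InfluxDB line protocol.
def to_line_protocol(measurement, tags, fields, timestamp_ns)
  m = measurement.to_s.gsub(/[, ]/) { |c| "\\#{c}" }
  # Sorting tags by key is recommended by InfluxDB for write performance.
  tag_str = tags.sort.map { |k, v| ",#{escape_key(k)}=#{escape_key(v)}" }.join
  field_str = fields.map do |k, v|
    value = case v
            when Integer then "#{v}i" # integer fields carry an "i" suffix
            when Float, true, false then v.to_s
            else
              escaped = v.to_s.gsub(/["\\]/) { |c| "\\#{c}" }
              "\"#{escaped}\"" # string fields are double-quoted
            end
    "#{escape_key(k)}=#{value}"
  end.join(',')
  "#{m}#{tag_str} #{field_str} #{timestamp_ns}"
end

line = to_line_protocol('downloads', { distro: 'tumbleweed', arch: 'x86_64' },
                        { count: 1, package: 'vim' }, 1_556_000_000_000_000_000)
puts line
# downloads,arch=x86_64,distro=tumbleweed count=1i,package="vim" 1556000000000000000
```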

hustodemon commented 6 years ago

Short status update: I did some experiments with the Prometheus Exporter. I was able to export basic Ruby metrics and visualize them with Grafana. I still want to explore the InfluxDB options suggested by @hennevogel.

dmacvicar commented 6 years ago

@hennevogel Do we already have an openSUSE instance of InfluxDB, or do you mean running one on the same machine? (In that case, it would make no difference compared to using Prometheus.)

hennevogel commented 6 years ago

@dmacvicar RabbitMQ runs on rabbit.o.o, and metrics.o.o already runs Telegraf, InfluxDB and Grafana.

dmacvicar commented 6 years ago

@hustodemon We could ask @jberry-suse whether we can use the InfluxDB on metrics.opensuse.org, or whether we can run Prometheus there. https://bitworking.org/news/2017/03/prometheus

hennevogel commented 6 years ago

I'm sure you can. We (OBS team) will also start to use it soon :-)

dmacvicar commented 6 years ago

I would really prefer to go the Prometheus way (pull), also because of the internal knowledge we have at SUSE (it is used for SUSE Manager, Storage and Containers).