openSUSE / software-o-o

The site behind https://software.opensuse.org. It is the default web interface to download openSUSE distributions and to search for OBS packages. Packaged at https://build.opensuse.org/project/show/openSUSE:infrastructure:software.opensuse.org
https://software.opensuse.org/
GNU General Public License v2.0

Monitoring/metrics/instrumentation #230

Open dmacvicar opened 6 years ago

dmacvicar commented 6 years ago

Right now there are two use-cases where we need some kind of monitoring and metric tracking:

Therefore, I suggest we look into enabling the application to be scraped by Prometheus, which is a popular solution nowadays and is easily integrated with Grafana or other dashboards.

This means enabling a /metrics endpoint in the application. Initially we could use one of our internal prometheus installations.

https://prometheus.io/docs/prometheus/latest/getting_started/
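To make the idea of a /metrics endpoint concrete, here is a minimal sketch of a Rack-style middleware that answers /metrics in the Prometheus text exposition format. It uses no gems; the class name `MetricsEndpoint` and the counter `http_requests_total` are hypothetical names chosen for illustration, and this is not how the prometheus-client gem is implemented.

```ruby
# Minimal Rack-style middleware (no gems required) that answers /metrics
# with counters in the Prometheus text exposition format.
# MetricsEndpoint and the counter names are hypothetical, for illustration only.
class MetricsEndpoint
  def initialize(app, counters)
    @app = app
    @counters = counters # e.g. { 'http_requests_total' => 3 }
  end

  def call(env)
    # Anything other than /metrics is passed through to the wrapped app.
    return @app.call(env) unless env['PATH_INFO'] == '/metrics'

    # One "# TYPE" line plus one sample line per counter.
    body = @counters.map do |name, value|
      "# TYPE #{name} counter\n#{name} #{value}\n"
    end.join
    [200, { 'Content-Type' => 'text/plain; version=0.0.4' }, [body]]
  end
end

# Usage: wrap any Rack-compatible app (here a trivial lambda).
inner = ->(_env) { [200, { 'Content-Type' => 'text/plain' }, ['hello']] }
app = MetricsEndpoint.new(inner, { 'http_requests_total' => 3 })
status, _headers, body = app.call({ 'PATH_INFO' => '/metrics' })
puts status
puts body.join
```

A real deployment would rather rely on an existing client library, which also handles gauges, histograms and labels.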

For rails apps, enabling it could be as simple as using the Rack middleware:

# This file is used by Rack-based servers to start the application.
require ::File.expand_path('../config/environment', __FILE__)
require 'rack'
require 'prometheus/middleware/collector'
require 'prometheus/middleware/exporter'

# Compress larger responses (including the metrics output).
use Rack::Deflater, if: ->(_, _, _, body) { body.any? && body[0].length > 512 }
# Collector records per-request metrics; Exporter serves them at /metrics.
use Prometheus::Middleware::Collector
use Prometheus::Middleware::Exporter
run SoftwareOO::Application

However, there are some showstoppers when using puma or other multi-process servers that need to be investigated: not all client implementations store the metrics correctly in these situations, and there may be alternative solutions for these cases.
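The multi-process problem can be illustrated with a small sketch: if every worker keeps its counters only in its own memory, a scrape sees just the worker that happens to answer. One possible workaround, assumed here purely for illustration, is for each worker to persist its counter to its own file in a shared directory and for the exporter to sum all files at scrape time. `PerWorkerCounter`, the file naming, and the worker ids are hypothetical; workers are simulated as plain objects rather than real processes.

```ruby
require 'tmpdir'

# Hypothetical helper: one counter file per worker, summed on read.
# Each file has a single writer, so workers never overwrite each other.
class PerWorkerCounter
  def initialize(dir, name, worker_id)
    @path = File.join(dir, "#{name}-#{worker_id}.count")
  end

  # Read-modify-write of this worker's own file (no locking; single writer assumed).
  def increment(by = 1)
    current = File.exist?(@path) ? File.read(@path).to_i : 0
    File.write(@path, (current + by).to_s)
  end

  # What an exporter would do at scrape time: sum over all worker files.
  def self.total(dir, name)
    Dir.glob(File.join(dir, "#{name}-*.count")).sum { |file| File.read(file).to_i }
  end
end

Dir.mktmpdir do |dir|
  worker_a = PerWorkerCounter.new(dir, 'http_requests_total', 1)
  worker_b = PerWorkerCounter.new(dir, 'http_requests_total', 2)
  2.times { worker_a.increment }
  3.times { worker_b.increment }
  puts PerWorkerCounter.total(dir, 'http_requests_total')
end
```

The trade-off is extra disk I/O per increment and the need to clean up stale files when workers restart.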

hennevogel commented 6 years ago

What about https://metrics.opensuse.org/ ? :-)

dmacvicar commented 6 years ago

That would be perfect. We would still need a prometheus instance to gather the metrics. We can use metrics.opensuse.org to display them.

hennevogel commented 6 years ago

> We would still need a prometheus instance to gather the metrics.

If we only want Rails middleware stats, there is also influxdb-rails.

For the other data, we can send things out to rabbit.opensuse.org, consume them with telegraf, and write them to influxdb (which makes it possible for others to use this data from scripts or whatnot). Or we could send things out directly with influxdb-ruby.
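As a rough illustration of what "sending things out" to InfluxDB involves, the following sketch formats a single data point in InfluxDB line protocol (measurement, tags, fields, timestamp). The helper `to_line_protocol`, the measurement `package_search`, and the tag and field names are all hypothetical. Escaping of spaces and commas is deliberately not handled, and in practice a client library such as influxdb-ruby would take care of this.

```ruby
# Hypothetical helper: formats one data point in InfluxDB line protocol:
#   measurement,tag=value field=value,field=value timestamp
# Escaping of special characters is NOT handled in this sketch.
def to_line_protocol(measurement, tags, fields, timestamp_ns)
  tag_part = tags.map { |key, value| ",#{key}=#{value}" }.join
  field_part = fields.map do |key, value|
    formatted = case value
                when Integer then "#{value}i"     # integers carry an 'i' suffix
                when String  then "\"#{value}\""  # strings are double-quoted
                else value.to_s                   # floats and booleans as-is
                end
    "#{key}=#{formatted}"
  end.join(',')
  "#{measurement}#{tag_part} #{field_part} #{timestamp_ns}"
end

puts to_line_protocol('package_search', { distro: 'leap' },
                      { hits: 42, took: 0.25 }, 1_500_000_000_000_000_000)
```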

hustodemon commented 6 years ago

Short status update: I did some experiments with the Prometheus Exporter. I was able to export basic Ruby metrics and visualize them with Grafana. I still want to explore the influx options suggested by @hennevogel.

dmacvicar commented 6 years ago

@hennevogel Do we have an openSUSE instance of InfluxDB already? Or do you mean running one on the same machine? (In that case it would not make a difference to use Prometheus.)

hennevogel commented 6 years ago

@dmacvicar rabbitmq runs on rabbit.o.o. and metrics.o.o runs telegraf, influx and grafana already.

dmacvicar commented 6 years ago

@hustodemon we could ask @jberry-suse if we can use the InfluxDB in metrics.opensuse.org, or whether we can run prometheus there. https://bitworking.org/news/2017/03/prometheus

hennevogel commented 6 years ago

I'm sure you can. We (OBS team) will also start to use it soon :-)

dmacvicar commented 6 years ago

I would really prefer to go the Prometheus way (pull), also because of the internal knowledge we have inside SUSE (it is used for SUSE Manager, Storage, Containers).

jberry-suse commented 6 years ago

Prometheus does not bother me. As far as pull, that's how influxdb is getting the data right now.

Presumably we are not talking about major resource usage, as any increases will need to be requested. The plan is to manage this via salt, but nothing ever came of previous meetings to achieve that. If folks have interest in converting the configuration, that would be great. Otherwise, if you provide the necessary config, I can install it on the machine or potentially grant someone access, but that can get messy with too many chefs in the kitchen.

jberry-suse commented 6 years ago

The tooling for pulling data and the Grafana dashboard and data source definitions are provided via a package on OBS, which would be ideal for software-o-o to do as well. That way, all that is outside of proper versioning (somewhere) is the firewall config, the grafana/influxdb config, and the list of packages installed on the machine.

jberry-suse commented 6 years ago

For an example of the package layout that uses grafana provisioning see https://github.com/openSUSE/openSUSE-release-tools/blob/820d1030e54f5c9bbfe9aeb69ca5b3b44a838aaa/dist/package/openSUSE-release-tools.spec#L445-L461.

hustodemon commented 6 years ago

Hi @jberry-suse, I wrote a simple salt state that installs Prometheus on a machine and makes sure it's running. I don't have much experience with packaging, but IIUC you'd prefer creating some kind of pseudo-package which contains some Prometheus config and which makes sure Prometheus is installed (via Requires). Is that right?

jberry-suse commented 6 years ago

Salt is fine. The packaging is for configs/scripts coming out of this repo, but if it is very minor, like pointing at your endpoint, it could also be done via salt.

hustodemon commented 6 years ago

Status update: I pinged the openSUSE Heroes about creating a new machine for us. Let's see how this turns out.

jberry-suse commented 6 years ago

Different from metrics.o.o?

jberry-suse commented 6 years ago

Based on emails, I thought the plan was to add it to the openSUSE salt master and thus have it installed on metrics.o.o.

jberry-suse commented 6 years ago

If you need ssh access to debug and get things working, I can provide it if you let me know which pubkey to use.

hustodemon commented 6 years ago

I see. We don't really care where Prometheus is going to be installed; metrics.o.o would also be fine. I'll update my ticket, then.