Prometheus monitoring support

SuperQ commented 5 years ago

I'd like to add Prometheus monitoring support to this code to make it easier for FOSDEM to monitor our video fleet.

This would involve adding client_python to things like voctocore.

The basic client library will give us process metrics like CPU use, memory, open FDs.

It will also open up the option to add custom metrics.
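A minimal sketch of what such instrumentation could look like with client_python (the `prometheus_client` package). The metric names, the `queue` label, and the helper functions are hypothetical and not taken from voctomix; a private registry is used only to keep the example self-contained.

```python
from prometheus_client import CollectorRegistry, Counter, Gauge, generate_latest

# Assumption: a private registry keeps this sketch isolated. A real
# integration would more likely use the library's default registry.
registry = CollectorRegistry()

# Hypothetical custom metrics; the names are illustrative only.
PIPELINE_ERRORS = Counter(
    "voctocore_pipeline_errors_total",
    "Errors reported by the mixing pipeline.",
    registry=registry,
)
QUEUE_FILL = Gauge(
    "voctocore_queue_fill_ratio",
    "Fill level of a pipeline queue (0.0 to 1.0).",
    ["queue"],
    registry=registry,
)


def on_pipeline_error():
    """Count one pipeline error (would be called from an error handler)."""
    PIPELINE_ERRORS.inc()


def on_queue_sample(queue_name, ratio):
    """Record the latest fill level for one named queue."""
    QUEUE_FILL.labels(queue=queue_name).set(ratio)


def render_metrics():
    """Return all metrics of this registry in the text exposition format."""
    return generate_latest(registry).decode("utf-8")
```

In a long-running process, `start_http_server(port)` from the same library can serve the default registry over HTTP so that a scraper can collect it.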

derpeter commented 5 years ago

Could you explain the benefit of including this in voctomix rather than running a monitoring client in parallel? As you probably want to monitor the whole system and not only the voctomix process, I don't see an advantage here. Also, you would need to keep the Prometheus client code up to date after putting it into the voctomix code.

SuperQ commented 5 years ago

What do you mean by "whole system"?

With Prometheus monitoring, we already have extensive host monitoring with the node_exporter. We take a granular approach to keep things lightweight.

Having direct code instrumentation has a bunch of advantages: the process can track internal errors and other metrics directly. The goal here is to be able to alert on problems with the process itself, not the host.

For example, we have an audio stream monitor, audio-fetcher, that produces useful dashboards of audio level monitoring.

Another advantage of having internal instrumentation is that any 3rd party monitoring software can access this data. Basically every monitoring system can use the Prometheus metrics API to extract process internal data.

The client code is installed as a dependency, either via pip or apt. Not much to worry about there.

fightling commented 5 years ago

Hey @SuperQ,

thx for your report from FOSDEM!

Imo monitoring is an important requirement for voctomix, mostly because voctomix setups differ a lot and problems can happen at several points in different systems.

I'm currently working on voctomix2 on a branch named "feature/onepipe" and recently added some monitoring tools to look at mix-pipeline parameters like TCP/IP ports and performance data like queue fill levels. Just to show the user some health information and which plugs are available with the running configuration.

I'm currently not sure if Prometheus is the client we should use for remote monitoring - I just don't know enough about it. But @derpeter is right: monitoring CPU, memory and I/O usage of voctomix can be done more easily outside of the voctomix code, on the system it is running on. But monitoring the internal parameters of voctomix's components to supervise the mixer is an important thing to me.

So I'm currently focusing on configuration and transparency of the new voctomix2. For now I still show the measured data within voctogui, using our proprietary protocols and some GTK treeview widgets:

[Image: voctomix2-monitoring]

A standardized monitoring interface would be a thing I'm looking forward to.

btw: does SNMP still exist? ;)

RichiH commented 5 years ago

Unless you want to delve into ASN.1 etc, just don't do SNMP.

https://openmetrics.io/ looks to replace SNMP (I am serious about that), and by using the Prometheus Python library you get that for free. It's not just Prometheus: InfluxDB, Datadog, Stackdriver, etc. also support the Prometheus exposition format and have committed to OpenMetrics as well.

FWIW, C3NOC, C3POC, and (iirc) C3VOC are already using Prometheus. As to what VOC does with it, Lukas Hampe can answer that, but I don't know his GH handle.

SuperQ commented 5 years ago

The Prometheus client stuff I'm proposing inside voctomix is very specific. We just expose a couple of simple per-process metrics for "free", since the Python client can easily gather some stuff from /proc/self/.... This is simple stuff like process_cpu_seconds_total, process_resident_memory_bytes, process_start_time_seconds. Things you might want for a basic "is the process up and running" check.
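To illustrate the kind of per-process values meant here, the following is a hand-rolled, standard-library-only sketch. It is not how client_python is implemented; the function names are hypothetical, and the `/proc` reads assume a Linux host (on other systems only the CPU value is reported).

```python
import os


def read_process_metrics():
    """Collect a few process-level values for the current process."""
    times = os.times()
    metrics = {
        # User plus system CPU time consumed so far, in seconds.
        "process_cpu_seconds_total": times.user + times.system,
    }
    try:
        # The second field of /proc/self/statm is the resident set size
        # in pages; multiply by the page size to get bytes.
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        metrics["process_resident_memory_bytes"] = (
            resident_pages * os.sysconf("SC_PAGE_SIZE")
        )
        # Each entry in /proc/self/fd is one open file descriptor
        # (the count includes the descriptor used for this listing).
        metrics["process_open_fds"] = len(os.listdir("/proc/self/fd"))
    except OSError:
        # No /proc available: report only the portable value.
        pass
    return metrics


def render(metrics):
    """Render the values as simple 'name value' lines, one per metric."""
    return "".join(
        f"{name} {value}\n" for name, value in sorted(metrics.items())
    )
```

Calling `render(read_process_metrics())` yields plain text lines that a scraper could parse; the real client library additionally emits `# HELP` and `# TYPE` metadata.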

Once you have the library, you can then start adding all the custom stuff you want.

Yes, you can also get this stuff from outside, from a supervisor like systemd, or Docker. But Prometheus simplifies the basics by not needing any of that.

IMO, the Prometheus format is the replacement for SNMP. We're actively working on an RFC to propose the next iteration of the Prometheus format as a formal standard. :grin:

Really, I don't care if you use Prometheus to monitor or not. Having the metrics so it could be monitored by Prometheus, InfluxDB, whatever is what I'm looking for.

derpeter commented 5 years ago

OK, as you mentioned CPU load and open FDs in the initial post, I was not aware you were aiming at instrumentation.

Regarding audio fetcher, you might want to have a look at https://github.com/voc/multiview-monitor

derpeter commented 5 years ago

I didn't want to argue whether to use Prometheus or not. Monitoring the process itself is usually better done from the outside. You don't need any supervisor for that, just run a client on plain simple Debian. We have used Prometheus for about 3 years at c3voc. We already monitor the state of many parameters of the voctocore machines there.

I agree with @fightling that e.g. buffer states and other internal values could be a nice extension of outside monitoring.

MaZderMind commented 5 years ago

I'm with @derpeter on this. I would suggest not implementing a specific monitoring client in Voctomix. It is a video mixer and should be good at video mixing; other tools are better at other things.

I would suggest to add an API-Call to receive the monitorable parameters from Voctomix and have a Prometheus/Munin/CollectD/Telegraf-Plugin that talks to the API, regularly requesting the interesting metrics.
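A rough sketch of that plugin approach, under the assumption that the API call returns the monitorable parameters as a JSON object. No such API is specified in this thread; the reply shape, the `voctomix` prefix, and the `source` label are all hypothetical. The converter renders the values in the Prometheus text exposition format.

```python
import json


def parse_api_reply(reply):
    """Decode a hypothetical JSON reply, e.g. '{"queue_fill": {"mix": 0.4}}'."""
    return json.loads(reply)


def to_exposition(params, prefix="voctomix"):
    """Render parameters as text exposition lines.

    Scalar values become plain samples; nested objects become one sample
    per key, with the key stored in a hypothetical 'source' label.
    Label values are not escaped here, so this only suits simple names.
    """
    lines = []
    for name, value in sorted(params.items()):
        if isinstance(value, dict):
            for label, sample in sorted(value.items()):
                lines.append(
                    f'{prefix}_{name}{{source="{label}"}} {float(sample)}'
                )
        else:
            lines.append(f"{prefix}_{name} {float(value)}")
    return "\n".join(lines) + "\n"
```

A plugin for any of the listed collectors would call the API on its own schedule and pass the reply through a conversion like this one.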

All OS-Level metrics (FDs, Memory, CPU, …) are available from the respective OS-Clients anyway and should not be re-implemented in voctomix.

SuperQ commented 5 years ago

@MaZderMind Prometheus client_python supports OpenMetrics, which is an API supported by all of the listed monitoring systems.

I'm not suggesting adding OS level metrics, I'm suggesting adding process level metrics.

voc / voctomix

Prometheus monitoring support #243