mesos / mesos_exporter

Prometheus Mesos Exporter
Apache License 2.0

Prototype for adding Prometheus service discovery #71

Open maurorappa opened 6 years ago

maurorappa commented 6 years ago

I think a Prometheus service discovery endpoint could be useful to others. The idea behind it is to produce output that is ready to be used by Prometheus's file-based service discovery (https://www.robustperception.io/using-json-file-service-discovery-with-prometheus/); that way, if members join the cluster dynamically, they are monitored with no human intervention. All you need to do is periodically poll this exporter (on all masters), save the output to a file if it is not 'null', and configure Prometheus to read this file. THIS IS A PROTOTYPE to start a discussion about this enhancement; it needs to be modified in order to:

lloesche commented 6 years ago

Oh, I'm a fan of this feature! I'm currently using a very half-assed SRV-record-to-JSON generator I built a while ago (https://github.com/lloesche/prometheus-dcos/blob/master/srv2file_sd.go), but it depends on DC/OS, or rather on mesos-dns, to discover all the Mesos agents. Getting rid of that dependency and using Mesos' /state.json alone for service discovery would be very nice.
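Discovering agents from the master's state alone could look roughly like this Go sketch. It assumes the state response contains a `slaves` array whose entries carry `hostname` and `active` fields; check those field names against your Mesos version. The function name `activeAgentHosts` is hypothetical.

```go
package main

import (
	"encoding/json"
	"sort"
)

// masterState models only the part of the master's state response this
// sketch needs; the field names are an assumption to verify.
type masterState struct {
	Slaves []struct {
		Hostname string `json:"hostname"`
		Active   bool   `json:"active"`
	} `json:"slaves"`
}

// activeAgentHosts returns the sorted, de-duplicated hostnames of the agents
// the master currently reports as active.
func activeAgentHosts(stateJSON []byte) ([]string, error) {
	var st masterState
	if err := json.Unmarshal(stateJSON, &st); err != nil {
		return nil, err
	}
	seen := make(map[string]bool)
	hosts := []string{}
	for _, s := range st.Slaves {
		// Skip inactive agents, missing hostnames and duplicates.
		if !s.Active || s.Hostname == "" || seen[s.Hostname] {
			continue
		}
		seen[s.Hostname] = true
		hosts = append(hosts, s.Hostname)
	}
	sort.Strings(hosts)
	return hosts, nil
}
```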

That said, I wonder about the static port. I might want to poll several exporters (e.g. node_exporter, cAdvisor, mesos_exporter) on an agent. So maybe it'd be better to leave the port out (or make it optional) and just return a JSON list of the agent nodes; then, in the process that curls the /sd API, a simple jq filter could add ports for whichever exporters you're interested in. Also, on our clusters we usually run node_exporter on random, Mesos-assigned ports and use relabeling to map the port back to 9100, so metrics are associated with the correct instance/time series.
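The "hosts only, add ports downstream" idea can be sketched in Go instead of jq. This assumes the endpoint returns bare hostnames; the `exporter` label name and the example ports are illustrative choices, not part of mesos_exporter. Each exporter gets its own file_sd target group so the groups can be told apart in Prometheus.

```go
package main

import (
	"encoding/json"
	"net"
	"sort"
	"strconv"
)

// targetGroup is one entry of a Prometheus file_sd JSON file.
type targetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels,omitempty"`
}

// buildTargetGroups emits one target group per exporter, pairing every agent
// host with that exporter's port. The "exporter" label is hypothetical.
func buildTargetGroups(hosts []string, ports map[string]int) ([]byte, error) {
	names := make([]string, 0, len(ports))
	for name := range ports {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	groups := make([]targetGroup, 0, len(names))
	for _, name := range names {
		tg := targetGroup{
			Targets: make([]string, 0, len(hosts)),
			Labels:  map[string]string{"exporter": name},
		}
		for _, h := range hosts {
			tg.Targets = append(tg.Targets, net.JoinHostPort(h, strconv.Itoa(ports[name])))
		}
		groups = append(groups, tg)
	}
	return json.Marshal(groups)
}
```

This only covers exporters on fixed ports; exporters on random, Mesos-assigned ports still need the relabeling approach described above.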

Edit: one of our Mesos developers just told me that /state.json is not a good endpoint to query for this use case, and that we should check whether we can get the required info from /state-summary. Depending on the size of the cluster and of the framework/task history, querying /state.json can freeze the master for a while.

maurorappa commented 6 years ago

All you mentioned can be changed; my idea was to show some potential new functionality we could introduce. I'll amend the static port; for the API endpoint, I need to see the format.