prometheus / OpenMetrics

Evolving the Prometheus exposition format into a standard.
https://openmetrics.io
Apache License 2.0

OpenMetrics Discovery Format #258

Open kwiesmueller opened 2 years ago

kwiesmueller commented 2 years ago

I wanted to start a discussion on an idea that could fit in well with OpenMetrics.

I've spent a lot of time recently looking at the metrics exported by various applications (mainly K8s components), trying to find out which metrics are available. Most of the time this involves reading through the application's code and hoping that all metrics exported about a topic are defined in a single source file and are easy to find. The other option is to run the application and hope that all metrics are always exported, i.e. that none of them are only exported under certain conditions.

This seems like a usability issue. Other APIs have solved this with, e.g., OpenAPI or gRPC/protobuf definitions, and I'd like to treat metrics as another API surface too. So what do people think about a formal OpenMetrics Discovery Format: a way for applications to specify the metrics they export in a structured, human- and machine-readable format?
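
To make the idea a bit more concrete, here is a minimal Python sketch of what one entry in such a discovery document might contain. The `MetricDescriptor` type, its fields and the JSON layout are all hypothetical; name, type, unit and help simply mirror the metadata OpenMetrics already has, and label names are an assumed extra field.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MetricDescriptor:
    """One entry in a hypothetical discovery document. Name, type, unit and
    help mirror OpenMetrics metadata; label names are an assumed extra field."""
    name: str
    type: str  # e.g. "counter", "gauge", "histogram"
    help: str
    unit: str = ""
    labels: tuple[str, ...] = ()


def to_discovery_json(descriptors: list[MetricDescriptor]) -> str:
    """Serialise descriptors, sorted by name, into a hypothetical JSON document."""
    ordered = sorted(descriptors, key=lambda d: d.name)
    return json.dumps({"metrics": [asdict(d) for d in ordered]}, indent=2)


if __name__ == "__main__":
    # Illustrative metric names only.
    print(to_discovery_json([
        MetricDescriptor("example_requests", "counter", "Requests served.",
                         labels=("verb", "code")),
        MetricDescriptor("example_queue_length", "gauge", "Items waiting in the queue."),
    ]))
```

Sorting by name keeps the output stable, which matters if the file is checked in alongside the code and diffed between releases.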

It would help operators decide which metrics they care about and identify potential gaps in an application's monitoring surface. For Prometheus specifically, it would help operators better understand the load an application might generate and which metrics to filter on.

It could also help developers, since they would have a structured single source of truth that lets them detect breaking changes in their metrics API; depending on how this is realized, that would probably require some sort of code generation.
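
As an illustration of the breaking-change idea, a small Python sketch that compares two versions of a hypothetical discovery document. The document shape (metric name mapped to a `type` and a list of `labels`) and the rules are assumptions: it flags removed metrics, changed types and removed labels, since each of those can break existing queries and dashboards.

```python
from __future__ import annotations


def breaking_changes(old: dict[str, dict], new: dict[str, dict]) -> list[str]:
    """Compare two hypothetical discovery documents, each mapping a metric
    name to {"type": ..., "labels": [...]}, and describe changes that are
    likely to break existing consumers."""
    problems: list[str] = []
    for name, before in sorted(old.items()):
        after = new.get(name)
        if after is None:
            problems.append(f"{name}: removed")
            continue
        if before["type"] != after["type"]:
            problems.append(f"{name}: type changed {before['type']} -> {after['type']}")
        # A label that disappears breaks any selector or grouping that uses it.
        for label in sorted(set(before["labels"]) - set(after["labels"])):
            problems.append(f"{name}: label '{label}' removed")
    return problems


if __name__ == "__main__":
    v1 = {"example_requests": {"type": "counter", "labels": ["verb", "code"]}}
    v2 = {"example_requests": {"type": "counter", "labels": ["verb"]}}
    for problem in breaking_changes(v1, v2):
        print(problem)
```

Whether adding a label counts as breaking is a policy choice (it changes series identity and can raise cardinality), so this sketch deliberately does not flag it.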

It could either be served at runtime (similar to OpenAPI), which is probably easier to implement, or shipped as a file with the code, which would be easier for end users.

Is anyone aware of a past discussion like this, or has something like it been proposed before?

brian-brazil commented 2 years ago

The way I would suggest handling this with OpenMetrics is to fetch the metrics. It is intended that all the useful information is in-band, and all available metrics should be exported, even if a metric family is currently empty. Expecting developers to expose metrics in two different ways is a bit hopeful, I suspect; there are already challenges with getting metrics exported correctly one way. And if they are exported correctly that way, the information is already in the returned metrics, since a given metric family should always be exposed.
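
To make the in-band point concrete, a minimal Python sketch that recovers a metric catalog from a single scrape of an OpenMetrics text exposition, using only its `# TYPE`, `# HELP` and `# UNIT` metadata lines. The function, class and example metric names are illustrative, and this is not a full parser: it ignores exemplars and does not unescape HELP text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FamilyMetadata:
    """What the exposition itself says about one metric family."""
    name: str
    type: str = "unknown"
    help: str = ""
    unit: str = ""


def catalog_from_exposition(text: str) -> dict[str, FamilyMetadata]:
    """Collect the # TYPE, # HELP and # UNIT lines of an OpenMetrics text
    exposition. A family that is declared but has no samples right now still
    appears in the result. HELP text is returned without unescaping."""
    families: dict[str, FamilyMetadata] = {}
    for line in text.splitlines():
        if line == "# EOF":
            break
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "#" or parts[1] not in ("TYPE", "HELP", "UNIT"):
            continue  # a sample line or an unrelated comment
        keyword, name = parts[1], parts[2]
        value = parts[3] if len(parts) == 4 else ""
        family = families.setdefault(name, FamilyMetadata(name))
        if keyword == "TYPE":
            family.type = value
        elif keyword == "HELP":
            family.help = value
        else:
            family.unit = value
    return families


EXAMPLE = """\
# TYPE http_requests counter
# HELP http_requests Total HTTP requests served.
http_requests_total{code="200"} 1027
http_requests_total{code="500"} 3
# TYPE cache_evictions counter
# HELP cache_evictions Cache evictions; nothing has been evicted yet.
# EOF
"""

if __name__ == "__main__":
    for fam in catalog_from_exposition(EXAMPLE).values():
        print(fam.name, fam.type, repr(fam.help))
```

The empty `cache_evictions` family still shows up with its type and help. Label names, however, only become visible once a family has series.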

Load comes down to cardinality more than anything else; a list of metadata is unlikely to tell you which metrics may be problematic in that regard.
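
Cardinality, unlike static metadata, is something a scrape can measure directly. A rough Python sketch that counts distinct series per metric family in an OpenMetrics text exposition. It attributes each sample to the family named by the most recent metadata line (in the text format a family's samples follow its metadata) and assumes every family has at least one `# TYPE`, `# HELP` or `# UNIT` line. The function names are illustrative.

```python
from __future__ import annotations


def series_key(sample_line: str) -> str:
    """Return the 'name{labels}' part of a sample line (everything before the
    value), skipping over quoted label values that may contain spaces or braces."""
    in_quotes = False
    escaped = False
    depth = 0
    for i, ch in enumerate(sample_line):
        if in_quotes:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_quotes = False
        elif ch == '"':
            in_quotes = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == " " and depth == 0:
            return sample_line[:i]
    return sample_line


def series_per_family(text: str) -> dict[str, int]:
    """Count distinct series per family; declared but empty families get 0."""
    series: dict[str, set[str]] = {}
    current: str | None = None
    for line in text.splitlines():
        if line == "# EOF":
            break
        if line.startswith("#"):
            parts = line.split(" ", 3)
            if len(parts) >= 3 and parts[1] in ("TYPE", "HELP", "UNIT"):
                current = parts[2]
                series.setdefault(current, set())
            continue
        if line and current is not None:
            series[current].add(series_key(line))
    return {name: len(keys) for name, keys in series.items()}


EXAMPLE = """\
# TYPE request_duration_seconds histogram
# UNIT request_duration_seconds seconds
request_duration_seconds_bucket{le="0.1"} 5
request_duration_seconds_bucket{le="+Inf"} 7
request_duration_seconds_count 7
request_duration_seconds_sum 0.42
# TYPE http_requests counter
http_requests_total{path="/search",code="200"} 3
http_requests_total{code="500"} 1
# TYPE cache_evictions counter
# EOF
"""

if __name__ == "__main__":
    for name, count in series_per_family(EXAMPLE).items():
        print(name, count)
```

Each histogram bucket is its own series, so a histogram with many buckets multiplied by many label values can dominate the count even though its metadata is a single entry.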