prometheus / OpenMetrics

Evolving the Prometheus exposition format into a standard.
https://openmetrics.io
Apache License 2.0

Extend OpenMetrics by a stability data type #199

Open mrueg opened 3 years ago

mrueg commented 3 years ago

To improve the experience of working with metrics, I would like to propose adding two fields to the spec.

Stability

Stability is a string that describes the maturity of the MetricFamily. If it is unset, the MetricFamily is considered stable.
Recommended values for the lifecycle:
* Alpha: for metrics that might be renamed or are unstable in terms of other features (e.g. units or dimensions).
* Beta: for metrics that will be considered stable soon.
* Stable: the default, and what consumers can assume if the Stability string is not set.
* Deprecated: for metrics that should not be used anymore.

StabilityHint

StabilityHint is a string that provides a human-readable hint for the MetricFamily. This can include, for example, the name of a replacement for the MetricFamily.

As an example:

# TYPE process_cpu_microseconds_total counter
# UNIT process_cpu_microseconds_total seconds
# HELP process_cpu_microseconds_total Total user and system CPU time spent in seconds.
# STABILITY process_cpu_microseconds_total Deprecated
# STABILITYHINT process_cpu_microseconds_total This metric is going to be replaced by process_cpu_seconds_total.
process_cpu_microseconds_total 4.20072246e+06

This would allow consumers to warn when an unstable metric is used, and it would improve the metrics lifecycle. We currently see issues where, for example, Grafana dashboards use a deprecated metric and users only find out once the metric has disappeared on the consumer side.
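
A minimal Python sketch of how a consumer might act on these fields. STABILITY and STABILITYHINT are only the keywords proposed in this issue and are not part of the OpenMetrics spec. The function names parse_metadata and stability_warnings are hypothetical, and the parsing is deliberately simplified (no escaping, no label handling).

```python
import re

# TYPE, UNIT and HELP exist in the exposition format today;
# STABILITY and STABILITYHINT are the keywords proposed in this issue.
META_RE = re.compile(r"^# (TYPE|UNIT|HELP|STABILITY|STABILITYHINT) (\S+) ?(.*)$")


def parse_metadata(text):
    """Collect per-MetricFamily metadata from '# KEYWORD name value' lines."""
    families = {}
    for line in text.splitlines():
        match = META_RE.match(line)
        if match:
            keyword, name, value = match.groups()
            families.setdefault(name, {})[keyword] = value
    return families


def stability_warnings(families):
    """Return one warning per MetricFamily whose stability is not Stable."""
    warnings = []
    for name, meta in families.items():
        # An unset Stability means the MetricFamily is considered stable.
        stability = meta.get("STABILITY", "Stable")
        if stability != "Stable":
            hint = meta.get("STABILITYHINT", "")
            warnings.append(f"{name} is {stability}. {hint}".strip())
    return warnings


EXPOSITION = """\
# TYPE process_cpu_microseconds_total counter
# STABILITY process_cpu_microseconds_total Deprecated
# STABILITYHINT process_cpu_microseconds_total Replaced by process_cpu_seconds_total.
process_cpu_microseconds_total 4.20072246e+06
"""

for warning in stability_warnings(parse_metadata(EXPOSITION)):
    print(warning)
```

A dashboard tool could run a check like this at scrape or query time and surface the hint next to any panel that references the affected MetricFamily.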

This is similar to https://github.com/OpenObservability/OpenMetrics/issues/189 but with a clearer focus on lifecycle.

SuperQ commented 3 years ago

Nice idea. What if we recommended to put the hint in the HELP string, rather than add another field?

mrueg commented 3 years ago

> Nice idea. What if we recommended to put the hint in the HELP string, rather than add another field?

This is definitely an option as well. I didn't want to assume a specific workflow or break existing workflows when it comes to consuming HELP fields, which is why I suggested a separate one.

brian-brazil commented 3 years ago

In general, I question the utility of this, as in practice it's quite difficult to predict how software subsystems might evolve over time, and thus when their metrics might become obsolete. Different types of software (many of which will co-exist inside one application) have very different meanings, lifecycles, and velocities associated with "stability". Accordingly, documentation within each respective project would seem a better approach overall, rather than trying to formalise some of the fundamental complexities of software engineering into a metrics standard.

SuperQ commented 3 years ago

This has the same utility as SNMP's OID STATUS deprecated. I do wonder whether the enum should be simpler, e.g. just current and deprecated.

debuglevel commented 1 year ago

> Nice idea. What if we recommended to put the hint in the HELP string, rather than add another field?

An extra field might be better for user assistance, e.g. to strike through a deprecated metric in Grafana.

If it is only mentioned in HELP, Grafana would need to assume that HELP.contains("deprecated") really does mean the metric is deprecated (which would be a problem for something like # HELP app_deprecated_calls How many calls to deprecated API endpoints were made).

An explicit marker such as the following, or another very unambiguous solution, should be okay:

# HELP app_requests How many requests were made. @DEPRECATED@ in favor of app_unicorns_total
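
To illustrate the difference, here is a small Python sketch comparing a substring check on HELP with a check for an explicit marker. The @DEPRECATED@ token is only the convention suggested in this comment, not anything defined by OpenMetrics or Grafana, and both function names are hypothetical.

```python
MARKER = "@DEPRECATED@"  # convention suggested in this thread, not a standard


def naive_is_deprecated(help_text):
    """Substring check: flags any HELP text that mentions 'deprecated'."""
    return "deprecated" in help_text.lower()


def marker_is_deprecated(help_text):
    """Explicit check: flags only HELP text carrying the exact marker."""
    return MARKER in help_text


# HELP text that merely talks about deprecated endpoints.
about_deprecation = "How many calls to deprecated API endpoints were made"
# HELP text of a metric that is itself deprecated.
actually_deprecated = (
    "How many requests were made. @DEPRECATED@ in favor of app_unicorns_total"
)

for text in (about_deprecation, actually_deprecated):
    print(naive_is_deprecated(text), marker_is_deprecated(text), text)
```

The substring check flags both HELP strings, while the marker check flags only the second one, which is the behaviour a UI would need before striking through a metric.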