hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

Should Be Able to Specify that Prometheus Metrics Never Expire #7137

Open jaloren opened 5 years ago

jaloren commented 5 years ago

Is your feature request related to a problem? Please describe.

Many monitoring and alerting systems have difficulty handling null (not zero) values or metrics that disappear and reappear. To avoid these problems, it's useful to publish every metric one can export and, if there's no data to associate with it, assign it a zero value.

Vault's handling of metrics is principally event based, which means that metrics only appear for scraping when they exist and then will disappear when they expire. For example, the metric vault_core_leadership_lost_sum will appear once leadership is lost in the cluster and will then disappear when the retention period has been reached.

Additionally, Prometheus scraping cannot be turned on unless a non-zero retention time is passed in via the telemetry config. Interestingly, it is the act of passing in a retention time that enables Prometheus metrics, which is non-obvious.
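For anyone digging into this, a minimal sketch of how that retention time feeds the sink is below. This is an assumption-laden illustration based on the hashicorp/go-metrics Prometheus sink (`PrometheusOpts` and `NewPrometheusSinkFrom` are the names in recent versions of that library and may differ in the copy Vault vendors); it is not Vault's actual wiring code:

```go
package main

import (
	"time"

	metrics "github.com/hashicorp/go-metrics"
	"github.com/hashicorp/go-metrics/prometheus"
)

func main() {
	// The sink takes an Expiration: any metric not updated within that window
	// is dropped from subsequent scrapes. This is effectively what a non-zero
	// prometheus_retention_time in Vault's telemetry stanza configures, and it
	// is why series such as vault_core_leadership_lost_sum come and go.
	sink, err := prometheus.NewPrometheusSinkFrom(prometheus.PrometheusOpts{
		Expiration: 24 * time.Hour,
	})
	if err != nil {
		panic(err)
	}

	metrics.NewGlobal(metrics.DefaultConfig("vault"), sink)

	// Event-style instrumentation: the series only exists after this first
	// observation, and it disappears again 24h after the last one.
	metrics.AddSample([]string{"core", "leadership_lost"}, 1.0)
}
```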

Describe the solution you'd like

Provide a way (for example in the telemetry config) to specify that Prometheus metrics never expire, so that once a metric has been observed it remains available for scraping indefinitely.

Describe alternatives you've considered

An alternative is creating a proxy that maintains the values for us persistently.
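For what it's worth, a rough sketch of that proxy idea, under several assumptions (Vault listening on 127.0.0.1:8200 with unauthenticated access to /v1/sys/metrics?format=prometheus, the proxy serving on port 9102, and the simplification of caching whole metric families rather than individual label sets):

```go
package main

import (
	"net/http"
	"sync"

	dto "github.com/prometheus/client_model/go"
	"github.com/prometheus/common/expfmt"
)

var (
	mu    sync.Mutex
	cache = map[string]*dto.MetricFamily{} // last value seen for each metric family
)

func handler(w http.ResponseWriter, r *http.Request) {
	// Scrape Vault; on failure we still serve whatever was cached previously,
	// so series never disappear just because they expired inside Vault.
	// (Auth is omitted: this assumes unauthenticated_metrics_access is enabled.)
	if resp, err := http.Get("http://127.0.0.1:8200/v1/sys/metrics?format=prometheus"); err == nil {
		var parser expfmt.TextParser
		if fams, perr := parser.TextToMetricFamilies(resp.Body); perr == nil {
			mu.Lock()
			for name, fam := range fams {
				// Simplification: overwriting per family loses label sets that
				// vanish within a family; a real proxy would merge per series.
				cache[name] = fam
			}
			mu.Unlock()
		}
		resp.Body.Close()
	}

	mu.Lock()
	defer mu.Unlock()
	enc := expfmt.NewEncoder(w, expfmt.FmtText)
	for _, fam := range cache {
		enc.Encode(fam)
	}
}

func main() {
	http.HandleFunc("/metrics", handler)
	http.ListenAndServe(":9102", nil)
}
```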

banks commented 1 year ago

I can add some context on why this is the way it is, which might help a future investigator.

This is a problem with all HashiCorp tools (at least as far as I know), as they are instrumented using hashicorp/go-metrics, which started as a statsd wrapper and eventually got support for Prometheus too. As such, its semantics don't really fit Prometheus perfectly.

The expiration time seems to have been an attempt, when the Prometheus sink was first written, to match the statsd/graphite expectation that you can throw whatever dynamic metrics you like at it at runtime without declaring them up front, and they will be flushed within a few seconds, so there's no memory consumption to worry about. Prometheus, on the other hand, expects all metrics to be declared up front and to remain static for the life of the program (more or less).
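To make the mismatch concrete, here is a hedged side-by-side (the metric names are illustrative, not an exact mapping of Vault's): go-metrics lets call sites emit arbitrary string keys whenever they like, while prometheus/client_golang expects each metric to be declared and registered once and to exist for the life of the process:

```go
package main

import (
	metrics "github.com/hashicorp/go-metrics"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Prometheus-native style: declared and registered once at startup, so the
// series is present in every scrape for the life of the process.
var allocBytes = promauto.NewGauge(prometheus.GaugeOpts{
	Name: "vault_runtime_alloc_bytes", // illustrative name
	Help: "Bytes allocated by the Go runtime.",
})

func main() {
	// go-metrics style: the key is just a slice of strings, nothing is declared
	// up front, and each sink decides how (and for how long) to expose it.
	metrics.SetGauge([]string{"runtime", "alloc_bytes"}, 42)

	// The same measurement against the pre-declared gauge.
	allocBytes.Set(42)
}
```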

In Consul we partly fixed this by adding declarations for at least a subset of the core metrics and updating the Prometheus sink to never expire those metrics.
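For reference, the go-metrics mechanism that the Consul work added looks roughly like the sketch below (field names are from recent hashicorp/go-metrics releases and should be treated as an assumption; the metric names are illustrative). Metrics supplied as definitions are registered up front and are exempt from the expiry pass, while anything undeclared still expires as before:

```go
package main

import (
	"time"

	metrics "github.com/hashicorp/go-metrics"
	"github.com/hashicorp/go-metrics/prometheus"
)

func main() {
	opts := prometheus.PrometheusOpts{
		// Still applies to any metric that is not declared below.
		Expiration: time.Minute,
		GaugeDefinitions: []prometheus.GaugeDefinition{
			{Name: []string{"vault", "core", "active"}, Help: "Illustrative gauge."},
		},
		SummaryDefinitions: []prometheus.SummaryDefinition{
			{Name: []string{"vault", "core", "leadership_lost"}, Help: "Illustrative summary."},
		},
	}

	// Declared metrics are registered immediately and never expired by the sink.
	sink, err := prometheus.NewPrometheusSinkFrom(opts)
	if err != nil {
		panic(err)
	}
	metrics.NewGlobal(metrics.DefaultConfig("vault"), sink)
}
```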

But we didn't go as far as disallowing undefined metrics, mostly because many layers of HashiCorp's libraries (e.g. Raft, Serf, and Memberlist) all use go-metrics and none of them pre-define their metrics, so there is a risk that we'd miss a case and lose metrics, or we'd need to refactor all of those call sites.

Summary: we require a TTL and expire metrics specifically because the go-metrics API was born in the pre-Prometheus era, when metrics were treated as ephemeral rather than as static, pre-defined data types. Just assuming all metrics are static might be safe, or might leak memory, depending on how individual metrics are used. Even if it is safe, it still means metrics won't be scraped until they have been observed at least once, unless we make the changes to actually pre-define all metrics up front in Vault.

I'd really love to change this. The ideal fix would be to switch all of HashiCorp's products and libraries to instrumenting their code with a more modern library (either the Prometheus client directly, or something else like OTEL?). The biggest challenges to that are:

  1. It's a lot of work
  2. It would be backwards-incompatible with existing metrics sinks and configs, which could cause issues for lots of users. Most other projects choose one exposition format (e.g. Prometheus) and rely on third-party solutions like Telegraf for folks who want to push to other backends. HashiCorp tools, however, support a wide variety of metrics "sinks" out of the box, which is convenient but also makes it much harder to change how we instrument.
  3. Ideally we'd change all of HashiCorp's tools to be consistent which makes it even more work!

More realistically, Vault could potentially adopt an approach similar to Consul's and start defining metrics up front, so at least those don't disappear.

We could then stop requiring a retention time when configuring Prometheus metrics and pick a short default just in case other metrics creep through. We would probably still want Prometheus support to be explicitly enabled, since it uses memory and exposes additional APIs that users of other metrics sinks will not want.

ameflorenti commented 10 months ago

Is there any update on this?