vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0
17.65k stars 1.56k forks source link

Switch to emitting incremental counter metrics through a new storage implementation #13863

Open jszwedko opened 2 years ago

jszwedko commented 2 years ago

Currently, we emit absolute metrics for anything we scrape from the internal metrics registry. This is fine for gauges, and even for histograms, we end up merging that stuff together. This is bad for counters, though.

It's bad because for many metrics sinks, they only emit incrementalized metrics downstream, so in order to incrementalize an absolute metric, you must observe it at least twice in order to begin tracking the delta between observations. This means that if we expired a counter, and then it came back to life -- maybe it's updated very infrequently -- we would potentially lose some of the increments to that counter between the first time we see it when it comes back to life and the second time we observe it.

By writing our own customized storage, we could provide a hook to consume a counter's value -- essentially, swap the value to 0 and take the previous value -- such that we could emit the consumed value as an incremental update. This would ensure that even if a metric is expired but comes back, we're always consuming its entire value so we never miss an update.

This would work, specifically, because we would only do expiration when iterating over the metrics and observing them, not automatically in the background, so we would always have a chance to observe the metric (and potentially consume its value if it's a counter) before removing it from the registry.

Ref #11995

bruceg commented 2 years ago

Reverted by #14251. This is going to be more work than the simple effort here. There are two complications that need to be considered to move this forward:

  1. These metrics are used by vector top as well as the API, which require absolute values. If we convert these counters to incremental, we will need to convert them into accumulators to provide those absolute values, either as an additional interface layer or internal to top and the API.
  2. As it stands now, if multiple consumers capture metrics, each capture will reset the counters and each will only receive partial values. As such, we will need to introduce a new layer between the metrics and the consumers to fan out the values similar to how we fan out internal logs.