vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Publish all metrics with an initial value #6532

Open jszwedko opened 3 years ago

jszwedko commented 3 years ago

Current Vector Version

vector 0.12.0 (g27e8f7f x86_64-unknown-linux-gnu 2021-02-22)

Use-cases

It has occurred to me that we don't publish a metric until the event it tracks has actually occurred. For example, we never publish processing_errors_total for the json_parser transform until an event fails to parse. I believe this confuses users, who can't tell which metrics should be published for a given set of components, and it requires them to convert null values to 0 when building dashboards (in Grafana at least). It also makes it impossible to tell whether a metric is genuinely missing or simply hasn't been published yet.

I think this may be one cause of https://github.com/timberio/vector/issues/6530

Example config:

[sources.in]
  type = "stdin"

[sources.metrics]
  type = "internal_metrics"

[transforms.json]
  type = "json_parser"
  inputs = ["in"]

[sinks.blackhole]
  type = "blackhole"
  inputs = ["json"]

[sinks.console]
  type = "console"
  inputs = ["metrics"]
  encoding.codec = "json"

Note that if you only send valid JSON messages, you will never see the following in the output:

{"name":"processing_errors_total","namespace":"vector","tags":{"component_kind":"transform","component_name":"json","component_type":"json_parser","error_type":"failed_parse"},"timestamp":"2021-[0/1909]:36:33.514508Z","kind":"absolute","counter":{"value": 0}}

That metric only appears once an event fails to parse as JSON.

Proposal

Ensure that all metrics are published initially with their 0 value.
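
To illustrate the difference, here is a minimal sketch in Python (not Vector's actual implementation; `CounterRegistry` and its methods are hypothetical names made up for this example). It contrasts a registry that only creates a series on first increment with one that registers the series at 0 up front:

```python
class CounterRegistry:
    """Toy registry: a series is only visible in snapshots once it exists."""

    def __init__(self):
        self._values = {}

    @staticmethod
    def _key(name, tags):
        # Sort tags so the same label set always maps to the same series.
        return (name, tuple(sorted(tags.items())))

    def register(self, name, **tags):
        # Eager registration: create the series at 0 if it does not exist yet.
        self._values.setdefault(self._key(name, tags), 0)

    def increment(self, name, **tags):
        # Lazy behaviour: the series is created on its first increment.
        key = self._key(name, tags)
        self._values[key] = self._values.get(key, 0) + 1

    def snapshot(self):
        return dict(self._values)


lazy = CounterRegistry()
eager = CounterRegistry()
eager.register(
    "processing_errors_total",
    component_name="json",
    error_type="failed_parse",
)

# Before any parse failure, only the eager registry exposes the series.
print(len(lazy.snapshot()))            # 0
print(list(eager.snapshot().values())) # [0]
```

With eager registration, a dashboard can distinguish "no errors so far" (value 0) from "this component does not emit the metric" (series absent).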

References

jszwedko commented 2 years ago

There are a couple of complications with this:

nmiculinic commented 1 year ago

A good step in the right direction would be to include zeros for the non-component-based metrics listed at https://vector.dev/docs/reference/configuration/sources/internal_metrics/

e.g.

etc., since those are quite important to get right when using PromQL rate() in the monitoring system
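
To make the rate() concern concrete, here is a simplified Python sketch (an assumption-laden model, not Prometheus's actual algorithm: it ignores extrapolation and treats any drop as contributing 0). An increase can only be computed between two samples that both exist, so a counter that first appears with value 1 hides its first increment:

```python
def increase(samples):
    """Sum of positive deltas between consecutive present samples.

    `None` means the series was absent at that scrape. Simplified model:
    no extrapolation, and a drop in value contributes 0.
    """
    present = [v for v in samples if v is not None]
    return sum(max(b - a, 0) for a, b in zip(present, present[1:]))


# The same single error, scraped four times.
lazy_series = [None, None, 1, 1]   # series appears only after the first error
eager_series = [0, 0, 1, 1]        # series published at 0 from the start

print(increase(lazy_series))   # 0
print(increase(eager_series))  # 1
```

In the lazy case the first error is invisible to the query, which is why an initial 0 matters for alerting on rare events.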

haiwu commented 3 months ago

Just hit the same issue. When could we expect this issue to be fixed?

jszwedko commented 3 months ago

Just hit the same issue. When could we expect this issue to be fixed?

It's not currently on the roadmap, so it is difficult to say (contributions are, of course, always welcome). As mentioned above, this is also tricky for metrics that have dynamic tags.
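
A rough Python sketch of why dynamic tags are the hard part (`zero_series` is a hypothetical helper, and `events_total` with a `host` tag is an invented example, not necessarily a real Vector metric). When every tag's possible values are known up front, the zero-valued series can be enumerated; when a tag's values come from event data, there is nothing to enumerate:

```python
from itertools import product


def zero_series(name, tag_values):
    """Enumerate every label combination that could be published at 0.

    `tag_values` maps each tag name to a list of its possible values, or to
    None when the values are only known at runtime (a dynamic tag).
    Returns None if any tag is dynamic, since the series set is unknown.
    """
    if any(values is None for values in tag_values.values()):
        return None
    keys = sorted(tag_values)
    combos = product(*(tag_values[k] for k in keys))
    return [(name, dict(zip(keys, combo))) for combo in combos]


# Static tag: the one possible series is known before any event arrives.
static = zero_series("processing_errors_total", {"error_type": ["failed_parse"]})

# Dynamic tag (hypothetical): values depend on the events themselves.
dynamic = zero_series("events_total", {"host": None})

print(static)   # [('processing_errors_total', {'error_type': 'failed_parse'})]
print(dynamic)  # None
```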