vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Stale timestamps when using `log_to_metric` with `prometheus_exporter` #7350

Open jszwedko opened 3 years ago

jszwedko commented 3 years ago

A user on Discord reported an issue with a pipeline in which a log_to_metric transform feeds a prometheus_exporter sink, where the metrics would fail to be scraped by Prometheus if they had not been updated recently.
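A minimal sketch of that kind of pipeline in TOML; the component names, the source, and the field used for the counter are illustrative, not the user's actual config:

```toml
# Illustrative config only; names and fields are assumptions.
[sources.app_logs]
type = "file"
include = ["/var/log/app.log"]

[transforms.app_metrics]
type = "log_to_metric"
inputs = ["app_logs"]

  [[transforms.app_metrics.metrics]]
  type = "counter"
  field = "message"
  name = "events_processed_total"

[sinks.prom]
type = "prometheus_exporter"
inputs = ["app_metrics"]
address = "0.0.0.0:9598"
```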

This appears to be because log_to_metric sets the metric timestamps from the log timestamps. If no log comes through for a while to update, for example, a given counter, then the prometheus_exporter will expose that metric with an old timestamp and Prometheus will reject the sample. This results in gaps in the time series.

I'm not 100% sure what the fix should be here. I think the result should be that either the metrics are exposed by the prometheus_exporter with an updated timestamp, even if they weren't modified recently, or they have no timestamp set at all, in which case the Prometheus scraper will default it to the scrape time.

@bruceg do you have any thoughts on this one?

bruceg commented 3 years ago

I think you're right on the description of what's happening, and I too am not sure what we can do about it. This was, of course, not a problem in the past because the prometheus_exporter sink was largely ignorant of timestamps, so the metrics were always fresh. Possibly log_to_metric could be modified to optionally create the metrics with a null timestamp. Then prometheus_exporter will also export them without a timestamp. Does that make sense?
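Roughly, in the Prometheus text exposition format the difference would be as follows (metric name and value are made up for illustration; the optional trailing timestamp is in milliseconds since the epoch):

```
# With an explicit, possibly stale timestamp -- Prometheus may reject the sample as too old:
events_processed_total 42 1620000000000

# Without a timestamp -- Prometheus assigns the scrape time:
events_processed_total 42
```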

jszwedko commented 3 years ago

@bruceg

True 🤔 I think there is some nuance here depending on the "source" and sink of the metric data though.

For example, in this pipeline:

  1. source: log_to_metric
  2. sink: prometheus_exporter

I think it'd be reasonable to have prometheus_exporter expose the metrics with no timestamp, since it is aggregating the incoming metrics (like relative counters).

But in:

  1. source: log_to_metric
  2. sink: prometheus_remote_write

I think we would want the timestamp, since it could differ from the write time if the data was buffered to disk and then read later.

I think maybe the difference is whether the sink is aggregating the metrics or streaming them.

However, in:

  1. source: prometheus_scrape
  2. sink: prometheus_exporter

I think we would want the timestamps to propagate from the scraped source (if they are set).

So it is not as simple as just dropping the timestamps in prometheus_exporter or not setting them in log_to_metric.

srstrickland commented 10 months ago

We've been facing this problem and stumbled upon this issue. Specifically, we're getting errors doing remote write to Prometheus for out-of-order data, and it does seem to be triggered by log_to_metric metrics. I think giving log_to_metric some flexibility in how the timestamp field is derived would be a good option. The current behavior is to use the timestamp of the log, which in some cases is perfectly reasonable: "here is a metric for time X". But in our case we're just trying to measure what's being processed at the time, so I'd rather use the current time.

There could be an option to toggle between these two behaviors, or full VRL support could be added so the user can explicitly define how to derive the metric timestamp. I think just being able to choose between the log timestamp and "now" (the processing timestamp) would be good enough for most cases, but there may be edge cases where a user wants to derive it another way (e.g. from another field entirely).
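As a sketch only: the `timestamp` option below does not exist in log_to_metric today; it is purely hypothetical, showing what such a toggle might look like in TOML:

```toml
[transforms.app_metrics]
type = "log_to_metric"
inputs = ["app_logs"]
# Hypothetical, unimplemented option: stamp emitted metrics with the
# processing time ("now") instead of the source log's timestamp ("log").
timestamp = "now"

  [[transforms.app_metrics.metrics]]
  type = "counter"
  field = "message"
```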

A simple workaround is to throw in another remap transform that explicitly changes the timestamp of the log or metric, but this feels heavy-handed (and I wonder whether it would still be technically possible to generate out-of-order data while remote-writing to Prometheus... it probably would be, via batching and concurrency).
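For reference, that workaround might look something like this; a sketch assuming the metric events come from a transform named `app_metrics`, that overwriting `.timestamp` with `now()` in VRL is acceptable for the use case, and an example remote write endpoint:

```toml
[transforms.fresh_timestamps]
type = "remap"
inputs = ["app_metrics"]
# Overwrite the event timestamp with the processing time so the downstream
# prometheus_remote_write sink does not see stale or out-of-order samples.
source = '''
.timestamp = now()
'''

[sinks.prom_rw]
type = "prometheus_remote_write"
inputs = ["fresh_timestamps"]
endpoint = "https://prometheus.example.com/api/v1/write"
```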

Another option (which we're also pursuing) is to use a backend that supports out-of-order writes. But it would be nice if we could reduce the errors in Prometheus remote writes without having to add a new remap node.