Open jszwedko opened 3 years ago
I think you're right on the description of what's happening, and I too am not sure what we can do about it. This was, of course, not a problem in the past because the `prometheus_exporter` sink was largely ignorant of timestamps, so the metrics were always fresh. Possibly `log_to_metric` could be modified to optionally create the metrics with a null timestamp. Then `prometheus_exporter` will also export them without a timestamp. Does that make sense?
@bruceg
True 🤔 I think there is some nuance here depending on the "source" and sink of the metric data though.
For example, in this pipeline:
I think it'd be reasonable to have `prometheus_exporter` expose the metrics with no timestamp since it is aggregating the incoming metrics (like relative counters).
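The pipeline diagram from the original comment did not survive here, but the aggregating case under discussion is presumably logs flowing through `log_to_metric` into a `prometheus_exporter` sink. A hedged sketch of that kind of Vector config (component names, paths, and the address are illustrative, not from the thread):

```toml
# Logs come in, get converted to counters, and the exporter
# aggregates them for Prometheus to scrape.
[sources.app_logs]
type = "file"
include = ["/var/log/app.log"]   # illustrative path

[transforms.error_metrics]
type = "log_to_metric"
inputs = ["app_logs"]

[[transforms.error_metrics.metrics]]
type = "counter"
field = "message"
name = "error_count"

[sinks.prometheus]
type = "prometheus_exporter"
inputs = ["error_metrics"]
address = "0.0.0.0:9598"
```

In this shape, the exporter owns the aggregated state, so a scrape-time (or absent) timestamp arguably describes the data better than the timestamp of whichever log last incremented the counter.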
But in:
I think we would want the timestamp as it could differ if the data was buffered to disk and then read later.
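The contrasting pipeline here is presumably one that streams metrics out through a disk buffer, where events can be written long after they were produced. A hedged sketch (the sink choice, endpoint, and buffer size are assumptions for illustration):

```toml
# Metrics are streamed out rather than aggregated; with a disk
# buffer, events may be delivered long after their timestamps,
# so the original timestamps carry real information.
[sinks.remote]
type = "prometheus_remote_write"
inputs = ["error_metrics"]   # hypothetical upstream transform
endpoint = "https://prometheus.example.com/api/v1/write"

[sinks.remote.buffer]
type = "disk"
max_size = 268435488   # bytes; illustrative value
```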
I think maybe the difference is whether the sink is aggregating the metrics or streaming them.
However, I also think in:
that we would want the timestamps to propagate from the scraped source (if they are set).
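The third case, metrics scraped from an upstream Prometheus endpoint and re-exposed, might look roughly like the following (the endpoint URL and component names are made up for illustration):

```toml
# Timestamps set by the upstream exporter should propagate through
# rather than being dropped or overwritten.
[sources.upstream]
type = "prometheus_scrape"
endpoints = ["http://10.0.0.5:9090/metrics"]

[sinks.re_export]
type = "prometheus_exporter"
inputs = ["upstream"]
address = "0.0.0.0:9598"
```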
So it is not as simple as just dropping the timestamps in `prometheus_exporter` or not setting them in `log_to_metric`.
Been facing this problem, and stumbled upon this issue. Specifically, we're getting errors doing remote-write to Prometheus for out-of-order data. It indeed seems to be triggered by `log_to_metric` metrics. I think adding some flexibility to `log_to_metric` for controlling the `timestamp` field would be a good option. The current behavior is to use the timestamp of the logs, which in some cases is perfectly reasonable: "here is a metric for time X". But in our case, we're just trying to measure what's being processed at the time, so I'd rather just use the current time. Of course, there could be an option to toggle between these two behaviors, or full VRL support could be added so the user can explicitly define how to derive the metric timestamp. I think just being able to choose between the log timestamp and "now" (the processing timestamp) would be good enough for most cases, but there may be edge cases where a user wants to derive it another way (e.g. from another field entirely).
A simple workaround is to throw in another `remap` transform to explicitly change the timestamp of the log or metric, but this feels heavy-handed (and I'm wondering whether it would still be technically possible to generate out-of-order data while remote-writing to Prometheus; there probably would be, via batching and concurrency). Another option (which we're also pursuing) is to use a backend that supports out-of-order writes. But it would be nice if we could reduce the errors in Prometheus remote-writes without having to add a new `remap` node.
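The `remap` workaround described above can be sketched as follows: it simply overwrites the event timestamp with processing time before the event reaches `log_to_metric`. The transform and input names are made up for illustration:

```toml
[transforms.force_now]
type = "remap"
inputs = ["app_logs"]   # hypothetical upstream source
source = '''
# Replace the log timestamp with the current wall-clock time so the
# derived metric gets a fresh timestamp downstream.
.timestamp = now()
'''
```

This is exactly the extra node the comment is hoping to avoid, but it shows how small the change is that a `log_to_metric` option (or VRL hook) would need to express.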
A user in discord reported an issue setting up a pipeline like:
Where the metrics would fail to be scraped by prometheus if they were not recently updated.
This appears to be because `log_to_metric` sets the metric timestamps based on the log timestamps. If no log comes through that updates, for example, a given counter for a while, then `prometheus_exporter` will expose the metric with an old timestamp and Prometheus will not scrape it. This results in gaps in the timeseries.

I'm not 100% sure what the fix should be here. I think the result should be either that the metrics are exposed in the `prometheus_exporter` with an updated timestamp, even if they weren't modified recently, or that they should not have a timestamp set at all, in which case the Prometheus scraper will default it.

@bruceg do you have any thoughts on this one?