prometheus / client_python

Prometheus instrumentation library for Python applications
Apache License 2.0

Error on ingesting samples that are too old or are too far into the future #725

Open Gim6626 opened 3 years ago

Gim6626 commented 3 years ago

Hi!

I've run into a strange situation with timestamps.

Briefly:

  1. If the timestamp is taken from datetime.datetime.now(), everything is OK.
  2. If the timestamp is parsed from a date string that looks the same, I get a warning in the Prometheus server log about a bad datetime and the metric is not shown.

Here is a code example with comments showing which metrics work and which don't:

import datetime
import sys
import time
import pytz

from prometheus_client import (
    start_http_server,
)
from prometheus_client.core import (
    GaugeMetricFamily,
    REGISTRY,
)

class BugDemoMetricsCollector:

    def collect(self):
        dt_format = '%Y-%m-%d_%H-%M-%S.%f %z'
        dt_now = datetime.datetime.now(tz=pytz.timezone('UTC'))
        print(dt_now)
        # Works
        gobj = GaugeMetricFamily('FooMetricGood', '')
        gobj.add_metric([], 123, timestamp=dt_now.timestamp())
        yield gobj
        # Works too
        dt_now_str = dt_now.strftime(dt_format)
        dt_parsed = datetime.datetime.strptime(dt_now_str, dt_format)
        gobj = GaugeMetricFamily('FooMetricGoodToo', '')
        gobj.add_metric([], 456, timestamp=dt_parsed.timestamp())
        yield gobj
        # Does not work, but same date
        dt_custom_str = '2021-11-11_18-12-59.000000 +0000'
        dt_parsed_from_custom = datetime.datetime.strptime(dt_custom_str, dt_format)
        gobj = GaugeMetricFamily('FooMetricNotWorking', '')
        gobj.add_metric([], 789987, timestamp=dt_parsed_from_custom.timestamp())
        yield gobj

def main():
    start_http_server(8080)
    REGISTRY.register(BugDemoMetricsCollector())
    while True:
        time.sleep(1)

if __name__ == '__main__':
    sys.exit(main())

Here is the message from the Prometheus log about the problematic metric:

prometheus-prometheus-1  | ts=2021-11-11T13:41:01.895Z caller=scrape.go:1563 level=warn component="scrape manager" scrape_pool=services target=http://192.168.64.1:8080/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=1

But the date is correct; here is the Python code showing the date and format used:

>>> datetime.datetime.strptime('2021-11-11_18-12-59.000000 +0000', '%Y-%m-%d_%H-%M-%S.%f %z')
datetime.datetime(2021, 11, 11, 18, 12, 59, tzinfo=datetime.timezone.utc)

I've spent more than a day googling, reading the docs, and trying to resolve this, with no results.

Any help would be greatly appreciated.

anon8675309 commented 1 year ago

I ran into this same error message when using Prometheus. This is an issue with Prometheus itself, not an error in client_python.

It turns out Prometheus is not comparing the time of the incoming sample to the current system time; it is comparing it to the latest time in the database. The log message is missing three key pieces of data: the time of the rejected entry, whether that entry was too old or too far in the future, and the boundary date/time that determines whether an entry is acceptable.

As best I can tell, the responsible code is here. I say this because the similar code in target.go did not change the error message when I commented it out and deployed my modified build, whereas when I commented out the checks in head_append.go, the samples were rejected as out of order instead of too old or too far into the future.
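
For anyone who wants the gist without reading the Go, here is a rough Python sketch of the check as I understand it; the function name and the size of the acceptance window are illustrative assumptions, not Prometheus's actual code:

def would_ingest(sample_ts: float, newest_ts_in_tsdb: float, window_seconds: float) -> bool:
    # Accept a sample only if it is not too far behind the newest sample
    # already stored; the bound comes from stored data, not the wall clock.
    lower_bound = newest_ts_in_tsdb - window_seconds
    return sample_ts >= lower_bound

# If a sample with a future timestamp ever made it into the TSDB,
# newest_ts_in_tsdb jumps ahead, and samples stamped with the real "now"
# start failing this check even though their timestamps look perfectly sane.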

Thus, if you have an entry in the database with a timestamp in the future, all new data will be rejected because it is "too old". It doesn't look like promtool tsdb ... has any way to delete future entries. I solved it on my system by stopping Prometheus, removing the data directory, and starting Prometheus again.
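
In case it helps, here is a sketch of that workaround in Python; the systemd unit name ('prometheus') and the data directory path are assumptions, so substitute whatever your setup actually uses:

import shutil
import subprocess

# WARNING: this wipes all stored samples, not just the future-dated ones.
subprocess.run(['systemctl', 'stop', 'prometheus'], check=True)
shutil.rmtree('/var/lib/prometheus/data')  # assumed data directory
subprocess.run(['systemctl', 'start', 'prometheus'], check=True)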

This can be reproduced by setting the system clock to the future, adding some data to prometheus, and then correcting the system clock.

Apologies if replying to an old post is a faux pas here, but there's only one other place where this issue seems to have been discussed, and that thread was closed without resolving it. So my hope is that this comment will help people who run into the same error message in the future.