influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

Query where >= ... doesn't work after 1.8.6, still works after reverting to 1.8.5 #21790

Open SGStino opened 3 years ago

SGStino commented 3 years ago

Steps to reproduce:

We run the following queries:

SELECT "value" FROM "somemeasure" WHERE time <= '2021-06-20T00:00:00.0000000+00:00'
SELECT "value" FROM "somemeasure" WHERE time >= '2021-06-20T00:00:00.0000000+00:00'

In influx 1.8.5 this is the result: image

In influx 1.8.6 this is the result: image

Expected behavior: Time filters with a lower bound (time >= ...) should still work after updating.

Actual behavior: Queries with a lower time bound return no results, neither in Grafana nor when calling influx:8086/query?q=... directly.
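For anyone who wants to reproduce this outside Grafana, here is a minimal sketch of how the request URL for the /query endpoint can be built. The host name `influx:8086` and the database name `MyDBS` are taken from this report; the helper name `build_query_url` is made up for illustration. The encoding it produces matches the request line in the httpd log further down.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, db: str, query: str, epoch: str = "ms") -> str:
    """Build a GET URL for the InfluxDB 1.x /query endpoint.

    urlencode() percent-encodes quotes, colons and '+' and turns spaces
    into '+', which is the form seen in the httpd access log.
    """
    params = {"db": db, "epoch": epoch, "q": query}
    return f"{base_url}/query?{urlencode(params)}"


# The failing query from this report (lower time bound).
FAILING_QUERY = (
    'SELECT "value" FROM "somemeasure" '
    "WHERE time >= '2021-06-20T00:00:00.0000000+00:00'"
)

url = build_query_url("http://influx:8086", "MyDBS", FAILING_QUERY)
print(url)
```

The printed URL can be pasted into curl or a browser to check whether the empty result comes from the server itself rather than from Grafana.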

Environment info:

influx docker image 1.8.5 and 1.8.6

Logs: no significant change between 1.8.6 and 1.8.5:

[httpd] 10.255.0.5, 10.0.8.251, 10.0.8.251,10.0.6.98 - admin [06/Jul/2021:09:09:39 +0000] "GET /query?db=MyDBS&epoch=ms&q=SELECT+%22value%22+FROM+%22My.Measure.1%22+WHERE+time+%3C%3D+%272021-06-20T00%3A00%3A00.0000000%2B00%3A00%27%3BSELECT+%22value%22+FROM+%22My.Measure.1%22+WHERE+time+%3E%3D+%272021-06-20T00%3A00%3A00.0000000%2B00%3A00%27 HTTP/1.1" 200 1693699 "-" "Grafana/8.0.4" e4bf9f05-de39-11eb-8008-02420a000665 965895
ts=2021-07-06T09:09:42.818791Z lvl=info msg="Executing query" log_id=0VAlyvTW000 service=query query="SELECT value FROM MyDBS.autogen.\"My.Measure.1\" WHERE time <= '2021-06-20T00:00:00.0000000+00:00'"
ts=2021-07-06T09:09:42.994027Z lvl=info msg="Executing query" log_id=0VAlyvTW000 service=query query="SELECT value FROM MyDBS.autogen.\"My.Measure.1\" WHERE time >= '2021-06-20T00:00:00.0000000+00:00'"

SGStino commented 3 years ago

There seems to be more going on:

2021-06-21 to 2021-06-22: image

While 2021-06-22 to 2021-06-23 returns: "No data in response"

But 2021-06-21 to 2021-06-23 returns this: image

Trying the game of higher/lower a little further:

WHERE time >= 1624233600000ms and time <= 1624356000000ms: no data
WHERE time >= 1624233300000ms and time <= 1624356000000ms: 51384 points.
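To make the two lower bounds easier to read, here is a small sketch that converts the epoch-millisecond values above to UTC. The helper name `ms_to_utc` is only for illustration. It shows that the bound that returns no data sits exactly on midnight UTC, while the one that returns points starts five minutes earlier.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return UNIX_EPOCH + timedelta(milliseconds=ms)


for label, ms in [
    ("lower bound, no data", 1624233600000),
    ("lower bound, 51384 points", 1624233300000),
    ("upper bound", 1624356000000),
]:
    print(f"{label}: {ms_to_utc(ms).isoformat()}")

# lower bound, no data: 2021-06-21T00:00:00+00:00
# lower bound, 51384 points: 2021-06-20T23:55:00+00:00
# upper bound: 2021-06-22T10:00:00+00:00
```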

This seems to happen with multiple measurements around the same time ranges, and even on 1.8.5.

SGStino commented 3 years ago

We gained a little more insight:

The boundary where data stops loading is 02:00+02:00, which is 00:00Z. This is probably the boundary of a shard file.

That means a query spanning shards X-1 and X loads data, while a query that only touches shard X does not.
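A rough way to check the shard-boundary theory is sketched below. It rests on two assumptions that are not confirmed anywhere in this thread: that the retention policy uses a 7-day shard group duration, and that shard groups start at timestamps truncated to that duration the way Go's `time.Truncate` does (counting from year 1, January 1, UTC). The name `shard_group_start` is hypothetical. The real boundaries on a server can be read from `SHOW SHARDS`.

```python
from datetime import datetime, timedelta, timezone

# Assumption: alignment reference used by Go's time.Truncate (its zero time).
GO_ZERO_TIME = datetime(1, 1, 1, tzinfo=timezone.utc)
# Assumption: 7-day shard group duration; not confirmed in this report.
ASSUMED_SHARD_DURATION = timedelta(days=7)


def shard_group_start(t: datetime,
                      duration: timedelta = ASSUMED_SHARD_DURATION) -> datetime:
    """Round t down to the start of its (assumed) shard group."""
    return t - ((t - GO_ZERO_TIME) % duration)


no_data_bound = datetime(2021, 6, 21, 0, 0, tzinfo=timezone.utc)
working_bound = datetime(2021, 6, 20, 23, 55, tzinfo=timezone.utc)

print(shard_group_start(no_data_bound).isoformat())  # 2021-06-21T00:00:00+00:00
print(shard_group_start(working_bound).isoformat())  # 2021-06-14T00:00:00+00:00
```

Under those assumptions the failing lower bound is the first instant of a new shard group, and the working one still falls into the previous group, which fits the X-1 / X observation above.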

This told us that we had a problem with our data files. We've re-imported them from non-influx blob storage and now the data loads correctly.

We suspect this may have been caused by instability of the influx server during the previous import, which loads the data into the database measurement by measurement, an entire month at a time.

Somehow this resulted in an issue with certain shards.