mrliptontea opened this issue 3 years ago
promtool tsdb create-blocks-from rules \
  --url http://thanos-query \
  --start "$(date +%s -d '2021-04-03 00:00:00')" \
  --end "$(date +%s -d '2021-05-31 23:59:59')" \
  /etc/prometheus/rules/backfiller.rules.yaml
We need to think about whether it makes sense to support rules backfilling in the Thanos tool. Features:
1. Thanos-specific query parameters
2. A different rules format, since Thanos Rule supports partial_response_strategy
3. ...
I also ran into a similar problem: when querying metrics over a long time range, the store gateway does not return data. This is my thanos-compact configuration:
/data/app/thanos/bin/thanos compact -w \
--wait-interval=1m \
--compact.cleanup-interval=1m \
--compact.concurrency=4 \
--compact.skip-block-with-out-of-order-chunks \
--downsample.concurrency=4 \
--block-sync-concurrency=60 \
--tracing.config-file=/data/app/thanos/conf.d/trace.jaeger.yml \
--block-meta-fetch-concurrency=64 \
--objstore.config-file=/data/app/thanos/conf.d/object.json \
--data-dir=/data/compact/ \
--delete-delay=4h \
--retention.resolution-raw=5d \
--retention.resolution-5m=15d \
--retention.resolution-1h=0d
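As an aside, the retention flags above determine which resolutions even exist for data of a given age. A minimal illustration (not Thanos code) of that mapping, assuming the usual convention that a `0d` retention means "keep forever":

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// Retention windows taken from the compact flags above; a zero
// duration (0d) means that resolution is kept forever.
var retention = map[string]time.Duration{
	"raw": 5 * day,
	"5m":  15 * day,
	"1h":  0,
}

// resolutionsAt reports which resolutions are still retained for
// data of the given age under those flags.
func resolutionsAt(age time.Duration) []string {
	var out []string
	for _, name := range []string{"raw", "5m", "1h"} {
		if r := retention[name]; r == 0 || age < r {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	fmt.Println(resolutionsAt(2 * day))  // [raw 5m 1h]
	fmt.Println(resolutionsAt(30 * day)) // [1h]: only downsampled data is left
}
```

So any query landing more than 15 days back can only be served from 1h blocks, which is exactly the regime where a too-strict max source resolution returns nothing.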
This is my thanos-store configuration:
/data/app/thanos/bin/thanos store --data-dir=/data/thanos \
--log.format=json \
--store.enable-index-header-lazy-reader \
--store.grpc.series-max-concurrency=200 \
--consistency-delay=2h \
--block-meta-fetch-concurrency=150 \
--block-sync-concurrency=60 \
--sync-block-duration=1m \
--ignore-deletion-marks-delay=2h \
--objstore.config-file=/data/app/thanos/conf.d/object.json \
--min-time=-96h \
--index-cache-size=7024MB \
--index-cache.config-file=/data/app/thanos/conf.d/index.json \
--tracing.config-file=/data/app/thanos/conf.d/trace.jaeger.yml \
--store.caching-bucket.config-file=/data/app/thanos/conf.d/cache.json
/data/app/thanos/bin/thanos --version
thanos, version 0.23.0-rc.0 (branch: HEAD, revision: 81841aed4a9e3d6f6ed772fea287f04504d164f3)
build user: root@d1bae1e2c93c
build date: 20210908-14:46:03
go version: go1.16.7
platform: linux/amd64
This could be nice to have. Currently, Thanos Query is unaware of the available resolutions. Data about available resolutions could be included in the metadata response.
Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use `remind` command if you wish to be reminded at some point in future.
I ran into the same issue today. To make matters worse, for long-range queries (>10d) auto-downsampling will actually scan and load up to 3 times more blocks since we retrieve all resolutions up to the lowest one.
I wonder if we can use the meta.json files to know which resolutions of which blocks exist anywhere in the storage layer. This way we can accurately tell whether we need to load a block for a higher resolution, or load the one for the requested resolution.
I think that data is available when looking at Bucket Web, so it should be possible to utilize it in auto-downsampling.
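Each block's meta.json already records its downsampling resolution under `thanos.downsample.resolution` (in milliseconds; 0 means raw), so the set of available resolutions could in principle be aggregated from metadata the querier already syncs. A minimal sketch in Go; the struct here mirrors only the relevant subset of meta.json and is illustrative, not Thanos' actual types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// blockMeta mirrors the subset of a Thanos meta.json that records the
// block's downsampling resolution (milliseconds; 0 = raw).
type blockMeta struct {
	ULID   string `json:"ulid"`
	Thanos struct {
		Downsample struct {
			Resolution int64 `json:"resolution"`
		} `json:"downsample"`
	} `json:"thanos"`
}

// availableResolutions collects the distinct resolutions present
// across a set of meta.json documents.
func availableResolutions(metas [][]byte) (map[int64]bool, error) {
	res := map[int64]bool{}
	for _, raw := range metas {
		var m blockMeta
		if err := json.Unmarshal(raw, &m); err != nil {
			return nil, err
		}
		res[m.Thanos.Downsample.Resolution] = true
	}
	return res, nil
}

func main() {
	metas := [][]byte{
		[]byte(`{"ulid":"01AAA","thanos":{"downsample":{"resolution":0}}}`),
		[]byte(`{"ulid":"01BBB","thanos":{"downsample":{"resolution":3600000}}}`),
	}
	res, err := availableResolutions(metas)
	if err != nil {
		panic(err)
	}
	fmt.Println(res[0], res[300000], res[3600000]) // true false true
}
```

With such a set in hand, the querier could check whether any block actually satisfies the computed max source resolution before ruling out coarser ones.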
Any chance this feature could land in an upcoming release? It would be great to have that additional context from a troubleshooting perspective, as you could easily identify whether you're using the raw, 5m, or 1h downsampled series.
**Is your proposal related to a problem?**
Similar problem was already discussed in #1170, #3704 and possibly other issues.
My specific use case is backfilling. Specifically, https://github.com/kubernetes-monitoring/kubernetes-mixin/commit/e996e00fa3a0c17a7a9f5d01f6c1a3544731bd33 recently changed how CPU usage is tracked. This means as soon as you apply this update, Grafana graphs for CPU will lose historical data.
The issue is most definitely related to the "auto downsampling" feature of Thanos. In Compact we set:
So beyond 30 days we only keep 1h resolution. In Query we have `--query.auto-downsampling` enabled. Obviously, the queries in kubernetes-mixin use `irate` with `[5m]`, which isn't going to work at 1h resolution. So for this process I can modify it to use e.g. `[5h]` instead. This should give me reasonably good historical data. I compared it with the old recording rule `node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate` over a long range and the graphs are nearly identical. So far so good.
The problem appears when querying over a smaller range, e.g. `4w`: the query doesn't return any data. The cause is that we only have 1h resolution but auto-downsampling tries to use 5m. I get the correct results if I manually select the max resolution:
And it works over a longer range:
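The manual override above corresponds to the `max_source_resolution` parameter of the Thanos Query HTTP API, which the UI's max-resolution selector sets under the hood. A small Go sketch of constructing such a request; the endpoint, time range, and query are illustrative:

```go
package main

import (
	"fmt"
	"net/url"
)

// buildQueryRangeURL constructs a Thanos query_range request that pins
// max_source_resolution, forcing the querier to accept blocks up to
// the given resolution (e.g. "1h") instead of the auto-computed one.
func buildQueryRangeURL(base, query, start, end, step, maxRes string) string {
	v := url.Values{}
	v.Set("query", query)
	v.Set("start", start)
	v.Set("end", end)
	v.Set("step", step)
	v.Set("max_source_resolution", maxRes)
	return base + "/api/v1/query_range?" + v.Encode()
}

func main() {
	// Hypothetical example query against a local Thanos Query endpoint.
	u := buildQueryRangeURL("http://thanos-query",
		`sum(rate(container_cpu_usage_seconds_total[5h]))`,
		"2021-04-03T00:00:00Z", "2021-05-31T23:59:59Z", "1h", "1h")
	fmt.Println(u)
}
```

This works for ad-hoc queries, but as noted below there is no equivalent knob when promtool drives the queries.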
Unfortunately, with promtool, I have absolutely no way of controlling the resolution:
Notice that the range I'm requesting is just over 8 weeks (I couldn't go with a longer range without promtool being OOM-killed), which should be long enough to force 1h resolution. Yet I get no blocks from running this command, which I think is because that's not how promtool evaluates the rule.
Backfilling is just one scenario where this is a problem.
**Describe the solution you'd like**
Auto-downsampling should take the available resolutions into account rather than relying on a naive `step / 5` heuristic.
**Describe alternatives you've considered**
Since Thanos already has Ruler, would it be possible to port the `create-blocks-from rules` tool to Thanos and add the ability to specify the resolution there?

**Additional context**
Am I missing something?