Open kennylevinsen opened 3 months ago
Relates to:
On the Thanos Receive side, the solution has been to disable all local compaction and in turn require Thanos Compact to have vertical compaction enabled. On the Thanos Sidecar side, there are suggestions to allow upload of compacted blocks and to enable vertical compaction to resolve the duplication.
None of this is documented, though, and vertical compaction on the Thanos compactor side is still experimental, with open bugs related to rate/irate and a big warning in the manual.
One hacky solution for the sidecar would be to enable compacted block upload, but add an option to delay block upload. By delaying past the out-of-order window and the block max time, the sidecar would only upload blocks that Prometheus has already compacted locally, so overlapping blocks would never reach object storage.
The delay would have to be larger than the block max time plus the out-of-order time window, and smaller than the Prometheus retention time, since the store would then be further behind. If Prometheus crashes, there could still be an old overlap with something in the WAL/WBL; skipping upload of very old blocks, together with the thanos compact skip-overlap option, might be enough to handle that.
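To make the sizing concrete, here is a rough back-of-the-envelope sketch. The 1h out-of-order window is the one from the setup described at the bottom of this issue; the 2h block range and 15d retention are assumed defaults, and the delay itself is only a proposed option, not an existing sidecar flag.

```sh
# Back-of-the-envelope sizing for the proposed upload delay (illustrative only;
# no such sidecar flag exists today, it is the option proposed above).
BLOCK_RANGE_H=2            # assumed default TSDB block range (2h)
OOO_WINDOW_H=1             # out_of_order_time_window from this issue (1h)
RETENTION_H=$((15 * 24))   # assumed Prometheus retention of 15d, in hours

MIN_DELAY_H=$((BLOCK_RANGE_H + OOO_WINDOW_H))
echo "upload delay must be > ${MIN_DELAY_H}h and comfortably < ${RETENTION_H}h"
```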
Another (stupid) solution would be to have a way to make Prometheus always compact, and then only upload compacted blocks.
Of course, the best solution would just be to have thanos compact do the vertical compaction without rate issues...
> vertical compaction on the Thanos compactor side is still experimental, with open bugs related to rate/irate and a big warning in the manual
Can you please remind me what the open issue about rate/irate is? I am aware of such issues with downsampling, but nothing about vertical compaction, as Thanos vertical compaction is the same as what Prometheus does.
We should mark vertical compaction as non-experimental, I think. Thanks for this; we should update the docs.
I think we can just enable the same configuration in Prometheus, so that Prometheus disables overlap compaction locally and the compactor handles it: https://github.com/prometheus/prometheus/issues/13112
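For illustration, the suggested combination would look roughly like the sketch below. The Prometheus flag is the one that appears later in this thread, the Thanos flag is the hidden vertical-compaction switch from the compactor docs, and the config and objstore file paths are placeholders.

```sh
# Prometheus side (the flag from prometheus#13112): keep writing OOO blocks,
# but stop compacting them into overlapping blocks locally.
prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --no-storage.tsdb.allow-overlapping-compaction

# Compactor side: let Thanos Compact merge the overlapping OOO and regular
# blocks instead, via (hidden) vertical compaction.
thanos compact \
  --objstore.config-file=/etc/thanos/objstore.yml \
  --wait \
  --compact.enable-vertical-compaction
```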
> Can you please remind me what the open issue about rate/irate is? I am aware of such issues with downsampling, but nothing about vertical compaction, as Thanos vertical compaction is the same as what Prometheus does.
From the "Vertical Compaction Risks" section of the Thanos docs: https://github.com/thanos-io/thanos/issues/2890. It's quite an old issue with the occasional ping, which makes the current situation unclear, but its being left in the documentation does make it seem like end users should still be cautious.
Making vertical compaction non-experimental and recommended, at least for this use case, in conjunction with a new Prometheus flag to mimic the new receiver behavior, sounds good to me.
> From the "Vertical Compaction Risks" section of the Thanos docs: https://github.com/thanos-io/thanos/issues/2890. It's quite an old issue with the occasional ping, which makes the current situation unclear, but its being left in the documentation does make it seem like end users should still be cautious.
Got it. If it is this rate bug, then it is specific to the penalty deduplication mode and it shouldn't happen with the default 1:1 deduplication in vertical compaction.
For OOO blocks handled by the compactor with vertical compaction, as long as penalty deduplication is not enabled (it shouldn't be, since penalty dedup is for HA setups), I don't see any risks.
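For reference, the two deduplication modes map to compactor invocations roughly as sketched below; the replica label name prometheus_replica and the objstore file path are assumptions, not taken from this issue.

```sh
# Default 1:1 deduplication: overlapping blocks are merged sample by sample.
# This is what the OOO-vs-regular overlap from a single Prometheus needs.
thanos compact \
  --objstore.config-file=/etc/thanos/objstore.yml \
  --compact.enable-vertical-compaction

# Penalty deduplication: meant for HA pairs of scrapers, and the mode the
# rate/irate caveat in the docs is about.
thanos compact \
  --objstore.config-file=/etc/thanos/objstore.yml \
  --compact.enable-vertical-compaction \
  --deduplication.func=penalty \
  --deduplication.replica-label=prometheus_replica
```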
> Got it. If it is this rate bug, then it is specific to the penalty deduplication mode and it shouldn't happen with the default 1:1 deduplication in vertical compaction.
>
> For OOO blocks handled by the compactor with vertical compaction, as long as penalty deduplication is not enabled (it shouldn't be, since penalty dedup is for HA setups), I don't see any risks.
Describing the example situation:
An HA pair of Prometheus instances sends data through thanos-sidecar, and the compactor operates in penalty dedup mode. Later on, the system expands and remote_write + OOO functionality is added to Prometheus. (According to the documentation, different modes are used for the two scenarios: one-to-one for receivers, penalty for an HA pair.)
With HA pair + remote_write + OOO and penalty dedup, the counter metrics that were written via remote_write to the HA-pair Prometheus come out reduced after deduplication. In one-to-one deduplication mode, everything falls apart.
1. Are there any discussions about this?
2. Why can't an HA pair with penalty dedup mode and OOO work together?
Hey @zoglam, sorry for the late reply. I think I got your point now... The merge between regular and OOO blocks should use one-to-one deduplication, and the deduplication between HA pairs should use penalty mode. So in this case, it would require one-to-one deduplication first and then penalty. I think this is not something we support at the moment.
I think deploying two sets of compactors using different strategies could help, but I don't think we have a way to only compact blocks matching the target strategy. I will think about it more and see if there is a better way to do it.
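To illustrate the two-compactor idea: if the two block streams could be told apart by their external labels, each compactor could be scoped with --selector.relabel-config-file (ordinary relabel rules matched against the blocks' external labels), roughly as below; the file paths, label split, and replica label name are assumptions. As noted above, though, OOO and regular blocks from the same Prometheus carry the same external labels, so this does not by itself solve the mixed case.

```sh
# Compactor A: blocks from the HA scraper pair, penalty deduplication.
thanos compact \
  --objstore.config-file=/etc/thanos/objstore.yml \
  --selector.relabel-config-file=/etc/thanos/select-ha-pair.yml \
  --compact.enable-vertical-compaction \
  --deduplication.func=penalty \
  --deduplication.replica-label=prometheus_replica

# Compactor B: blocks from the remote_write/OOO stream, default 1:1 dedup.
thanos compact \
  --objstore.config-file=/etc/thanos/objstore.yml \
  --selector.relabel-config-file=/etc/thanos/select-remote-write.yml \
  --compact.enable-vertical-compaction
```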
Created issue https://github.com/prometheus-operator/prometheus-operator/issues/6829 on the operator to track the idea described in https://github.com/thanos-io/thanos/issues/7551#issuecomment-2258784742. The flag has already been added to Prometheus.
I have Prometheus v2.55.0 deployed using otlp-write-receiver, OOO, and the new hidden flag no-storage.tsdb.allow-overlapping-compaction.
Prometheus Operator Config:
prometheus:
  prometheusSpec:
    image:
      tag: v2.55.0
    enableFeatures:
      - 'otlp-write-receiver'
    tsdb:
      outOfOrderTimeWindow: "30m"
    additionalArgs:
      - name: "no-storage.tsdb.allow-overlapping-compaction"
Using Thanos Sidecar v0.36.1 with GCS.
It was my understanding that this would solve my compactor halting; however, it continues to halt due to OOO blocks.
Compactor Error
pre compaction overlap check: overlaps found while gathering blocks
I do not have compactor vertical compaction on; is this required when using no-storage.tsdb.allow-overlapping-compaction? The Prometheus flag hasn't solved any issues.
Hey @initharrington, the new flag should turn off the compaction at Prometheus that merges OOO blocks with regular blocks. But the OOO blocks are still generated and uploaded to GCS.
> compactor vertical compaction on
Yes, I think you need to turn it on so that the compactor compacts the overlapping blocks. Otherwise you will see the overlap-check failure error.
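Concretely, for the setup above that would mean adding the hidden flag to the compactor, roughly like this (the GCS objstore file path is a placeholder; no replica labels are needed for the single-Prometheus OOO case, so the default 1:1 deduplication applies):

```sh
thanos compact \
  --objstore.config-file=/etc/thanos/gcs-objstore.yml \
  --wait \
  --compact.enable-vertical-compaction
```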
Thanos, Prometheus and Golang version used:
Object Storage Provider:
S3 (minio)
What happened:
Prometheus is configured with a 1h out_of_order_time_window to allow some slack from a very large number of Grafana agents. Every once in a while, after writing a new normal block, Prometheus will write one or more out-of-order blocks. These blocks sometimes cover the current metric window (e.g., at 01:00 an out-of-order block is written that covers 00:00 to 02:00). The sidecar uploads this block immediately.
When Prometheus finishes up the current window (e.g., at 03:00 a normal block is written that covers 00:00 to 02:00), the sidecar uploads the new block. At this point, thanos compact discovers an overlap and gets upset.
What you expected to happen:
Either:
I don't think Prometheus is doing anything wrong here in how it creates the blocks, but please do correct me if I'm wrong.
How to reproduce it (as minimally and precisely as possible):
I have not tried to reproduce externally, but I believe this would be the way:
Based on the logs, it seems that the sidecar doesn't always react to the OOO blocks.
Full logs to relevant components:
Other things
Relevant Prometheus config:
Prometheus command line:
Sidecar command line:
Compact command line: