Qookyozy opened this issue 9 months ago (status: Open)
I wonder if it's related to what was discussed here before: https://cloud-native.slack.com/archives/CL25937SP/p1697127041904839.
Can you describe your cluster a bit more, please? That would help in figuring out what's going on!
@MichaHoffmann Here is our cluster architecture and the reasons for setting it up this way. Thanks for your help!
- Alarm datasource: queries both clusters, so alerts stay available even if a single cluster fails.
- Dashboard datasource: queries only cluster B, so large dashboard queries cannot OOM the receivers in both clusters and impact alerting.
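A minimal sketch of the two query layers, assuming hypothetical service names and the default gRPC port; the real deployment flags may differ:

```shell
# Alarm query layer: fans out to the receivers of both clusters and
# deduplicates on the "receive-cluster" replica label, so identical
# series from A and B collapse into one.
thanos query \
  --endpoint=receive-a.cluster-a.svc:10901 \
  --endpoint=receive-b.cluster-b.svc:10901 \
  --query.replica-label=receive-cluster

# Dashboard query layer: talks to cluster B only, so heavy dashboard
# queries cannot OOM the receivers of both clusters at once.
thanos query \
  --endpoint=receive-b.cluster-b.svc:10901
```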
Compactor compacts together blocks uploaded from both clusters again, right? It's configured with the replica label "receive-cluster", right?
@MichaHoffmann

> Compactor compacts together blocks uploaded from both clusters again, right?

No.

> It's configured with the replica label "receive-cluster", right?

Yes.
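So since the compactor does not deduplicate, the merge happens purely at query time via the replica label. For contrast, a sketch of what compactor-side (offline) deduplication would look like; the data dir and bucket config file names are placeholders:

```shell
# Offline deduplication (NOT used in this setup): the compactor would
# merge the blocks uploaded from both clusters, treating
# "receive-cluster" as a replica label, instead of leaving dedup to
# the querier.
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=bucket.yaml \
  --deduplication.replica-label=receive-cluster
```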
A more complete architectural diagram is shown below.
Component startup parameters:
- Compactor B
- Receive B
- Receive A
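As a purely hypothetical illustration of the part relevant to deduplication, each receiver would stamp its cluster name into the replica label via an external label, e.g.:

```shell
# Hypothetical: Receive A tags everything it uploads with the external
# label receive-cluster="A"; Receive B would use "B" instead.
thanos receive \
  --tsdb.path=/var/thanos/receive \
  --objstore.config-file=bucket.yaml \
  --label='receive-cluster="A"'
```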
Hello @MichaHoffmann, may I inquire if there have been any recent developments on this issue?
Thanos, Prometheus and Golang version used:
- thanos receive: v0.32.5
- thanos query: v0.33.0
- thanos query-frontend: v0.33.0
What happened: increase/rate returned abnormally large results on the aggregated datasource.
What you expected to happen: increase/rate returns correct results on the aggregated datasource.
How to reproduce it (as minimally and precisely as possible): Aggregate two clusters holding identical data behind a single Thanos Query, then evaluate increase/rate; abnormally large results appear at the points where a metric's series switches from one chunk to the next.
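A minimal sketch of how one might observe this against the aggregated datasource; host, port, time range, and metric name are placeholders:

```shell
# Range query against the deduplicating Thanos Query layer. The
# abnormal spike shows up at timestamps where the series crosses a
# chunk boundary; set dedup=false to compare against the raw
# per-cluster series.
curl -s 'http://thanos-query:10902/api/v1/query_range' \
  --data-urlencode 'query=increase(http_requests_total[5m])' \
  --data-urlencode 'start=2024-01-01T00:00:00Z' \
  --data-urlencode 'end=2024-01-01T01:00:00Z' \
  --data-urlencode 'step=30s' \
  --data-urlencode 'dedup=true'
```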
Full logs to relevant components:
Anything else we need to know: We have conducted the following validations: