Open nicolastakashi opened 2 years ago
The problem is a bit bigger - there are object storage operation "failures" coming from this, leading to false alerts. Need to think about how to solve this.
It seems groupcache also increases the number of get operations against the object storage, and almost all of them are failures (I believe all of them are for the deletion mark).
When I enabled groupcache, this alert fired almost instantly: (sum by (job) (rate(thanos_objstore_bucket_operation_failures_total{job=~".*thanos-store.*"}[5m])) / sum by (job) (rate(thanos_objstore_bucket_operations_total{job=~".*thanos-store.*"}[5m])) * 100 > 5)
The value quickly reached almost 100%, whereas without groupcache there were no errors. The Thanos version is 0.31.0.
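For anyone reading the alert expression, here is a rough sketch of the arithmetic it performs: the per-second failure rate divided by the per-second operation rate, times 100, compared against a threshold of 5. The function names and the sample rates are hypothetical and only illustrate the calculation; they are not taken from Thanos or from my metrics.

```python
import math


def failure_percentage(failure_rate: float, operation_rate: float) -> float:
    """Share of bucket operations that failed, in percent.

    Both arguments stand in for the two rate(...[5m]) terms of the alert.
    """
    if operation_rate == 0:
        # PromQL yields NaN for 0/0; mirror that instead of raising.
        return math.nan
    return failure_rate / operation_rate * 100


def alert_fires(failure_rate: float, operation_rate: float,
                threshold: float = 5.0) -> bool:
    """True when the failure percentage exceeds the threshold (5 in the alert)."""
    # NaN > threshold evaluates to False, so an idle store does not alert.
    return failure_percentage(failure_rate, operation_rate) > threshold


# Hypothetical rates: a mostly healthy store, then one where
# nearly every get operation is counted as a failure.
print(alert_fires(failure_rate=0.1, operation_rate=10.0))
print(alert_fires(failure_rate=9.8, operation_rate=10.0))
```

This is why a burst of failed lookups trips the alert so quickly: the extra get operations raise both the numerator and the denominator, so the ratio moves toward 100% even if no other operation is failing.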
Thanos, Prometheus and Golang version used:
Object Storage Provider: Azure
What happened: When I use the Thanos Store Group Cache feature, I see a bunch of errors in the logs.
What you expected to happen: These errors should not appear in the logs.
How to reproduce it (as minimally and precisely as possible): Just enable the Group Cache feature.
Full logs to relevant components:
Anything else we need to know: N/A