grafana / mimir

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
https://grafana.com/oss/mimir/
GNU Affero General Public License v3.0

Grace period for partial block detection #1677

Open aknuds1 opened 2 years ago

aknuds1 commented 2 years ago

Describe the bug

The cleaner component of the compactor should apply a grace period before it reports a block as partial. It's currently subject to a race condition: it may detect a block as partial while the block is actually still being uploaded.

To Reproduce

Expected behavior

The cleaner should not report a block as partial while it is still being uploaded.

Environment

Additional Context

pracucci commented 2 years ago

The annoying thing is that it's not trivial to detect the "upload time" of a partial block. We usually look at its meta.json file, but a partial block doesn't have one by definition (a block is partial if it doesn't have meta.json).

I'm wondering if we could leverage the bucket index. We could add partial blocks to the bucket index (right now they're skipped) as a separate entry in the index, and store the timestamp of when we found them. Then, based on the difference <now> - <time when partial block has been found>, we can apply a grace period before warning about them.

Better ideas?