grafana / mimir

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
https://grafana.com/oss/mimir/
GNU Affero General Public License v3.0

kafka replay speed: fix concurrent fetching concurrency transition #9447

Closed: dimitarvdimitrov closed this 1 day ago

dimitarvdimitrov commented 1 day ago

What this PR does

Fixes:

  1. There's a race between Add()ing to the WaitGroup and Wait()ing on it: it's possible that we close(r.done) and call Wait() before Add() has run, in which case Wait() returns immediately instead of waiting for the fetching goroutines.
  2. If we call Update() immediately after newConcurrentFetchers(), lastReturnedRecord may still be -1, so we'd resume from offset -1 🥴 instead of from the start or end of the partition.
  3. After Update(), we should continue from lastReturnedRecord+1, because lastReturnedRecord has already been fetched and returned. This one isn't critical, though.
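Fix 1 is an instance of a general `sync.WaitGroup` rule: an `Add` with a positive delta that happens while the counter is zero must happen before the corresponding `Wait`. The sketch below shows one common way to satisfy it, calling `Add(n)` in the launching goroutine before any worker starts. It is not necessarily the exact change made in this PR. The `concurrentFetchers` type, its `start`/`stop` methods and the `stopped` counter are hypothetical stand-ins that model only the goroutine lifecycle, not the real fetcher.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// concurrentFetchers is a hypothetical, stripped-down stand-in for the
// fetcher touched by the PR; only the goroutine lifecycle is modelled.
type concurrentFetchers struct {
	wg      sync.WaitGroup
	done    chan struct{}
	stopped atomic.Int32 // counts workers that observed done and exited
}

func newConcurrentFetchers() *concurrentFetchers {
	return &concurrentFetchers{done: make(chan struct{})}
}

// start launches n workers. wg.Add(n) runs in the caller's goroutine,
// before any worker is started, so a later Wait() always accounts for them.
// Calling wg.Add(1) inside each worker instead would race with Wait().
func (r *concurrentFetchers) start(n int) {
	r.wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer r.wg.Done()
			<-r.done // stand-in for the fetch loop: run until told to stop
			r.stopped.Add(1)
		}()
	}
}

// stop signals all workers and blocks until every one of them has returned.
func (r *concurrentFetchers) stop() {
	close(r.done)
	r.wg.Wait()
}

func main() {
	r := newConcurrentFetchers()
	r.start(4)
	r.stop()
	fmt.Println(r.stopped.Load()) // 4
}
```

Because every worker increments `stopped` before its deferred `Done()` runs, `stop()` only returns once all four workers have exited, so the program always prints 4.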
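Fixes 2 and 3 both concern where fetching resumes after a concurrency change. The following is a minimal sketch of that decision under stated assumptions: `resumeOffset` and `noRecordReturned` are hypothetical names, and `initialOffset` is assumed to be the already-resolved offset the fetchers were originally created with (for example, the start or end of the partition). It does not claim to be Mimir's actual code.

```go
package main

import "fmt"

// noRecordReturned is a hypothetical sentinel mirroring the PR description:
// lastReturnedRecord stays -1 until the first record has been returned.
const noRecordReturned int64 = -1

// resumeOffset returns the offset to fetch from after Update().
// initialOffset is an assumption: the resolved offset the fetchers were
// originally created with (e.g. the start or end of the partition).
func resumeOffset(lastReturnedRecord, initialOffset int64) int64 {
	if lastReturnedRecord == noRecordReturned {
		// Nothing has been returned yet: restart where we originally
		// started instead of fetching from offset -1.
		return initialOffset
	}
	// lastReturnedRecord was already fetched and returned, so skip it.
	return lastReturnedRecord + 1
}

func main() {
	fmt.Println(resumeOffset(noRecordReturned, 100)) // 100
	fmt.Println(resumeOffset(149, 100))              // 150
}
```

Treating -1 as "no progress yet" keeps the fallback explicit, rather than letting the sentinel leak into the next fetch request.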

This PR is still missing tests (WIP).

Which issue(s) this PR fixes or relates to

Fixes #

Checklist