scylladb / scylla-manager

The Scylla Manager
https://manager.docs.scylladb.com/stable/

Stop tablet load balancing during repair #3773

Closed Michal-Leszczynski closed 5 months ago

Michal-Leszczynski commented 6 months ago

Repair should run only while tablet load balancing is disabled. Otherwise, some tablets (mapped to ranges) could escape repair by migrating from a not-yet-repaired range to an already repaired one.

But what should happen when repair is paused? If SM re-enables tablet load balancing, it would need to repair the interrupted table from scratch (note that SM repairs table by table). If SM leaves tablet load balancing disabled, that might cause problems when the paused repair task is never resumed.

I think that re-enabling it and restarting the repair of the interrupted table from scratch is safer, but it would be nice to hear other opinions on that.
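To make the proposed pause behavior concrete, here is a rough sketch in Go. It is only an illustration under stated assumptions, not SM's actual code: `BalancerControl`, `RepairRun`, `TableProgress`, and `fakeBalancer` are hypothetical names, and the real mechanism for toggling tablet load balancing is deliberately hidden behind an interface.

```go
package main

import "fmt"

// BalancerControl is a hypothetical abstraction over whatever mechanism
// toggles tablet load balancing on the cluster. It is not a real SM API.
type BalancerControl interface {
	SetEnabled(enabled bool) error
}

// fakeBalancer is an in-memory stand-in used only to exercise the sketch.
type fakeBalancer struct{ enabled bool }

func (f *fakeBalancer) SetEnabled(enabled bool) error {
	f.enabled = enabled
	return nil
}

// TableProgress tracks how many ranges of one table are already repaired.
type TableProgress struct {
	Table    string
	Ranges   int // total ranges to repair
	Repaired int // ranges repaired so far
}

// RepairRun is a simplified, hypothetical model of a table-by-table repair.
type RepairRun struct {
	balancer BalancerControl
	tables   []*TableProgress
	current  int // index of the table currently being repaired
}

func NewRepairRun(b BalancerControl, tables []*TableProgress) *RepairRun {
	return &RepairRun{balancer: b, tables: tables}
}

// Start (or resume) disables tablet load balancing before any range is repaired.
func (r *RepairRun) Start() error {
	if err := r.balancer.SetEnabled(false); err != nil {
		return fmt.Errorf("disable tablet load balancing: %w", err)
	}
	return nil
}

// RepairNextRange marks one more range as repaired, moving on to the next
// table when the current one is done. It returns false when nothing is left.
func (r *RepairRun) RepairNextRange() bool {
	for r.current < len(r.tables) {
		t := r.tables[r.current]
		if t.Repaired < t.Ranges {
			t.Repaired++
			return true
		}
		r.current++
	}
	return false
}

// Pause discards progress of the interrupted (partially repaired) table and
// then re-enables balancing. Fully repaired tables keep their progress.
func (r *RepairRun) Pause() error {
	if r.current < len(r.tables) {
		if t := r.tables[r.current]; t.Repaired < t.Ranges {
			t.Repaired = 0 // tablets may migrate while paused, so start over
		}
	}
	if err := r.balancer.SetEnabled(true); err != nil {
		return fmt.Errorf("re-enable tablet load balancing: %w", err)
	}
	return nil
}
```

In this sketch the progress reset happens before balancing is re-enabled, so a failure to re-enable can never leave stale progress for a table whose tablets might already have moved. Resuming is just another call to `Start`, after which the interrupted table is repaired again from its first range.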

Ref https://github.com/scylladb/scylladb/issues/17435

cc: @karol-kokoszka @tzach @bhalevy