Open LeeSmet opened 3 years ago
IMO this is fairly essential. Under the current scheme, backend data usage is multiplied by the number of rebuild operations that have been carried out, plus one for the initial write. For example, take an initial backend configuration with some data stored: replacing a single backend and rebuilding doubles the data usage on every backend that was not replaced, since a duplicate of all the data is written to them again.
The current rebuild logic is fairly simple: retrieve the data, re-encode it, and send the shards to the new backends. However, we can check whether any of the new backends also appears in the old metadata. If it does, we can assign it the same shard it already holds, which eliminates the write to that backend and saves some space.
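A rough sketch of what that assignment step could look like, in Python for brevity. Everything here is hypothetical and not taken from the codebase: the function name `plan_rebuild`, the representation of the old metadata as a plain `backend id -> shard index` mapping, and the assumption that the shard count stays the same across the rebuild. It also only makes sense if re-encoding produces identical shards.

```python
from typing import Dict, List, Tuple


def plan_rebuild(
    old_assignment: Dict[str, int],
    new_backends: List[str],
) -> Tuple[Dict[str, int], List[str]]:
    """Decide which shard each new backend gets and which backends need a write.

    old_assignment: hypothetical view of the old metadata (backend id -> shard index).
    new_backends:   backend ids of the new configuration, one per shard.
    Returns (assignment, to_write); backends not in to_write keep their shard.
    """
    shard_count = len(new_backends)
    if len(set(new_backends)) != shard_count:
        raise ValueError("new backends must be unique")

    assignment: Dict[str, int] = {}
    used = set()

    # First pass: a backend that already holds a valid shard keeps that index.
    for backend in new_backends:
        idx = old_assignment.get(backend)
        if idx is not None and 0 <= idx < shard_count and idx not in used:
            assignment[backend] = idx
            used.add(idx)

    # Second pass: hand the remaining shard indices to the remaining backends.
    free = [i for i in range(shard_count) if i not in used]
    to_write: List[str] = []
    for backend in new_backends:
        if backend not in assignment:
            assignment[backend] = free.pop(0)
            to_write.append(backend)

    return assignment, to_write


old = {"a": 0, "b": 1, "c": 2, "d": 3}
assignment, to_write = plan_rebuild(old, ["a", "b", "e", "d"])
print(assignment, to_write)
```

In this example backend `c` was replaced by `e`, so only `e` needs a write (it takes over shard 2), while `a`, `b`, and `d` are left untouched.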
Need to check whether the encoding is deterministic for this, especially for parity shards.
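One way to check this is to encode the same input several times and compare the resulting shards byte for byte. The sketch below is only an illustration in Python: `is_deterministic` is a hypothetical helper, and `xor_parity_encode` is a toy single-parity encoder standing in for the real one, so a passing result here says nothing about the actual encoder until it is plugged in.

```python
from typing import Callable, List


def xor_parity_encode(data: bytes, data_shards: int) -> List[bytes]:
    """Toy stand-in encoder: split into data_shards pieces plus one XOR parity shard."""
    shard_len = -(-len(data) // data_shards)  # ceiling division
    padded = data.ljust(shard_len * data_shards, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(data_shards)]
    parity = bytearray(shard_len)
    for shard in shards:
        for i, byte in enumerate(shard):
            parity[i] ^= byte
    return shards + [bytes(parity)]


def is_deterministic(
    encode: Callable[[bytes], List[bytes]],
    data: bytes,
    runs: int = 3,
) -> bool:
    """Return True if every run of encode(data) yields identical shards."""
    first = encode(data)
    return all(encode(data) == first for _ in range(runs - 1))


print(is_deterministic(lambda d: xor_parity_encode(d, 3), b"some stored object"))
```

If the real encoder passes this kind of check for both data and parity shards, a backend that keeps its shard index can safely keep its existing shard.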