Open Michal-Leszczynski opened 6 days ago
Unfortunately, I don't have a clear idea of how to safely use AZ information in SM restore. @avikivity could you explain the idea behind it?
cc: @karol-kokoszka @mykaul @tzach
If datacenter.RF == count(datacenter.racks), then each rack gets one replica. Typical example is RF=3 and nr_racks=3.
If this holds, you can take one rack's backup and copy it to just one rack of the restored cluster, with nodetool refresh --load-and-stream --keep-rack (the --keep-rack option doesn't exist yet). This reduces the number of receivers from 3 to 1, and significantly reduces the compaction load.
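A minimal sketch of the precondition above, assuming the per-datacenter RF and rack lists have already been fetched from the cluster. The function names and input shapes are hypothetical and not part of SM or Scylla; the sketch only checks whether datacenter.RF == count(datacenter.racks), so that a rack-to-rack restore could be considered for that datacenter.

```python
from typing import Dict, List


def rack_restore_eligible(rf: int, racks: List[str]) -> bool:
    """Return True when RF equals the number of distinct racks in a datacenter.

    Per the discussion above, in that case each rack holds one replica,
    so one rack's backup could be restored into one rack.
    """
    return rf > 0 and rf == len(set(racks))


def eligible_datacenters(
    rf_by_dc: Dict[str, int],            # hypothetical input: DC name -> RF
    racks_by_dc: Dict[str, List[str]],   # hypothetical input: DC name -> rack of each node
) -> Dict[str, bool]:
    """Evaluate the RF == rack-count condition for every datacenter."""
    return {
        dc: rack_restore_eligible(rf, racks_by_dc.get(dc, []))
        for dc, rf in rf_by_dc.items()
    }


if __name__ == "__main__":
    rf = {"dc1": 3, "dc2": 3}
    racks = {
        "dc1": ["a", "b", "c"],        # 3 racks, RF=3: condition holds
        "dc2": ["a", "a", "b", "b"],   # 2 racks, RF=3: condition does not hold
    }
    print(eligible_datacenters(rf, racks))
```

Datacenters where the check fails would have to fall back to the general restore path, which matches the point that this is an optimization rather than the general case.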
The reduction in receivers is already achieved with --primary-replica-only, but I guess that streaming within the same rack should be faster.
Perhaps this would also speed up the post-restore repair, as (depending on data consistency during backup) less data would need to be transferred between the nodes during the repair.
Cc @regevran
This should be a scylladb issue, but as an optimization, not for the general case.
During restore improvement meetings, it was mentioned that making SM AZ aware could speed up the restore process. We should experiment with that and see the results.