scylladb / scylla-manager

The Scylla Manager
https://manager.docs.scylladb.com/stable/

Make restore AZ aware #4039

Open Michal-Leszczynski opened 6 days ago

Michal-Leszczynski commented 6 days ago

During restore improvement meetings, it was mentioned that making SM AZ aware could speed up the restore process. We should experiment with that and see the results.

Michal-Leszczynski commented 6 days ago

Unfortunately, I don't have a clear idea on how to safely use AZ information in SM restore. @avikivity could you explain the idea behind it?

Michal-Leszczynski commented 6 days ago

cc: @karol-kokoszka @mykaul @tzach

avikivity commented 6 days ago

If datacenter.RF == count(datacenter.racks), then each rack gets one replica. Typical example is RF=3 and nr_racks=3.
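As a rough illustration of that condition, here is a minimal Python sketch. The function names and the input shapes (a per-datacenter RF map and a per-datacenter set of rack names) are assumptions made for this example; they are not part of Scylla Manager or ScyllaDB.

```python
from typing import Dict, Set


def dc_is_rack_aligned(rf: int, racks: Set[str]) -> bool:
    """Return True when the DC's replication factor equals its rack count.

    Under that condition each rack is expected to hold exactly one replica.
    """
    return rf > 0 and rf == len(racks)


def rack_aligned_dcs(
    rf_by_dc: Dict[str, int],
    racks_by_dc: Dict[str, Set[str]],
) -> Dict[str, bool]:
    """Evaluate the condition for every datacenter that has an RF entry."""
    return {
        dc: dc_is_rack_aligned(rf, racks_by_dc.get(dc, set()))
        for dc, rf in rf_by_dc.items()
    }


if __name__ == "__main__":
    # Hypothetical topology: dc1 has RF=3 over 3 racks, dc2 has RF=3 over 2 racks.
    rf = {"dc1": 3, "dc2": 3}
    racks = {"dc1": {"a", "b", "c"}, "dc2": {"a", "b"}}
    print(rack_aligned_dcs(rf, racks))
```

Only datacenters for which the check returns True would be candidates for the per-rack approach; the others would need the regular restore path.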

If this holds, you can take a rack's backup and copy it to just one restored cluster rack, with nodetool refresh --load-and-stream --keep-rack (the --keep-rack option doesn't exist yet). This reduces the number of receivers from 3 to 1, and significantly reduces the compaction load.
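One way to picture the rack-to-rack assignment is the sketch below. It is only an assumption about how a planner might pair racks: `plan_rack_restore` is a hypothetical helper, and pairing racks by sorted name is an arbitrary choice made for the example, not something the thread specifies.

```python
from typing import Dict, List


def plan_rack_restore(
    backup_racks: List[str],
    target_nodes_by_rack: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Pair each backed-up rack with exactly one rack of the restored cluster.

    Returns a map from backup rack name to the target nodes that would
    receive that rack's SSTables. Racks are paired by sorted name, which
    is an arbitrary (assumed) pairing rule.
    """
    if len(backup_racks) != len(target_nodes_by_rack):
        raise ValueError("rack counts differ; per-rack restore does not apply")

    plan: Dict[str, List[str]] = {}
    for src, dst in zip(sorted(backup_racks), sorted(target_nodes_by_rack)):
        # Copy the node list so callers cannot mutate the input by accident.
        plan[src] = list(target_nodes_by_rack[dst])
    return plan


if __name__ == "__main__":
    # Hypothetical rack and node names.
    plan = plan_rack_restore(
        ["rack-b", "rack-a"],
        {"r1": ["n1", "n2"], "r2": ["n3"]},
    )
    for src, nodes in plan.items():
        print(src, "->", nodes)
```

With such a plan, each backup rack's data would be streamed only among the nodes of its paired rack, which is where the reduction from 3 receivers to 1 would come from.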

Michal-Leszczynski commented 5 days ago

> This reduces the number of receivers from 3 to 1, and significantly reduces the compaction load.

The reduction in receivers is already achieved with --primary-replica-only, but I guess that streaming within the same rack should be faster.

Perhaps this would also speed up the post-restore repair, as (depending on data consistency during backup) less data would need to be transferred between the nodes during the repair.

bhalevy commented 5 days ago

Cc @regevran

regevran commented 18 hours ago

This should be a scylladb issue, but as an optimization, not for the general case.