scylladb / scylla-manager

The Scylla Manager
https://manager.docs.scylladb.com/stable/

Batch sstables basing on percentage of the node's data #3979

Open Michal-Leszczynski opened 1 month ago

Michal-Leszczynski commented 1 month ago

Assume the following scenario: a node with 8 shards and a batch of 8 sstables, of which 1 is really big and 7 are small. Since only a single shard works on load&stream of a given sstable, the 7 shards handling the small sstables would finish their work quickly and then wait for the single shard working on the big sstable. Creating batches of similarly sized sstables would improve load&stream performance.
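
The effect described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions that are not part of the issue: each shard handles exactly one sstable per batch, processing time is proportional to sstable size, and batches run one after another. The function name `total_time` and the example sizes are hypothetical.

```python
def total_time(sizes, batch_size):
    """Total time to process all sstables in consecutive batches.

    Assumption: one sstable per shard, so a batch is done only when
    its largest sstable is done (the other shards sit idle meanwhile).
    """
    batches = [sizes[i:i + batch_size] for i in range(0, len(sizes), batch_size)]
    return sum(max(batch) for batch in batches)


# 16 sstables for a node with 8 shards: 2 big ones (size 100), 14 small ones (size 1).
mixed = [100] + [1] * 7 + [100] + [1] * 7

# Batches in the given order: each batch contains one big sstable,
# so 7 shards wait for the 8th in both batches.
time_mixed = total_time(mixed, 8)

# Batches of similarly sized sstables: both big sstables share a batch.
time_grouped = total_time(sorted(mixed, reverse=True), 8)

print(time_mixed, time_grouped)  # 200 101
```

In this model, grouping by size roughly halves the total time, because the two big sstables are processed in parallel instead of each blocking its own batch.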

mykaul commented 2 weeks ago

Is this part of 3.3.2?

Michal-Leszczynski commented 2 weeks ago

No, 3.3.2 will mostly contain a fix for #3989 and perhaps #4007 and #3995. I would say that the restore-related improvements are big enough for 3.4, but if there is a need to keep it as a patch release, it could be 3.3.3.

karol-kokoszka commented 1 week ago

The change is going to affect the meaning of the batch-size flag when its value is set to 0. 0 is going to be a wildcard meaning that the batch size is expected to be 5% of all the data that would be sent to the node, based on the following calculation: (amount_of_restore_data * number_of_node_shards) / number_of_all_shards.
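
As a rough worked example of this calculation, reading the formula in the comment above as (amount_of_restore_data * number_of_node_shards) / number_of_all_shards: the sketch below is not the Scylla Manager implementation, and the function names, the integer rounding, and the sample numbers are assumptions made for illustration only.

```python
def expected_node_data(amount_of_restore_data, number_of_node_shards, number_of_all_shards):
    """Estimate how much restore data ends up on one node.

    Assumption: data is spread across the cluster in proportion to
    shard count, as the formula in the comment suggests.
    """
    return amount_of_restore_data * number_of_node_shards // number_of_all_shards


def wildcard_batch_size(amount_of_restore_data, number_of_node_shards,
                        number_of_all_shards, percent=5):
    """Target batch size when batch-size is 0: 5% of the node's expected data."""
    node_data = expected_node_data(
        amount_of_restore_data, number_of_node_shards, number_of_all_shards
    )
    return node_data * percent // 100


# Hypothetical cluster: 1200 GiB to restore, 24 shards in total,
# and the node in question has 8 of them.
node_share = expected_node_data(1200, 8, 24)
batch = wildcard_batch_size(1200, 8, 24)
print(node_share, batch)  # 400 20
```

With these numbers the node is expected to receive 400 GiB, so a batch would target about 20 GiB of sstables rather than a fixed sstable count.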