scylladb / scylla-migrator

Migrate data extract using Spark to Scylla, normally from Cassandra
Apache License 2.0

Support a way to configure partitioning by segment size #146

Open julienrf opened 1 month ago

julienrf commented 1 month ago

Currently, you can manually set scanSegments to control the number of partitions to split the data into.

We could also provide a segmentSize setting as an alternative to scanSegments for controlling the partitioning. The benefit is that the same configuration (e.g. segmentSize: 100) could be used with different tables: the migrator would automatically compute the number of segments needed to fit the desired size. With scanSegments, by contrast, you have to adapt the value to every table you migrate, because the appropriate number of segments depends on each table's size.
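To illustrate the idea, here is a minimal sketch of how a segment count could be derived from a target segment size. It is not the migrator's implementation: the helper name `segments_for`, the assumption that the table size is known in bytes, and the assumption that segmentSize is expressed in MB are all hypothetical choices made for this example.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: segmentSize is given in MB


def segments_for(table_size_bytes: int, segment_size_mb: int) -> int:
    """Hypothetical helper: smallest segment count such that each
    segment holds at most `segment_size_mb` of data on average."""
    if segment_size_mb <= 0:
        raise ValueError("segment_size_mb must be positive")
    if table_size_bytes < 0:
        raise ValueError("table_size_bytes must be non-negative")
    segment_size_bytes = segment_size_mb * BYTES_PER_MB
    # Integer ceiling division avoids floating-point rounding on large tables.
    count = -(-table_size_bytes // segment_size_bytes)
    # Always scan with at least one segment, even if the table is empty.
    return max(1, count)


# The same setting (segmentSize: 100) applied to two tables of different sizes.
small_table = segments_for(1 * 1024**3, 100)   # 1 GiB table
large_table = segments_for(50 * 1024**3, 100)  # 50 GiB table
print(small_table, large_table)
```

With a single segmentSize value, the segment count scales with each table, which is the adjustment that currently has to be made by hand through scanSegments.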

_Originally posted by @julienrf in https://github.com/scylladb/scylla-migrator/pull/143#discussion_r1619045818_