scylladb / scylla-migrator

Migrate data extract using Spark to Scylla, normally from Cassandra
Apache License 2.0

Set the number of Hadoop task mappers to the number of scan segments #143

Closed by julienrf 1 month ago

julienrf commented 1 month ago

Also:

Fixes #130

guy9 commented 1 month ago

Thanks @julienrf. @pdbossman, please review.

pdbossman commented 1 month ago

I concur with the comments above: 100 MB is commonly used to determine partition size, and the ability to set scanSegments gives users the control they need.

What I'm really waiting for is @GeoffMontee's successful test with a large number of partitions on our large dataset.
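
For readers following the thread, the sizing rule discussed above can be sketched roughly as follows. This is only an illustration, not the migrator's actual code: the function names, the dictionary keys, and the exact 100 MB constant are assumptions made for the example. It derives a segment count from the table size when the user gives none, honours an explicit scanSegments value otherwise, and sets the number of map tasks equal to the number of segments, as the PR title describes.

```python
# Hypothetical sketch: names and the 100 MB constant are illustrative
# assumptions, not taken from the scylla-migrator code base.
TARGET_SEGMENT_BYTES = 100 * 1024 * 1024  # assumed ~100 MB per scan segment


def estimate_scan_segments(table_size_bytes, target_bytes=TARGET_SEGMENT_BYTES):
    """Return how many segments keep each one at or below target_bytes."""
    if table_size_bytes < 0:
        raise ValueError("table_size_bytes must be non-negative")
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    # Integer ceiling division; always use at least one segment.
    return max(1, -(-table_size_bytes // target_bytes))


def resolve_parallelism(table_size_bytes, scan_segments=None):
    """Pick the segment count and use the same value for the map tasks."""
    if scan_segments is None:
        # No user override: fall back to the size-based estimate.
        scan_segments = estimate_scan_segments(table_size_bytes)
    if scan_segments < 1:
        raise ValueError("scan_segments must be at least 1")
    # One map task per scan segment, so no segment waits on another.
    return {"scan_segments": scan_segments, "map_tasks": scan_segments}
```

Calling `resolve_parallelism(table_size_bytes)` without a second argument uses the size-based estimate, while passing `scan_segments=16` overrides it, which is the kind of user control the comment above refers to.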