scylladb / scylla-migrator

Migrate data to Scylla using Spark, typically from Cassandra or Parquet files; alternatively, from DynamoDB to Scylla Alternator.
https://migrator.docs.scylladb.com/stable/
Apache License 2.0

How to configure batching or throughput for writing to Scylla? #57

Open kumar-shrey opened 2 years ago

kumar-shrey commented 2 years ago

We have some scheduled jobs that copy data from Parquet files into a Scylla cluster. We had been using custom code for this and are now exploring scylla-migrator for the same purpose. In our experiments, scylla-migrator drives far more ops/sec on Scylla for the same amount of data (almost 10x). I am guessing this comes down to the batch sizes we were creating before versus the batching strategy used by scylla-migrator.

We want to keep the ratio of read to write ops on Scylla the same as before, because the system has a very tight read latency SLA.

Is there a way to configure the batch size or the write throughput when writing to Scylla?
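
For context, the kind of knobs I have in mind are the write-tuning options exposed by the spark-cassandra-connector. The sketch below is just an illustration of what we are looking for, with placeholder host and values; I don't know whether scylla-migrator actually forwards these settings to its writer:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example: spark-cassandra-connector write-tuning options.
// Values and host are placeholders, not a recommendation.
val spark = SparkSession.builder()
  .appName("parquet-to-scylla")
  .config("spark.cassandra.connection.host", "scylla-node-1")   // placeholder contact point
  .config("spark.cassandra.output.batch.size.rows", "32")       // rows per write batch
  .config("spark.cassandra.output.concurrent.writes", "4")      // in-flight batches per task
  .config("spark.cassandra.output.throughputMBPerSec", "5")     // per-executor write throughput cap
  .getOrCreate()
```

If the migrator does not pass these through, a dedicated option in its own config file (for example under the target section) would work just as well for us.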