An Ansible playbook is provided in the `ansible` folder. The playbook installs the prerequisites and Spark on the master and workers listed in the `ansible/inventory/hosts` file. Scylla Migrator is installed on the Spark master node.
1. Update the `ansible/inventory/hosts` file with your master and worker instances.
2. Update `ansible/ansible.cfg` with the location of your private key, if necessary.
3. The `ansible/template/spark-env-master-sample` and `ansible/template/spark-env-worker-sample` files contain the environment variables that determine the number of workers, CPUs per worker, and memory allocations, as well as considerations for setting them.
4. Run `ansible-playbook scylla-migrator.yml`.
5. On the Spark master node, run `./start-spark.sh`.
6. On each Spark worker node, run `./start-slave.sh`.
7. Configure your `config.yaml` based on whether you're performing a migration to CQL or Alternator:
   - For CQL, copy `config.yaml.example` and edit as directed.
   - For Alternator, copy `config.dynamodb.yml` and edit as directed.
8. In `scylla-migrator/submit-cql-job.sh`, change the line `--conf spark.scylla.config=config.yaml \` to point to whatever you named the config file in the previous step.
9. In `scylla-migrator/submit-alternator-job.sh`, change the line `--conf spark.scylla.config=/home/ubuntu/scylla-migrator/config.dynamodb.yml \` to reference the config file you created and modified in the previous step.
10. Launch the migration with `./submit-cql-job.sh` or `./submit-alternator-job.sh`, depending on the target.
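For reference, a minimal `ansible/inventory/hosts` file might look like the following. The group names and addresses here are illustrative assumptions; defer to the sample inventory shipped in the `ansible` folder for the exact group names the playbook expects:

```ini
; Illustrative sketch -- group names and IPs are placeholders,
; not copied from the repository's sample inventory.
[spark-master]
10.0.0.10

[spark-workers]
10.0.0.11
10.0.0.12
```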
Create a `config.yaml` for your migration using the template `config.yaml.example` in the repository root. Read the comments throughout carefully.
The Scylla Migrator is built against Spark 3.5.1, so you'll need to run that version on your cluster.
Download the latest release of the migrator:

```
wget https://github.com/scylladb/scylla-migrator/releases/latest/download/scylla-migrator-assembly.jar
```
Alternatively, you can build a custom version of the migrator.
Copy the `scylla-migrator-assembly.jar` jar and the `config.yaml` you've created to the Spark master server.
Start the Spark master and workers. On the master instance:

```
cd scylla-migrator
./start-spark.sh
```

On the worker instances:

```
./start-slave.sh
```
Configure and confirm networking between the Spark nodes, the source cluster, and the target cluster.
Create the schema in the target server. Then, run this command on the Spark master server:
```
spark-submit --class com.scylladb.migrator.Migrator \
  --master spark://<spark-master-hostname>:7077 \
  --conf spark.scylla.config=<path to config.yaml> \
  <path to scylla-migrator-assembly.jar>
```
If you need to pass a truststore file or other SSL-related files, use the `--files` option:

```
spark-submit --class com.scylladb.migrator.Migrator \
  --master spark://<spark-master-hostname>:7077 \
  --conf spark.scylla.config=<path to config.yaml> \
  --files <truststore-file> \
  <path to scylla-migrator-assembly.jar>
```
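If you launch migrations repeatedly, the invocation above can be wrapped in a small script. This is a hypothetical helper, not part of the repository; the variable names and defaults are assumptions to adjust for your environment:

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with the migrator): assembles the
# spark-submit command line shown above. Defaults are assumptions.
MASTER_HOST=${MASTER_HOST:-spark-master}
CONFIG=${CONFIG:-config.yaml}
JAR=${JAR:-scylla-migrator-assembly.jar}
TRUSTSTORE=${TRUSTSTORE:-}  # leave empty when no SSL files are needed

CMD="spark-submit --class com.scylladb.migrator.Migrator"
CMD="$CMD --master spark://${MASTER_HOST}:7077"
CMD="$CMD --conf spark.scylla.config=${CONFIG}"
# Only add --files when a truststore (or other SSL file) is supplied.
if [ -n "$TRUSTSTORE" ]; then
  CMD="$CMD --files ${TRUSTSTORE}"
fi
CMD="$CMD ${JAR}"

echo "$CMD"  # prints the command; replace echo with eval to actually run it
```

Running it with `TRUSTSTORE=client.jks ./submit.sh` would append `--files client.jks` to the printed command; with the variable unset, the option is omitted.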
This project also includes an entrypoint for comparing the source table and the target table. You can launch it as follows (after performing the previous steps):

```
spark-submit --class com.scylladb.migrator.Validator \
  --master spark://<spark-master-hostname>:7077 \
  --conf spark.scylla.config=<path to config.yaml> \
  <path to scylla-migrator-assembly.jar>
```
To run in the local Docker-based setup, first start the environment:

```
docker compose up -d
```
Launch `cqlsh` in Cassandra's container and create a keyspace and a table with some data:

```
docker compose exec cassandra cqlsh
<create stuff>
```
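For example, a minimal source keyspace and table could look like this (the keyspace, table, and column names are illustrative; any schema works):

```
-- Illustrative schema and data; substitute your own names.
CREATE KEYSPACE test
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE test.items (id int PRIMARY KEY, value text);
INSERT INTO test.items (id, value) VALUES (1, 'hello');
```

The same `CREATE KEYSPACE` and `CREATE TABLE` statements (without the `INSERT`) can then be reused in the next step to create the destination schema in Scylla.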
Launch `cqlsh` in Scylla's container and create the destination keyspace and table with the same schema as the source table:

```
docker compose exec scylla cqlsh
<create stuff>
```
Edit the `config.yaml` file; note the comments throughout. Run `build.sh`.
Then, launch `spark-submit` in the master's container to run the job:

```
docker compose exec spark-master /spark/bin/spark-submit --class com.scylladb.migrator.Migrator \
  --master spark://spark-master:7077 \
  --conf spark.driver.host=spark-master \
  --conf spark.scylla.config=/app/config.yaml \
  /jars/scylla-migrator-assembly.jar
```
The `spark-master` container mounts the `./migrator/target/scala-2.13` directory on `/jars` and the repository root on `/app`. To update the jar with new code, just run `build.sh` and then run `spark-submit` again.
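In docker-compose terms, those two mounts correspond to a fragment like the following (the service definition is otherwise elided here; the repository's `docker-compose.yml` is authoritative):

```yaml
# Sketch of the relevant volume mounts, as described above.
services:
  spark-master:
    volumes:
      - ./migrator/target/scala-2.13:/jars  # assembly jar output from build.sh
      - ./:/app                             # repository root, incl. config.yaml
```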
To test a custom version of the migrator that has not been released, you can build it yourself by cloning this Git repository and following the steps below:

1. Make sure the JDK and `sbt` are installed on your machine.
2. Export the `JAVA_HOME` environment variable with the path to the JDK installation.
3. Run `build.sh`.
4. The jar to use in the `spark-submit` command is produced at path `migrator/target/scala-2.13/scylla-migrator-assembly.jar`.