scylladb / scylla-migrator

Migrates data to Scylla using Spark, normally from Cassandra or Parquet files. Alternatively, migrates from DynamoDB to Scylla Alternator.
https://migrator.docs.scylladb.com/stable/
Apache License 2.0

Align the scaling strategy of the Ansible-based setup with the other setups #201

Closed by julienrf 2 months ago

julienrf commented 2 months ago

Fixes #192.

julienrf commented 2 months ago

I tested these changes with two Ubuntu containers, each running an SSH server, and a local DynamoDB instance, using the following Docker Compose services:

services:
  master:
    build: dockerfiles/ansible
  worker:
    build: dockerfiles/ansible

  dynamodb:
    command: "-jar DynamoDBLocal.jar -sharedDb -inMemory"
    image: "amazon/dynamodb-local:latest"
    expose:
      - 8000
    ports:
      - "8000:8000"
    working_dir: /home/dynamodblocal

The file dockerfiles/ansible/Dockerfile contains the following:

FROM ubuntu

RUN apt-get update && apt-get install -y openssh-server sudo software-properties-common iproute2

RUN mkdir /var/run/sshd

RUN useradd -ms /bin/bash ubuntu \
    && echo 'ubuntu:aaaaaa' | chpasswd \
    && adduser ubuntu sudo \
    && echo "ubuntu ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/ubuntu

RUN echo "PasswordAuthentication yes" >> /etc/ssh/sshd_config \
    && echo "PermitRootLogin yes" >> /etc/ssh/sshd_config

EXPOSE 22

CMD ["/usr/sbin/sshd", "-D"]

I noted the IP addresses of the Spark master and worker nodes with the following commands:

docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' scylla-migrator-worker-1
docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' scylla-migrator-master-1

And I used those IP addresses in the Ansible inventory.
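For readers reproducing this setup, the two steps above (looking up the container IPs and putting them in the inventory) could be scripted roughly as follows. This is only a sketch: the group names `master` and `worker`, the helper names `container_ip` and `render_inventory`, and the use of `ansible_user` / `ansible_password` are assumptions for illustration, not taken from this PR. The credentials match the test Dockerfile above, and password-based SSH from Ansible typically needs `sshpass` on the control machine.

```shell
# Print the IP address of a running container (same docker inspect call as above).
container_ip() {
  docker inspect \
    --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$1"
}

# Print an INI-style Ansible inventory for one master and one worker.
# Group names and host variables are assumptions; adapt them to the
# inventory layout actually used by the playbook.
render_inventory() {
  master_ip="$1"
  worker_ip="$2"
  cat <<EOF
[master]
${master_ip} ansible_user=ubuntu ansible_password=aaaaaa

[worker]
${worker_ip} ansible_user=ubuntu ansible_password=aaaaaa
EOF
}

# Example usage (the output file name "hosts" is hypothetical):
#   render_inventory \
#     "$(container_ip scylla-migrator-master-1)" \
#     "$(container_ip scylla-migrator-worker-1)" > hosts
```

Keeping the rendering separate from the `docker inspect` lookup makes it easy to reuse the same function with addresses obtained some other way.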

Then I ran ansible-playbook scylla-migrator.yml to set up the Migrator on both the Spark master and worker nodes.

Afterwards, I opened a terminal on each node and ran start-spark.sh (on the master) and start-slave.sh (on the worker).

I created a DynamoDB table and put an item in it. Then, I edited the file dynamodb.config.yml to configure a migration from this table. Finally, I executed the migration with submit-alternator-job.sh.
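The comment does not say how the table and item were created. One possible way, assuming the AWS CLI is installed and DynamoDB Local is reachable on the port published above, is sketched below. The table name `Example`, the key attribute `id`, the item contents, and the function names are all hypothetical. The AWS CLI also expects credentials and a region to be configured even for a local endpoint; dummy values are enough for DynamoDB Local.

```shell
# Assumed values, not taken from the original comment.
ENDPOINT="${ENDPOINT:-http://localhost:8000}"
TABLE="${TABLE:-Example}"

# Create a table with a single string partition key named "id".
create_example_table() {
  aws dynamodb create-table \
    --endpoint-url "$ENDPOINT" \
    --table-name "$TABLE" \
    --attribute-definitions AttributeName=id,AttributeType=S \
    --key-schema AttributeName=id,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST
}

# Put one item into the table so the migration has something to copy.
put_example_item() {
  aws dynamodb put-item \
    --endpoint-url "$ENDPOINT" \
    --table-name "$TABLE" \
    --item '{"id": {"S": "1"}, "payload": {"S": "hello"}}'
}

# Example usage with dummy credentials for DynamoDB Local:
#   export AWS_ACCESS_KEY_ID=dummy AWS_SECRET_ACCESS_KEY=dummy AWS_DEFAULT_REGION=us-east-1
#   create_example_table && put_example_item
```

The table named here would then be the one referenced as the migration source in dynamodb.config.yml.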

guy9 commented 2 months ago

@pdbossman please review