scylladb / scylla-migrator

Migrate data to Scylla using Spark, typically from Cassandra or Parquet files, or alternatively from DynamoDB to Scylla Alternator.
https://migrator.docs.scylladb.com/stable/
Apache License 2.0

The correctness of the Ansible playbook is not covered by the CI tests #170

Open · julienrf opened this issue 4 months ago

julienrf commented 4 months ago

Our tests do not cover the Ansible playbook. The requirements to run the playbook are not clear, and any modification to the playbook may silently break it.

julienrf commented 2 months ago

For the record, here is how I locally tested the Ansible playbook. The information below could be used to implement an automated workflow; a sketch of such a workflow is given at the end of this comment.

  1. Install Ansible on my machine
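    For example, one common way to do this (any supported Ansible installation method should work equally well):
    python3 -m pip install --user ansible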
  2. Create Docker containers: a Spark master node, a Spark worker node, and a local DynamoDB instance, using the following docker-compose.yaml:

    services:
      master:
        build: dockerfiles/ansible
      worker:
        build: dockerfiles/ansible
      dynamodb:
        command: "-jar DynamoDBLocal.jar -sharedDb -inMemory"
        image: "amazon/dynamodb-local:latest"
        expose:
          - 8000
        ports:
          - "8000:8000"
        working_dir: /home/dynamodblocal

    It uses the following Dockerfile (at ./dockerfiles/ansible/Dockerfile), which sets up an SSH server on an Ubuntu base image:

    FROM ubuntu
    
    # Install the SSH server and the tools Ansible needs on a managed host
    RUN apt-get update && apt-get install -y openssh-server sudo software-properties-common iproute2
    
    # Runtime directory required by sshd
    RUN mkdir /var/run/sshd
    
    # Create an 'ubuntu' user (password 'aaaaaa') with passwordless sudo
    RUN useradd -ms /bin/bash ubuntu \
       && echo 'ubuntu:aaaaaa' | chpasswd \
       && adduser ubuntu sudo \
       && echo "ubuntu ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/ubuntu
    
    # Allow password-based SSH logins
    RUN echo "PasswordAuthentication yes" >> /etc/ssh/sshd_config \
       && echo "PermitRootLogin yes" >> /etc/ssh/sshd_config
    
    EXPOSE 22
    
    CMD ["/usr/sbin/sshd", "-D"]
  3. Start the containers with docker compose up
  4. Find the IP addresses of the master and worker services with a command like:
    docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' <container-name>
  5. Edit ansible/inventory/hosts.ini and set the IP address of the master and worker hosts. Disable the spark_worker2 host.
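    For example, assuming docker inspect reported 172.18.0.2 for the master and 172.18.0.3 for the worker, the inventory would end up looking something like this (the group names are illustrative; keep whatever layout hosts.ini already uses):
    [spark_master]
    172.18.0.2
    
    [spark_worker]
    172.18.0.3
    
    # spark_worker2 entry commented out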
  6. Comment out private_key_file in ansible/ansible.cfg
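    That is, the corresponding line in ansible/ansible.cfg becomes something like:
    # private_key_file = ...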
  7. Run ansible-playbook scylla-migrator.yml
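    Presumably from within the ansible/ directory, where ansible.cfg and the inventory live:
    cd ansible
    ansible-playbook scylla-migrator.yml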
  8. When the process completes, open a shell in the master node and start Spark:
    docker compose exec master /bin/bash
    cd scylla-migrator
    ./start-spark.sh
  9. Start the Spark worker on the worker node:
    docker compose exec worker /bin/bash
    ./start-slave.sh
  10. From the master node, create a DynamoDB source table and put an item in it:
    aws configure set region us-west-1
    aws configure set aws_access_key_id dummy
    aws configure set aws_secret_access_key dummy
    aws \
     --endpoint-url http://dynamodb:8000 \
     dynamodb create-table \
       --table-name Source \
       --attribute-definitions AttributeName=id,AttributeType=S \
       --key-schema AttributeName=id,KeyType=HASH \
       --provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=100
    aws \
     --endpoint-url http://dynamodb:8000 \
     dynamodb put-item \
     --table-name Source \
     --item '{ "id": { "S": "foo" } }'
  11. From the master node, edit the file scylla-migrator/config.dynamodb.yml:
    source:
      type: dynamodb
      table: Source
      endpoint:
        host: http://dynamodb
        port: 8000
      credentials:
        accessKey: empty
        secretKey: empty
    target:
      type: dynamodb
      table: Target
      endpoint:
        host: http://dynamodb
        port: 8000
      credentials:
        accessKey: empty
        secretKey: empty
      streamChanges: false
    savepoints:
      path: /app/savepoints
      intervalSeconds: 300
  12. Start the migration from the master node:
    ./submit-alternator-job.sh
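
To check that the migration worked, one could scan the target table from the master node (assuming the previous steps completed without errors):

    aws \
     --endpoint-url http://dynamodb:8000 \
     dynamodb scan \
     --table-name Target

To turn this into an automated workflow, the same steps could run in CI. Below is a minimal GitHub Actions sketch, assuming the docker-compose.yaml and Dockerfile above are committed to the repository; the workflow path, group names, and inventory-patching commands are illustrative, not the repository's actual layout, and SSH details may need further care:

    # .github/workflows/ansible.yml (illustrative path)
    name: Test the Ansible playbook
    on: [pull_request]
    jobs:
      ansible-playbook:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - name: Install Ansible and sshpass
            run: pipx install ansible-core && sudo apt-get install -y sshpass
          - name: Start the Spark and DynamoDB containers
            run: docker compose up -d --build
          - name: Point the inventory at the containers  # steps 4-6 above
            run: |
              fmt='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'
              master_ip=$(docker inspect --format="$fmt" "$(docker compose ps -q master)")
              worker_ip=$(docker inspect --format="$fmt" "$(docker compose ps -q worker)")
              # Group names and connection vars are assumptions; they should
              # mirror the layout of the real ansible/inventory/hosts.ini.
              printf '[spark_master]\n%s\n\n[spark_worker]\n%s\n\n[all:vars]\nansible_user=ubuntu\nansible_password=aaaaaa\n' \
                "$master_ip" "$worker_ip" > ansible/inventory/hosts.ini
              sed -i 's/^private_key_file/# private_key_file/' ansible/ansible.cfg
          - name: Run the playbook
            run: ansible-playbook scylla-migrator.yml
            working-directory: ansible
            env:
              ANSIBLE_HOST_KEY_CHECKING: "False"
          # Steps 8-12 above (starting Spark, creating the source table, and
          # running the migration) would follow as further run steps.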