silvermine / dynamodb-table-sync

MIT License

Silvermine DynamoDB Table Sync


What is it?

A script that will scan two or more DynamoDB tables and report the differences to you. Optionally, it will also synchronize them by writing updates to the slave table to keep it in sync with the master table.

How do I use it?

Here's an example of how you can use this script:

node src/cli.js \
   --master us-east-1:my-dynamodb-table \
   --slave us-west-2:my-dynamodb-table \
   --slave eu-west-1:my-dynamodb-table \
   --write-missing \
   --write-differing

Using the arguments shown above, the synchronizer would scan your master table (the table in the us-east-1 region), and check that every item in the master table exists in each of your two slave tables (in us-west-2 and eu-west-1 in this example). Because we supplied both the --write-missing and --write-differing flags, it would write to the slaves any items that the slave was missing or where the slave's item differed from the master's item.
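The comparison described above can be sketched in a few lines. This is only an illustration of the idea, not the code in src/Synchronizer.js: planSync is a hypothetical helper, plain Maps stand in for the DynamoDB tables, and the real script scans and writes through the AWS SDK.

```javascript
const { isDeepStrictEqual } = require('util');

// Hypothetical helper (not part of this library): decides what a sync pass
// would write to one slave. masterItems and slaveItems are Maps from a
// primary-key string to an item.
function planSync(masterItems, slaveItems, opts) {
   const plan = { missing: 0, differing: 0, writes: [] };

   for (const [ key, masterItem ] of masterItems) {
      const slaveItem = slaveItems.get(key);

      if (slaveItem === undefined) {
         // Item exists in the master but not in the slave.
         plan.missing += 1;
         if (opts.writeMissing) {
            plan.writes.push({ key, reason: 'missing', item: masterItem });
         }
      } else if (!isDeepStrictEqual(masterItem, slaveItem)) {
         // Item exists in both tables, but the contents differ.
         plan.differing += 1;
         if (opts.writeDiffering) {
            plan.writes.push({ key, reason: 'differing', item: masterItem });
         }
      }
   }

   return plan;
}

const master = new Map([
   [ 'a', { id: 'a', v: 1 } ],
   [ 'b', { id: 'b', v: 2 } ],
   [ 'c', { id: 'c', v: 3 } ],
]);
const slave = new Map([
   [ 'a', { id: 'a', v: 1 } ],
   [ 'b', { id: 'b', v: 99 } ],
]);

const plan = planSync(master, slave, { writeMissing: true, writeDiffering: true });

console.info(plan.writes.map((w) => `${w.key}: ${w.reason}`));
// → [ 'b: differing', 'c: missing' ]
```

Without either flag the same pass still counts the missing and differing items but plans no writes, which mirrors the report-only behavior described under "Dry Run" Mode.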

Running in Docker

Build the image:

docker-compose build

Then run the container:

docker-compose run \
   --rm \
   --user node \
   --volume "${HOME}/.aws:/home/node/.aws" \
   -e AWS_ACCESS_KEY_ID="XXXXX" \
   -e AWS_SECRET_ACCESS_KEY="ZZZZZ" \
   dynamodb_table_sync \
   src/cli.js \
   --master us-east-1:my-dynamodb-table \
   --slave us-west-2:my-dynamodb-table \
   --slave eu-west-1:my-dynamodb-table \
   --write-missing \
   --write-differing

Installing Globally

By installing the library globally (e.g. npm install -g @silvermine/dynamodb-table-sync), you will get a dynamodb-table-sync executable in your node bin folder. Assuming you have the node bin folder in your path (you probably do if you've ever installed any other npm package globally), then you can simply run the command from any folder like this:

dynamodb-table-sync -m us-east-1:my-dynamodb-table -s eu-west-1:my-dynamodb-table

You can use any of the CLI arguments regardless of where you run the script from.

Using Silvermine DynamoDB Table Sync as a Library

You can also use this codebase as a library. Here's a brief example of how to do so:

var Synchronizer = require('@silvermine/dynamodb-table-sync'),
    synchronizer;

synchronizer = new Synchronizer(
   { region: 'us-east-1', name: 'my-master-table-name' },
   [
      { region: 'us-west-2', name: 'my-slave-1-table-name' },
      { region: 'eu-west-1', name: 'my-slave-2-table-name' },
   ],
   {
      writeMissing: true,
      writeDiffering: true,
   },
);

synchronizer.run()
   .then(() => {
      console.info('Done!');
   })
   .catch((err) => {
      console.error(`Failed: ${err.message}`, err.stack);
      process.exit(1);
   });

See the comments in src/Synchronizer.js for more documentation on the names of options. Since options all map to CLI arguments, see the list of CLI arguments below for details on what each option is used for.

Command Line Flags

Note that everywhere that a table is supplied as a command line argument, it should be in the form <region>:<table-name>.
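As a rough sketch of how such an argument maps onto the { region, name } objects used in the library example above, a string in this form could be split as follows. parseTableArg is a hypothetical helper written for illustration; it is not necessarily how src/cli.js parses its arguments.

```javascript
// Hypothetical helper: splits "<region>:<table-name>" at the first colon.
// DynamoDB table names cannot contain a colon, so the first colon is
// always the separator.
function parseTableArg(arg) {
   const sep = arg.indexOf(':');

   // Reject a missing colon, an empty region, or an empty table name.
   if (sep <= 0 || sep === arg.length - 1) {
      throw new Error(`Expected <region>:<table-name>, got "${arg}"`);
   }

   return { region: arg.slice(0, sep), name: arg.slice(sep + 1) };
}

console.info(parseTableArg('us-east-1:my-dynamodb-table'));
// → { region: 'us-east-1', name: 'my-dynamodb-table' }
```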

Syncing Tables in Two Accounts

It is possible to synchronize tables in two accounts. To do so, you must provide credentials arguments that configure the credentials providers to be used for the slave tables. This is done by providing one of the following combinations of arguments:

When you configure credentials for the slaves using these arguments, the script uses the master credentials (provided by default, or by --profile and related arguments) to describe and read from the master table, and the slave credentials to describe, read from, and write to the slave table(s).

See the heading below entitled "Authentication and Authorization to AWS DynamoDB API" for more information related to credentials.

"Dry Run" Mode

If you run the synchronizer without any of the modification flags (--write-missing, --write-differing, and --delete-extra), then the script will run in a dry run / report-only mode. It will log each item that is different or missing, and will report stats at the end of the run.

As noted above, the dry run will (by default) only scan the master table, finding items that are missing from the slaves or where the slave item differs from the master. If you want to also scan the slave tables to find items in them that do not exist in the master ("extra" items), supply the --scan-for-extra flag.
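The two rules in this section can be expressed as small predicates. This is a sketch under assumptions: isDryRun and findExtraKeys are hypothetical helpers, and deleteExtra is assumed to be the option name matching --delete-extra (writeMissing and writeDiffering appear in the library example above; check src/Synchronizer.js for the actual names).

```javascript
// Hypothetical helper: a run is report-only when no modification option is set.
// deleteExtra is an assumed option name for the --delete-extra flag.
function isDryRun(opts) {
   return !(opts.writeMissing || opts.writeDiffering || opts.deleteExtra);
}

// Hypothetical helper: "extra" items are keys found in a slave table that do
// not exist in the master. Both arguments are Sets of primary-key strings.
function findExtraKeys(masterKeys, slaveKeys) {
   return [ ...slaveKeys ].filter((key) => !masterKeys.has(key));
}

console.info(isDryRun({}));                     // → true
console.info(isDryRun({ writeMissing: true })); // → false
console.info(findExtraKeys(new Set([ 'a', 'b' ]), new Set([ 'a', 'z' ])));
// → [ 'z' ]
```

Finding extra items requires knowing every key in the slave, which is why it takes a separate scan of the slave tables and is opt-in.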

Note About Race Conditions

There is no way to make the multi-region/multi-table operation atomic. Because time passes between reading from and writing to the various tables, race conditions will exist.

For example, if you are replicating data from the master table to the slave table(s), it may be that we read an item that has not yet replicated to the slave. Or, it's possible that we read a new version of the item from the master, and an old version of the item from the slave(s). In either of these scenarios, you are "safe" with the --write-missing and --write-differing flags because the synchronizer will write what it read from the master, which in both of these scenarios is the newer data.

Of course, there are scenarios that are not safe as well. Consider, for example, the following scenario - portrayed as a serial list of events:

What can be done to avoid that scenario?

Authentication and Authorization to AWS DynamoDB API

The script can run without any specific AWS credentials configuration (for example, without --profile, --role-arn or the MFA-related arguments). In those cases the script will simply use the built-in authentication mechanism from the SDK. Thus, you will need to ensure that one of the methods that the SDK uses to auto-discover credentials will work in your environment. For example, you could:

Note that if you assume a role that requires MFA, the temporary credentials the script receives will only last for one hour, and cannot be refreshed without a new MFA token (which the script does not attempt to implement). Thus, the provisioned capacity, parallelism, and related limiting arguments must be configured to allow your entire table to be scanned during a single hour if you are using MFA.
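To check whether a table is likely to fit inside that one-hour window, a back-of-the-envelope estimate can help. The sketch below assumes the scan is limited only by the master table's read throughput, using the DynamoDB rule that one read capacity unit covers 4 KB per second for strongly consistent reads and twice that for eventually consistent reads. estimateScanSeconds is a hypothetical helper; it ignores reads against the slaves, throttling, and network overhead, so treat the result as a lower bound.

```javascript
// Hypothetical helper: lower-bound estimate of the time needed to scan a table.
//   tableSizeBytes       - total size of the table being scanned
//   readCapacityUnits    - read throughput the scan is allowed to consume
//   eventuallyConsistent - true if the scan uses eventually consistent reads
function estimateScanSeconds(tableSizeBytes, readCapacityUnits, eventuallyConsistent) {
   // One read capacity unit reads 4 KB per second (8 KB if eventually consistent).
   const bytesPerUnit = 4096 * (eventuallyConsistent ? 2 : 1);

   return Math.ceil(tableSizeBytes / (readCapacityUnits * bytesPerUnit));
}

const tenGiB = 10 * 1024 * 1024 * 1024;

console.info(estimateScanSeconds(tenGiB, 500, false)); // → 5243 (over one hour)
console.info(estimateScanSeconds(tenGiB, 500, true));  // → 2622 (under one hour)
```

If the estimate is close to or above 3600 seconds, the scan is unlikely to complete before MFA-based credentials expire.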

What Permissions Are Needed?

On the master table you will need:

On the slave table(s) you will need:

How do I contribute?

We genuinely appreciate external contributions. See our extensive documentation on how to contribute.

License

This software is released under the MIT license. See the license file for more details.