remind101 / stacker_blueprints

DEPRECATED - moved to:
https://github.com/cloudtools/stacker_blueprints
BSD 2-Clause "Simplified" License

Add support for DynamoDB Snapshot blueprint #21

Closed mwildehahn closed 7 years ago

mwildehahn commented 8 years ago

The DynamoDB Snapshot blueprint configures an AWS Data Pipeline that uses an EMR cluster to export a DynamoDB table to an S3 bucket.

Example config:

snapshot_params: &snapshot_params
  PipelineLogUri: 's3://db-backups/logs'
  ResourceRole: 'DataPipelineDefaultResourceRole'
  Role: 'DataPipelineDefaultRole'
  Activate: 'true'
  SchedulePeriod: '1 day'
  ScheduleType: 'cron'
  StartDateTime: '2016-06-01T02:00:00'

stacks:
  - name: table-1-snapshot
    class_path: stacker_blueprints.dynamodb_snapshot.DynamoDBSnapshot
    parameters:
      << : *snapshot_params
      TableName: ${table_1_name}
      S3OutputLocation: 's3://db-backups/${table_1_name}'
  - name: table-2-snapshot
    class_path: stacker_blueprints.dynamodb_snapshot.DynamoDBSnapshot
    parameters:
      << : *snapshot_params
      TableName: ${table_2_name}
      S3OutputLocation: 's3://db-backups/${table_2_name}'
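
For readers unfamiliar with the YAML merge key: the `<< : *snapshot_params` lines copy the shared mapping into each stack's `parameters`, and the keys listed alongside it are added on top. A rough Python equivalent of that expansion follows, as a sketch only. The function name `stack_parameters` and the table name `table-1` (a stand-in for whatever `${table_1_name}` resolves to) are made up for illustration; stacker itself does not run this code.

```python
# Shared settings, mirroring the `snapshot_params` anchor in the config above.
snapshot_params = {
    "PipelineLogUri": "s3://db-backups/logs",
    "ResourceRole": "DataPipelineDefaultResourceRole",
    "Role": "DataPipelineDefaultRole",
    "Activate": "true",
    "SchedulePeriod": "1 day",
    "ScheduleType": "cron",
    "StartDateTime": "2016-06-01T02:00:00",
}


def stack_parameters(table_name, bucket="db-backups"):
    """Approximate what the merge key plus the per-table keys produce."""
    return {
        **snapshot_params,  # shared values come first
        "TableName": table_name,  # per-table keys are layered on top
        "S3OutputLocation": f"s3://{bucket}/{table_name}",
    }


# "table-1" is a hypothetical value for ${table_1_name}.
params = stack_parameters("table-1")
print(params["S3OutputLocation"])
```

Each stack therefore ends up with the seven shared keys plus its own `TableName` and `S3OutputLocation`.
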
mwildehahn commented 8 years ago

This works well for single tables, but if you're trying to snapshot multiple tables, it will bring up instances for each table, which sucks.

phobologic commented 8 years ago

Should we pull this in, or wait and update for blueprint variables?

mwildehahn commented 8 years ago

Might as well just wait. This only works well for 1 or 2 tables: each time it runs for a table, it brings up 3 instances by default to run an EMR job. I'm not sure if you can get an AWS Data Pipeline job to run on more than 1 table at a time.
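
To put numbers on that cost: with one EMR job per table and the default of 3 instances per job cited in the comment above, the instance count grows linearly with the number of tables. A small sketch of the arithmetic follows; the function name is made up for illustration and the table counts are arbitrary.

```python
DEFAULT_INSTANCES_PER_JOB = 3  # default cited in the comment above


def instances_per_run(num_tables, instances_per_job=DEFAULT_INSTANCES_PER_JOB):
    """Instances launched per scheduled run when every table gets its own EMR job."""
    if num_tables < 0:
        raise ValueError("num_tables must be non-negative")
    return num_tables * instances_per_job


for n in (1, 2, 10):
    print(n, "tables ->", instances_per_run(n), "instances")
```
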

phobologic commented 8 years ago

Ok cool, I'll hold off.

mwildehahn commented 7 years ago

Ok, I updated this to take an array of DynamoDB tables, so it brings up 1 EMR cluster and then runs several EMR activities (1 per table you're backing up). That's significantly better than the first implementation, which brought up 1 cluster per table.

Sample config would be:

  - name: schema
    class_path: stacker_blueprints.dynamodb.DynamoDB
    variables:
      Tables:
        TestTable1:
          AttributeDefinitions:
            - AttributeName: created
              AttributeType: S
          KeySchema:
            - AttributeName: created
              KeyType: HASH
          ProvisionedThroughput:
            ReadCapacityUnits: 1
            WriteCapacityUnits: 1
        TestTable2:
          AttributeDefinitions:
            - AttributeName: created
              AttributeType: S
            - AttributeName: test
              AttributeType: S
          KeySchema:
            - AttributeName: created
              KeyType: HASH
            - AttributeName: test
              KeyType: RANGE
          ProvisionedThroughput:
            ReadCapacityUnits: 1
            WriteCapacityUnits: 1
  - name: schemaSnapshots
    class_path: stacker_blueprints.dynamodb.snapshot.Snapshot
    variables:
      Activate: true
      PipelineLogUri: s3://stacker-mh-empire-dev/logs
      ResourceRole: 'DataPipelineDefaultResourceRole'
      Role: 'DataPipelineDefaultRole'
      SchedulePeriod: '1 day'
      ScheduleType: 'cron'
      StartDateTime: '2016-10-17T02:00:00'
      SnapshotConfigs:
        - TableName: ${schema::TestTable1Name}
          S3Output: s3://stacker-mh-empire-dev/backups/${schema::TestTable1Name}
        - TableName: ${schema::TestTable2Name}
          S3Output: s3://stacker-mh-empire-dev/backups/${schema::TestTable2Name}
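
The shape of the updated pipeline (one shared EMR cluster, one activity per table) can be sketched with plain Python dicts. This is not the blueprint's code. The object ids and the function name are invented, the `type`, `runsOn`, `input`, and `output` fields follow the AWS Data Pipeline definition syntax as I understand it, and the EMR step arguments, cluster sizing, and schedule are left out entirely. The bucket name is a placeholder.

```python
def build_pipeline_objects(snapshot_configs):
    """Return pipeline objects: one EmrCluster plus one EmrActivity per table."""
    cluster_id = "SnapshotCluster"  # hypothetical id, shared by every activity
    objects = [{"id": cluster_id, "type": "EmrCluster"}]

    for index, config in enumerate(snapshot_configs, start=1):
        source_id = f"SourceTable{index}"
        output_id = f"BackupLocation{index}"
        objects.extend([
            # Data node describing the table to read from.
            {"id": source_id, "type": "DynamoDBDataNode",
             "tableName": config["TableName"]},
            # Data node describing where the export is written.
            {"id": output_id, "type": "S3DataNode",
             "directoryPath": config["S3Output"]},
            # One activity per table, all pointing at the same cluster.
            {"id": f"BackupActivity{index}", "type": "EmrActivity",
             "runsOn": {"ref": cluster_id},
             "input": {"ref": source_id},
             "output": {"ref": output_id}},
        ])
    return objects


# Placeholder values standing in for the resolved SnapshotConfigs entries.
configs = [
    {"TableName": "TestTable1", "S3Output": "s3://example-bucket/backups/TestTable1"},
    {"TableName": "TestTable2", "S3Output": "s3://example-bucket/backups/TestTable2"},
]
objects = build_pipeline_objects(configs)
clusters = [o for o in objects if o["type"] == "EmrCluster"]
activities = [o for o in objects if o["type"] == "EmrActivity"]
print(len(clusters), "cluster,", len(activities), "activities")
```

Adding a table to `SnapshotConfigs` adds one more activity and its two data nodes, while the cluster count stays at one.
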
phobologic commented 7 years ago

Going to review some more, but wanted to make sure I submitted that.