mapbox / dynamodb-replicator

module for dynamodb multi-region replication
ISC License

incremental-snapshot doesn't handle S3 timeouts well #87

Open keen99 opened 7 years ago

keen99 commented 7 years ago

incremental-snapshot.js doesn't seem to handle S3 timeouts very well, leaving a broken (partial, missing, or otherwise) snapshot in its wake:

bin/incremental-snapshot.js s3://$BackupBucket/$BackupPrefix/$TABLE s3://$BackupBucket/${TABLE}-snapshot

[Tue, 10 Jan 2017 17:12:49 GMT] [info] [incremental-snapshot] Starting snapshot from s3://dsr-ddb-rep-testing/testprefix/showdownlive_gamedata_dev-01 to s3://dsr-ddb-rep-testing/showdownlive_gamedata_dev-01-snapshot
[Tue, 10 Jan 2017 17:12:59 GMT] [info] [incremental-snapshot] Starting upload of part #0, 0 bytes uploaded, 3000 items uploaded @ 297.65 items/s
[Tue, 10 Jan 2017 17:13:06 GMT] [error] [incremental-snapshot] TimeoutError: Connection timed out after 1000ms
    at ClientRequest.<anonymous> (/Users/draistrick/git/github/dynamodb-replicator/node_modules/aws-sdk/lib/http/node.js:56:34)
    at ClientRequest.g (events.js:286:16)
    at emitNone (events.js:86:13)
    at ClientRequest.emit (events.js:185:7)
    at TLSSocket.emitTimeout (_http_client.js:614:10)
    at TLSSocket.g (events.js:286:16)
    at emitNone (events.js:91:20)
    at TLSSocket.emit (events.js:185:7)
    at TLSSocket.Socket._onTimeout (net.js:333:8)
    at tryOnTimeout (timers.js:228:11)
    message: Connection timed out after 1000ms
    code: NetworkingError
    region: us-west-2
    hostname: dsr-ddb-rep-testing.s3-us-west-2.amazonaws.com

This case also exits 0 instead of with an error, so it's hard to handle externally.
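For anyone wrapping the script, here is a minimal sketch of the behavior I would expect: a non-zero exit code whenever the final callback receives an error. The `finish` helper and its parameters are hypothetical (not something in dynamodb-replicator); only `process.exitCode` is a real Node.js API.

```javascript
// Hypothetical helper, not part of dynamodb-replicator.
// `proc` defaults to the real Node.js `process`; it is injectable so the
// function can be exercised without changing the current process state.
function finish(err, proc = process, log = console.error) {
  if (err) {
    log(err);
    // Non-zero so callers (cron, CI, shell scripts) can detect the failure.
    proc.exitCode = 1;
    return;
  }
  proc.exitCode = 0;
}
```

Called as `finish(err)` from the last callback of the snapshot run, this would let a shell wrapper check `$?` instead of parsing the log output.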

keen99 commented 7 years ago

Bumping the timeout in https://github.com/mapbox/dynamodb-replicator/blob/master/s3-snapshot.js#L18 to something more sane (10s instead of 1s) and adding --retries 100 to the command line seems to at least work around this, but the default handling of these timeouts is definitely a bit painful. :)
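Roughly what that workaround amounts to, as a sketch: the object below has the shape that AWS SDK for JavaScript v2 service constructors accept (`httpOptions.timeout` in milliseconds, `maxRetries`). The `buildS3Options` name and its defaults are my own for illustration, and I am assuming, not confirming, that `--retries` ends up as `maxRetries`.

```javascript
// Hypothetical helper, not part of dynamodb-replicator.
// Returns client options in the shape used by AWS SDK v2 constructors,
// e.g. `new AWS.S3(buildS3Options())` (the SDK itself is not loaded here).
function buildS3Options(overrides = {}) {
  const defaults = {
    maxRetries: 100,                 // assumed equivalent of `--retries 100`
    httpOptions: { timeout: 10000 }  // 10s socket timeout instead of 1s
  };
  return {
    ...defaults,
    ...overrides,
    // Merge nested httpOptions so a partial override keeps the other defaults.
    httpOptions: { ...defaults.httpOptions, ...overrides.httpOptions }
  };
}
```

Keeping the values in one place like this would also make it easier to expose the timeout as a command-line option rather than a hard-coded number.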