mapbox / dyno

simple dynamodb client

[wip] Managing batch get/write responses #108

Closed · rclark closed this 8 years ago

rclark commented 8 years ago

The potential for UnprocessedItems in batch responses is tricky, especially if you need to make a set of requests. This branch sketches out consolidating some of the logic in dyno. The idea is:

  1. dyno.batchWriteItemRequests(params) lets you hand in > 100 items to write. The function returns an array of unsent requests with a .sendAll() method attached.
  2. The caller can then handle sending the requests themselves (see the sketch after this list), or
  3. use .sendAll([concurrency], callback) to ask dyno to send the requests. Concurrency defaults to 1. This always sends all the requests before firing the callback function, even if one of the requests fails.
  4. The callback function fires with three arguments:
    • err: null if there were no errors, otherwise an array of error objects with the same length as the number of requests that were sent. If a request succeeded, its index in the error array will be undefined
    • data: an array of responses to successful requests. If a request failed, its index in the array will be undefined
    • unprocessed: undefined if there were no unprocessed items, otherwise another array of unsent requests with a .sendAll() function attached.
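
For the manual path in (2), a minimal sketch of sending the requests yourself might look like the following; it assumes each entry in the returned array exposes the per-request .send(callback) method used later in this thread:

var requests = dyno.batchWriteItemRequests(params);
var responses = [];
var remaining = requests.length;

// Send every request individually and collect responses by index.
requests.forEach(function(request, i) {
  request.send(function(err, data) {
    if (err) console.error('request %d failed: %s', i, err.message);
    else responses[i] = data;
    if (--remaining === 0) console.log('all requests have been attempted');
  });
});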

It's important to keep in mind that UnprocessedItems are returned as part of successful requests that don't throw any errors. I think this flow keeps things separated in a way that makes sense. Usage might look something like this:

var requestSet = dynamodb.dyno.batchWriteItemRequests(params);

(function sendRequests(requests) {
  requests.sendAll(10, function(err, responses, unprocessed) {
    if (err) {
      // "Handling" errors could mean retrying requests
      err.forEach(function(error, i) {
        if (!error) return; // successful requests leave undefined entries
        var failedRequest = requests[i]; // index into the set that was just sent
        failedRequest.send(retryHandlingFn);
      });
    }

    if (unprocessed) return sendRequests(unprocessed);
    console.log('everything was done successfully');
  });
})(requestSet);

Penny for your thoughts on this @mick @willwhite @jakepruitt. I ran into this right away as I started investigating using dyno v1 in downstream apps.

jakepruitt commented 8 years ago

I really like this!!! This will be so nice in the future. I imagine we will almost always want to retry unprocessed items, just like your example shows. Could there be another method besides sendAll, like sendAllForce, that automatically incorporates the retry behavior? That way we could avoid copy-pasting the retry code throughout the codebase. If our goal is to eventually phase out dyno and rely on the aws-sdk directly, it may not matter whether we abstract away the retry behavior or not.
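
For illustration only, a hypothetical sendAllForce could be a thin wrapper over the sendAll described above; the name, the maxRetries parameter, and the wrapper itself are assumptions, not part of dyno:

// Hypothetical wrapper: keep re-sending unprocessed requests until none
// remain or a retry budget is exhausted. Illustration only, not dyno API.
function sendAllForce(requests, concurrency, maxRetries, callback) {
  (function attempt(pending, retriesLeft) {
    pending.sendAll(concurrency, function(err, responses, unprocessed) {
      if (err) return callback(err, responses, unprocessed);
      if (!unprocessed) return callback(null, responses);
      if (retriesLeft === 0) return callback(null, responses, unprocessed);
      attempt(unprocessed, retriesLeft - 1);
    });
  })(requests, maxRetries);
}

// e.g. sendAllForce(dyno.batchWriteItemRequests(params), 10, 3, done);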

willwhite commented 8 years ago

:+1: concept sounds great. The main issue in the past has been queue-async's error behavior of returning only the first error. Do you work around this because unprocessed items are not really DDB errors?

rclark commented 8 years ago

@willwhite yeah: see code here that makes it impossible for queue-async's callbacks to ever be fired with an error object & stop processing.
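
The linked code isn't reproduced here, but the general pattern it describes, sketched under the assumption that queue-async drives the concurrency, is to have each task swallow its own error so the queue never short-circuits, collecting errors and responses by request index instead:

var queue = require('queue-async');

// Sketch of the pattern, not dyno's actual implementation: the queue's task
// callback is always called without an error, so every request is attempted.
function sendAll(requests, concurrency, callback) {
  var q = queue(concurrency);
  var errors = [];
  var responses = [];
  var failed = false;

  requests.forEach(function(request, i) {
    q.defer(function(next) {
      request.send(function(err, data) {
        if (err) { errors[i] = err; failed = true; }
        else responses[i] = data;
        next(); // never pass err to the queue
      });
    });
  });

  q.awaitAll(function() {
    callback(failed ? errors : null, responses);
  });
}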

@jakepruitt I agree that it might be a bummer to have downstream apps rewrite a lot of "retry unprocessed" code. That said, my first moral compass bearing is to draw the line at dyno never retrying requests for you. The present sendAll implementation makes it obvious to the caller how many requests will be made.

btw the example code above is a bad example because there's no limit to the number of times it may retry unprocessed items (if somehow they were to keep coming back unprocessed every time).
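
A minimal sketch of how a caller might cap retries, assuming the sendAll/unprocessed shape described above (the maxAttempts number is arbitrary):

var requestSet = dyno.batchWriteItemRequests(params);
var maxAttempts = 3; // arbitrary cap on how many times to resend unprocessed items

(function sendRequests(requests, attempt) {
  requests.sendAll(10, function(err, responses, unprocessed) {
    if (err) return console.error('some requests failed', err);
    if (!unprocessed) return console.log('everything was done successfully');
    if (attempt >= maxAttempts)
      return console.error('%d requests still unprocessed after %d attempts', unprocessed.length, attempt);
    sendRequests(unprocessed, attempt + 1);
  });
})(requestSet, 1);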

rclark commented 8 years ago

@jakepruitt I'm going to merge this now and keep trying to use v1 in downstream apps. I'm still not opposed to a sort of "force" mode, but I want to try avoiding it a little while longer.