taoensso / faraday

Amazon DynamoDB client for Clojure
https://www.taoensso.com/faraday
Eclipse Public License 1.0

Unprocessed items in wrong format #16

Closed: ulsa closed this issue 10 years ago

ulsa commented 10 years ago

The unprocessed items returned after a throttled batch-write seem to be in a mixed format:

{:unprocessed #<HashMap {
  mytable=
    [{PutRequest: 
      {Item: {id={S: "808b2e40e",}, 
              uuid={S: d1fe6a7e-c80d-41dc-a14d-afe848fe9b71,}}},
  }
  ...
}

If I try to send the value of :unprocessed to batch-write again, it fails with a not very illuminating ClassCastException:

NullPointerException   clojure.lang.Reflector.invokeNoArgInstanceMember (Reflector.java:296)
java.lang.ClassCastException: null
 at 
ptaoussanis commented 10 years ago

Hi Ulrik, thanks for the report!

On a scale of 1-10, how urgent is this? Unless it's very urgent, I can try to take a look in the next day or two. Just juggling some high-priority tasks atm. If you need this ASAP I'll make sure to look into it today.

ulsa commented 10 years ago

Actually, it's very urgent. I need to perform a migration in production and this is biting me. If you could have a look, that would be awesome.

ptaoussanis commented 10 years ago

Okay, could you get me some more details - incl. an example of the code you're running?

The unprocessed items returned after a throttled batch-write is in a mixed format

The :unprocessed value isn't generally something you'd be touching when using Faraday - it's in the raw Java format documented here: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/model/BatchWriteItemResult.html#getUnprocessedItems()

The reason you wouldn't normally be using this value directly is because batch-write-item already handles stitching together requests for you via the :span-reqs option (this is described in the docstring / API docs).

So to clarify: are you using batch-write-item's :span-reqs option and it's malfunctioning somehow, or are you trying to do manual spanning? If you're doing manual spanning, could you explain why?

If you do need to do manual spanning, have a look at the batch-write-item and merge-more source to see how it's done there.

The clearer the info you can give me, the more likely I'll be able to help quickly.

Cheers!

ulsa commented 10 years ago

I haven't used :span-reqs. Let me try with that first.

I seem to have a hard time parsing the doc-strings into examples. :)

ptaoussanis commented 10 years ago

I haven't used :span-reqs. Let me try with that first. I seem to have a hard time parsing the doc-strings into examples. :)

Okay, great - that's probably what you want then. If you previously had (batch-write-item my-creds my-req), now you'll want something like (batch-write-item my-creds my-req {:span-reqs {:max 100 :throttle-ms 10}}).

This'll instruct the batch-write to stitch together up to 100 individual requests, pausing for 10msecs between each request. Note that an op like this can take quite some time to complete unless you have high throughput limits set.
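For reference, the suggested call is `(batch-write-item my-creds my-req {:span-reqs {:max 100 :throttle-ms 10}})`. The loop below is a hypothetical, minimal sketch of the behavior that option is described as having here, not Faraday's actual implementation: submit a request, re-submit whatever comes back as unprocessed, and give up after `:max` calls, pausing `:throttle-ms` between calls. `span-writes` and `write-fn` are illustrative names, the default of 5 is made up, and the sketch assumes `write-fn` returns leftovers under `:unprocessed` already in request format (which, per this thread, the raw result is not).

```clojure
(ns example.spanning)

(defn span-writes
  "Hypothetical sketch of request spanning (not Faraday's implementation).
  Calls write-fn with req, then re-submits whatever write-fn returns under
  :unprocessed, making at most :max calls and sleeping :throttle-ms between
  them. Returns {:unprocessed <leftover request or nil>, :calls <n>}."
  [write-fn req {max-reqs :max, throttle-ms :throttle-ms
                 :or {max-reqs 5, throttle-ms 0}}] ; illustrative defaults only
  (loop [pending req
         calls   0]
    (if (or (empty? pending) (>= calls max-reqs))
      ;; Done: either nothing is left, or the call budget is used up and the
      ;; remainder is handed back to the caller instead of being dropped.
      {:unprocessed (not-empty pending) :calls calls}
      (do
        ;; Throttle every call except the first one.
        (when (pos? calls) (Thread/sleep (long throttle-ms)))
        (recur (:unprocessed (write-fn pending)) (inc calls))))))
```

With a `write-fn` that only gets through part of each batch (as on a table with low write provisioning), a small `:max` can exhaust its budget and hand back a non-empty `:unprocessed`, while a larger one gives the loop enough calls to drain it.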

ulsa commented 10 years ago

I tried a lower :max setting of 10 and still got some unprocessed items. When I increased it to 100, however, everything eventually went through, even with a very small write provisioning.

So what do I do if I still get unprocessed items? It would be nice if there was a way to access lower-level utilities, like db-client, if I briefly need to get under the hood and access the Java API. Or a way to convert the unprocessed map to something that batch-write accepts. Or, better yet, unprocessed is automatically converted to the Clojure format.

ptaoussanis commented 10 years ago

It would be nice if there was a way to access lower-level utilities, like db-client, if I briefly need to get under the hood and access the Java API.

Nothing stops you from doing that: the entire, standard Java API is fully accessible and interoperable. db-client, for example, just returns a standard AmazonDynamoDBClient object - you're free to bang on it with the standard Java API.

Likewise the :unprocessed-items value is just a standard Java-API object, you could work with it directly (nothing stops you, it's not wrapped or anything) - but you will need to use the Java API which can be a real pita.

Or a way to convert the unprocessed map to something that batch-write accepts. Or, better yet, unprocessed is automatically converted to the Clojure format.

Sure, would be open to a PR for this. Haven't given it much thought, so can't recommend one approach over the other.
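One possible shape for such a conversion is sketched below. It is a hypothetical helper, not part of Faraday: `unprocessed->request` walks the raw `Map<String, List<WriteRequest>>` with plain Java interop (`getPutRequest`/`getItem`, `getDeleteRequest`/`getKey` and `getS`/`getN` from the AWS Java SDK model classes) and builds a `{:table {:put [...] :delete [...]}}` map, the request shape `batch-write-item` is assumed to take. It only handles string (S) and number (N) attributes, parses numbers with `bigdec`, and relies on reflection instead of type hints so it doesn't have to import the SDK classes.

```clojure
(ns example.unprocessed)

(defn- attr->clj
  "Sketch: converts one AttributeValue. Only S and N are handled; the SDK
  returns nil from getS/getN for other types, so those throw here."
  [av]
  (if-let [s (.getS av)]
    s
    (if-let [n (.getN av)]
      (bigdec n)
      (throw (ex-info "Unsupported attribute type in this sketch" {:value av})))))

(defn- item->clj
  "Converts a java.util.Map<String, AttributeValue> into a keywordized map."
  [item]
  (into {} (map (fn [[k v]] [(keyword k) (attr->clj v)])) item))

(defn unprocessed->request
  "Hypothetical helper: raw unprocessed items (Map<String, List<WriteRequest>>)
  -> {table-kw {:put [item ...] :delete [key ...]}}."
  [unprocessed]
  (into {}
        (for [[table write-reqs] unprocessed]
          ;; Each WriteRequest carries either a PutRequest or a DeleteRequest.
          (let [puts (keep #(some-> (.getPutRequest %) .getItem item->clj) write-reqs)
                dels (keep #(some-> (.getDeleteRequest %) .getKey item->clj) write-reqs)]
            [(keyword table)
             (cond-> {}
               (seq puts) (assoc :put (vec puts))
               (seq dels) (assoc :delete (vec dels)))]))))
```

The result could then be passed back as the request argument of another `batch-write-item` call, under the assumptions above.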

ulsa commented 10 years ago

Sure, everything is available. I'm mainly trying to avoid having my code import all the Java classes and do all the mapping itself. If it's possible to get a nice solution just by making some private functions public, I'll let you know.

ptaoussanis commented 10 years ago

I'm mainly trying to avoid having my code import all the Java classes and do all the mapping itself.

There are two different things we're discussing here, though:

  1. Ability to access the Java API.
  2. Mitigating the need to access the Java API by providing all the tools we might want in Clojure-space.

We already have 1. There's definitely still room for improvement on 2 (here, for example, by modifying the :unprocessed-items form and/or batch-write-item to make manual request spanning easier).

If it's possible to get a nice solution just by making some private functions public

Nothing obvious comes to mind in this case - I'd prefer to keep merge-more private since it makes quite a few assumptions about the input/output form and those may change in future.

Your earlier suggestion sounds good to me: I would investigate tweaks to :unprocessed-items and/or batch-write-item to make them cooperate more easily in cases where folks don't want to use the automatic request spanning.

ulsa commented 10 years ago

I just want to point out that I in fact did use request spanning and still got unprocessed items in some cases.