taoensso / faraday

Amazon DynamoDB client for Clojure
https://www.taoensso.com/faraday
Eclipse Public License 1.0
238 stars 84 forks source link

Support for asynchronous Java client #38

Closed moea closed 9 years ago

moea commented 9 years ago

I have a need for a Clojure interface to Amazon's asynchronous Dynamo client, and Faraday is in written in such a way that it would be relatively easy to support this.

I've forked, and started factoring all of the request-construction logic into separate functions, and was planning on defining a protocol containing only the top-level I/O stuff and providing asynchronous implementations which share more-or-less everything with the existing, synchronous ones.

I'm curious about how likely you'd be to accept a PR, assuming API compatibility is entirely maintained, or whether this falls outside of the scope of what you're trying to do with the library.

ptaoussanis commented 9 years ago

Hi Moe,

I'm not familiar with the AWS async DynamoDB client. Could you explain a little what benefit you'd get from wrapping the async client over, say, just using Clojure's own async capabilities like core.async?

Would definitely be open to a non-breaking PR if it brought something valuable.

Cheers! :-)

tlipski commented 9 years ago

I think non-blocking I/O with DynamoDB would be highly beneficial - e.g. see here Martin Trojer's blog post on core.async+netty http://martintrojer.github.io/clojure/2013/07/07/coreasync-and-blocking-io/.

ptaoussanis commented 9 years ago

@tlipski Hi Tomek, what makes you think you can't use core.async with the current DynamoDB client?

tlipski commented 9 years ago

We can - but it will block a thread, right?

ptaoussanis commented 9 years ago

Well, a logical thread. core.async lets you decide whether you'd like to spend an actual system thread, or a go "thread" (parking FSM).

The async DynamoDB client appears to just operate with futures, so actually gives you fewer options from what I can tell (I may be missing something, just skimmed the docs).

Keep in mind that one of the nice things about core.async is that it lets you make any synchronous activity asynchronous - and in a particularly pleasant and flexible way.

Does that make sense?

tlipski commented 9 years ago

Well, I think you need to have a separate thread if your code uses java.io underneath (and that's what is being in AWS sync client I think). But maybe I am missing something obvious - can you show some code example where no threads are blocked during communication between the JVM and AWS endpoint?

moea commented 9 years ago

The Martin Trojer article is a good overview.

As Tomek said, you need an actual system thread on which to perform the I/O. The one thing in core.async which is useful for a case like this, is thread, in so far as it allows you to retrieve the result without blocking the consumer, but it uses an unbounded thread pool / doesn't give you any control of how operations are scheduled. Using it as a general mechanism for handling blocking I/O would be a mistake.

In addition to futures, the async Dynamo API allows callbacks to be provided, which means no deref-ing/blocking in the caller. Separately, I don't necessarily think channels are the greatest fit for many APIs, because of their focus on the streaming of multiple values, and poorly thought-through error handling. I was considering using Manifold for result delivery, because it's lightweight, and has little to no conceptual overhead. I'm curious to know what you think.

ptaoussanis commented 9 years ago

Well, I think you need to have a separate thread if your code uses java.io underneath (and that's what is being in AWS sync client I think).

@tlipski Sure, in the case of go blocks, you'll be blocking one of the threads in the pool that services the go state machine. You'd need to be careful doing that, which is why explicit threads are normally recommended for i/o. I believe it's the intention at some point for go to issue warnings against blocking i/o calls.

@moea

In addition to futures, the async Dynamo API allows callbacks to be provided, which means no deref-ing/blocking in the caller.

Had missed that - thought they were just providing futures/promises for everything. Could you show me what docs you're looking at?

I was considering using Manifold for result delivery, because it's lightweight, and has little to no conceptual overhead.

I wouldn't be keen on Manifold in Faraday, sorry. But my guess is that we'd probably be fine with just futures+promises, possibly some core.async for paged requests if we wanted to expose a streaming view on long-running jobs.

Let's start by making sure that we're looking at the same docs and go from there? All I could find was this, but it's deprecated: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodb/AmazonDynamoDBAsyncClient.html

Would also want to check their implementation to make sure we'd actually be getting some real benefit with the callback interface over just letting clients use futures/promises/core.async as they like.

Does that make sense?

Thanks a lot for the input, much appreciated! :-)

ptaoussanis commented 9 years ago

Ahh, found the v2 docs: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/package-summary.html

moea commented 9 years ago

@ptaoussanis Thanks for the response. I think promise and future aren't a good fit for pervasively asynchronous applications, because they cannot be realized without blocking. They would provide the option of having a very thin (third-party) wrapper around Faraday which could convert them into Manifold-like deferred objects, but this would require even more processing in threads (by Manifold).

It would be preferable, I think, to either return a core.async channel from every function, or just to accept continuations in the I/O-performing functions. Either of those could be converted into something higher-level downstream in a non-blocking way.

Alternatively, the PR could just cover the defprotocol & the factoring of the request-object generating methods, and the alternate API could live somewhere else.

ptaoussanis commented 9 years ago

It would be preferable, I think, to either return a core.async channel from every function, or just to accept continuations in the I/O-performing functions.

If the DDB async API can accept callbacks, then I'd maybe suggest we just accept callbacks ourselves? Then folks can decide for themselves what async architecture they'd like to employ (be it delivering a promise, put!ing to a core.async channel, etc.) This'd also let us avoid picking up any extra dependencies.

Alternatively, the PR could just cover the defprotocol & the factoring of the request-object generating methods, and the alternate API could live somewhere else.

My hands are quite full atm; would be happy to leave this to your judgement.

ptaoussanis commented 9 years ago

Closing this for now, we can open a fresh issue later if you (or someone else) would like to bring an async API directly to Faraday.