taoensso / faraday

Amazon DynamoDB client for Clojure
https://www.taoensso.com/faraday
Eclipse Public License 1.0
238 stars 84 forks source link

Expose request-building logic publicly #39

Closed moea closed 9 years ago

moea commented 9 years ago

WRT discussion on #38, I figured it would make sense to do the async API stuff outside of Faraday, as I went ahead and implemented an NIO Dynamo client here (in Java), so the dependencies would be awkward for my particular use-case. I'm more than happy to provide an API within Faraday which uses the AWS thread-based async client + core.async, if that's something you're interested in

This PR only refactors the request building stuff from the top-level Faraday API into public functions, which would allow the implementation of an asynchronous Clojure client with a minimum of code duplication. I was talking before about defining a protocol for the API functions, but after some more thought I don't know how good a fit this is, and exposing the request logic is 95% of the work.

Notes:

I'm aware that this PR represents more of a liability than a benefit for you, and of course am more than willing to do anything further to convince you that it doesn't introduce any problems.

ptaoussanis commented 9 years ago

Hi Moe, thanks for this.

I've just pulled your PR into a moea branch where I can take a better look. You've got a lot of changes here, and I'm a pretty short on time atm - so it'd help if you could clarify the high-level objectives a little.

As I understand it (please correct any misunderstandings):

  1. You want a high-performance async interface to DDB from Clojure.
  2. You're dissatisfied with the official AWS async DDB client since it's backed by a thread pool, and you've benched the performance and found it unsatisfactory.
  3. You've written an alternative async DDB client in Java using Apache's HttpAsyncClient, and you've benched the performance and found it satisfactory.
  4. This PR is intended to expose Faraday functionality that's currently private, and that you're expecting to need later on.

Some things I'm still unclear on:

  1. What is the end goal as far as Faraday is concerned? Faraday picks up async support out-the-box, via your client? Some aspect of Faraday becomes pluggable to allow cooperation with your async client and/or the AWS async client? Does anything break for current users who are satisfied with the synchronous behaviour?
  2. What is the high-level goal of this particular PR? You mention that you want some functionality to be made public - would it not have been possible to just make those fns public without restructuring namespaces, etc.?
  3. You added a number of tests - were those motivated by any errors you encountered, or just to help you refactor w/o breaking anything?

Keep in mind that the smaller + easier-to-digest the PR, the quicker+easier I can merge it. The current version touches a lot of code, and it's not clear to me while scanning which of these changes are aesthetic and which are crucial to your goals.

Something that might be immediately digestible could look like: Commit 1: Make these fns public. Commit 2: Separate this fn into two, because we need the functionality to be orthogonal. Commit 3: Restructure namespaces for [purpose]. Commit 4: Add tests for [purpose]. Commit 5: Formatting changes.

Does that make sense? Thanks again for all your effort on this!

Cheers :-)

moea commented 9 years ago

Peter,

Thanks for your time. The stated motivations are all correct. On the clarifications:

  1. I don't think it's practical to hook up my Java Dynamo client to Faraday without providing alternate implementations of Faraday's API (i.e. reimplementing get-item, put-item, etc. - however short they are - somewhere else). The Java methods in my client are identically named to the AWS ones, but the client can't share an interface with the AWS stuff, as all have void return types and accept callbacks. My approach to this is to make a separate Clojure project which depends only on the request generation from Faraday, and distance it, name wise, to whatever degree you're comfortable with (e.g. completely unrelated project which happens to expose an API which users may be comfortable with). It seems perverse to me to burden users with an additional data abstraction over the Dynamo Java API - the stuff in Faraday is pretty natural and intuitive, and ideally I'd like to be able to use it.
  2. It would certainly be possible to fulfil the goal (which is basically just exposing the functionality I put into requests.clj) without adding any files. The decision to move things around was slightly more compelling when I was planning on providing support for the AWS Dynamo async client within Faraday, and it seemed unfair to either clutter faraday.clj with that code, or to depend on faraday.clj from an additional module. I thought it may continue to be a generally helpful organisation, but these changes are absolutely not necessary, and a bad idea to whatever extent they make the diff harder to verify.
  3. The requests.clj tests are there because it seemed fair to have more tests, now that more stuff is public. The tests I added in main.clj were to help me refactor - I wrote/ran them all against an unchanged Faraday, and didn't encounter any issues.

I can appreciate that the diff is not making things as clear as possible. If you're satisfied with the above, I can aggressively limit the changes to what is strictly necessary (only the lifting of some function bodies - without moving them, the making-public of a few auxiliary functions, and the test stuff, if you want to keep it).

ptaoussanis commented 9 years ago

Appreciate the additional info, thanks.

So do I understand correctly that the proposed changes (even in their reduced form) actually offer no benefit to users of Faraday, but would be made solely for consumption by your client?

I'd be okay with that as long as:

  1. Anything we lift from private to public gets an "Implementation detail." docstring.
  2. You're okay to accept these things as subject to change.

My approach to this is to make a separate Clojure project which depends only on the request generation from Faraday, and distance it, name wise, to whatever degree you're comfortable with (e.g. completely unrelated project which happens to expose an API which users may be comfortable with).

That sounds reasonable to me. I'd be happy to link to your client from the README.

If you're satisfied with the above, I can aggressively limit the changes to what is strictly necessary (only the lifting of some function bodies - without moving them, the making-public of a few auxiliary functions, and the test stuff, if you want to keep it).

Sure, I'd be happy to look over a PR in that form. You're welcome to include the tests if you're okay that I may not keep any/all of them (would like to look them over in detail first).

Tangentially- I'm curious whether you considered HTTP Kit as an async client mechanism (http://http-kit.org/client.html), and how your numbers currently look on nio vs thread pool performance?

Cheers!

moea commented 9 years ago

Peter,

Thanks, that sounds reasonable.

I've used HTTP Kit & similar, though for this I used Apache's async http-components library primarily for the reason that it enables the re-use of much of the logic in the AWS client for turning API-level requests into HTTP requests, so conserving the req/response types for the API was pretty easy. Initially, I wasn't entirely confident in my ability to discriminate between legitimate special-cases and historical baggage in the AWS client library (since much of the more general code is used by all of the API clients - S3, EC2, etc.), so wanted to share as much code as is possible. I haven't yet finished it, but have also written half or so of an NIO-based S3 client using the same approach, and sharing much of the code.

Also, despite not having any intention to consume any of this stuff from a language other than Clojure, it feels slightly irresponsible to not implement as much of this kind of thing in Java as is possible.

I'm very wary of talking about numbers, because my benchmarks have been pretty casual, but with even a small number of concurrent requests having large payloads (e.g. batch gets/writes of stuffed rows), the numbers are pretty compelling. I'll put some more work into quantifying this.

Take care, Moe

ptaoussanis commented 9 years ago

Closing the old request.