taoensso / faraday

Amazon DynamoDB client for Clojure
https://www.taoensso.com/faraday
Eclipse Public License 1.0

Porting to AWS SDK v2 #146

Open barkanido opened 4 years ago

barkanido commented 4 years ago

Hey, @ptaoussanis. Thanks for a great dynamo client!

We are thinking about using Dynamo in the foreseeable future, and I have found this client to be a good middle ground between amazonica and aws-api. However, version 2 of the AWS Java API was released quite some time ago, and although it is still missing a few DynamoDB APIs (36 and 34), it is stable. It also brings a few important features like non-blocking IO and auto pagination.

Are there any plans to port this client to it? Any plans to accept an MR in this direction? How hard do you think a project like this would be?

belucid commented 4 years ago

Just FYI @barkanido , stewardship of Faraday has largely been passed on at this point. I suspect @ptaoussanis is happy to weigh in on important matters here and there, but isn't going to take on a big project like an API port.

@kipz and @joelittlejohn have done a lot of the great maintenance work lately for example.

It'd probably be helpful to build up a case of everything Faraday is missing out on by not porting so we can make an informed decision as a community. Auto-pagination for example sounds interesting, but non-blocking IO can be achieved fairly simply by the Faraday user (or by Faraday) on the Clojure side. It's unclear that we need that from the AWS API (maybe you can set me straight on that).

barkanido commented 4 years ago

Good to know that there is a solid community behind Faraday. It is just that I tend to believe that, although major version 1 of the AWS API is still maintained, version 2 wasn't released just for fun. AWS will deprecate v1 at some point and move all new development to v2. So if the community would like to stay in sync with the underlying AWS API, they will have to make this decision.

It is not an easy one, of course. For example, exposing the blocking and non-blocking APIs side by side requires substantial work.

As for non-blocking IO specifically, it is very important to us. IMHO, doing it better than (or just as well as) Netty already does in the new version is just reinventing the wheel.

I also think that for some users non-blocking IO is essential, especially for applications that are IO-bound and need to parallelise most of their work in order to achieve acceptable throughput (web servers that are connected to DynamoDB are a good example).

My first thought was to try and do it for the community. However, I totally understand if this kind of path is not something you are willing to take currently.

What do you think?

kipz commented 4 years ago

I tend to agree with @belucid here, but would prefer that auto-pagination also be left to the user (or some companion library?).

How best to deal with threading & memory management of large queries/scans can be very application specific.

At my work, we have built quite a few features on top of Faraday to manage pagination, batching, and table configuration convergence, amongst other things, and we are grateful that this is easy to implement on top of the current Faraday API. We're not against contributing them to an appropriate project should there be appetite for such a thing, and maybe other folks feel the same about their usage.
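As a rough illustration of what caller-driven pagination can look like, here is a minimal Java sketch, not taken from Faraday or either AWS SDK. It follows a cursor modelled loosely on DynamoDB's LastEvaluatedKey until the cursor runs out or a caller-chosen item cap is reached. The names `Paginator`, `Page`, `fetchAll` and `fakeQuery` are hypothetical, and the integer cursor and in-memory data source are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Sketch of caller-driven pagination over a cursor-based API (hypothetical names). */
final class Paginator {
    /** One page of results plus the cursor for the next page (null when exhausted). */
    static final class Page<T> {
        final List<T> items;
        final Integer lastEvaluatedKey;

        Page(List<T> items, Integer lastEvaluatedKey) {
            this.items = items;
            this.lastEvaluatedKey = lastEvaluatedKey;
        }
    }

    /** Follows the cursor until it is null or maxItems have been collected. */
    static <T> List<T> fetchAll(Function<Integer, Page<T>> fetchPage, int maxItems) {
        List<T> out = new ArrayList<>();
        Integer cursor = null; // null means "start from the beginning"
        do {
            Page<T> page = fetchPage.apply(cursor);
            for (T item : page.items) {
                if (out.size() >= maxItems) {
                    return out; // the caller decides how much to hold in memory
                }
                out.add(item);
            }
            cursor = page.lastEvaluatedKey;
        } while (cursor != null && out.size() < maxItems);
        return out;
    }

    /** Hypothetical in-memory source: serves the integers 0..4 in pages of two. */
    static Page<Integer> fakeQuery(Integer startAfter) {
        int from = (startAfter == null) ? 0 : startAfter + 1;
        List<Integer> items = new ArrayList<>();
        for (int i = from; i < Math.min(from + 2, 5); i++) {
            items.add(i);
        }
        int last = from + items.size() - 1;
        return new Page<>(items, last < 4 ? last : null);
    }
}
```

Because the loop lives in user code, the item cap, the threading model and any back-pressure stay application decisions, which is the flexibility being argued for here.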

The more Faraday has opinions on such things, the larger a surface area we have to support as a community, and IMHO, these opinions can sometimes restrict the library's flexibility.

Having said all that, as @belucid says, it probably makes sense to start building a case for looking at v2, and to see if there's enough momentum in the community to take it on.

joelittlejohn commented 4 years ago

I think the killer app for 2.0 is the async support. I do agree that 'non-blocking IO can be achieved fairly simply by the Faraday user (or by Faraday) on the Clojure side'; however, it's also the nature of non-blocking solutions that the further down you can stay non-blocking, the better. Right now, when you wrap blocking libraries for non-blocking use, you inevitably need to reserve more resources to do so. Project Loom will solve this.
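To make the resource trade-off concrete, here is a minimal Java sketch of wrapping a blocking call for non-blocking use. It is an illustration only: `BlockingWrapper` and `blockingGetItem` are hypothetical names, and the sleep stands in for a blocking SDK call. The caller gets a `CompletableFuture` back immediately, but each in-flight call still occupies one pool thread until it returns.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Sketch: offloads blocking calls to a dedicated pool so callers get a future back. */
final class BlockingWrapper {
    private final ExecutorService pool;

    BlockingWrapper(int threads) {
        // These threads are the extra resources reserved for the wrapper:
        // each in-flight blocking call parks one of them until it completes.
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Runs the blocking call on the pool and returns immediately with a future. */
    <T> CompletableFuture<T> async(Supplier<T> blockingCall) {
        return CompletableFuture.supplyAsync(blockingCall, pool);
    }

    void shutdown() {
        pool.shutdown();
    }

    /** Hypothetical stand-in for a blocking read; not a real SDK or Faraday call. */
    static String blockingGetItem(String key) {
        try {
            Thread.sleep(50); // simulate network latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "item-" + key;
    }
}
```

With a pool of N threads, at most N such calls make progress at once; a client that is non-blocking all the way down does not need a parked thread per request.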

I'm not sure there has been mass adoption of the AWS SDK 2.x though, particularly in the Clojure community. Making this move too early can cause a lot of headaches for people building apps that need to interact with a few different AWS services (Dynamo plus some others). The fact that 2.x exists and was not created for fun isn't really a compelling reason to migrate. I don't think the features that are missing from 2.x are a problem for us, it's more a question of what will cause least disruption for everyone and do we have enough users that need this? There's also a movement in the community away from the Java SDKs towards e.g. cognitect/aws-api so we might decide to just go that way instead.

Overall I'm not inclined to go ahead and start migrating. I'd like to see this issue get more comments and support from users before doing it.

barkanido commented 4 years ago

Thanks @joelittlejohn . I do agree that this is something that needs a discussion in a wider forum of users.

joelittlejohn commented 3 years ago

Closing this one as I don't think it is a goal. I think it's more likely we would migrate to cognitect/aws-api in future.

pesterhazy commented 10 months ago

FWIW, I'm running into nasty exceptions like this

Unable to unmarshall exception response with the unmarshallers provided

which are related to using AWS SDKv1 together with Java >=17

See https://repost.aws/articles/ARPPEPfTPLTlGLHIsVVyyyYQ/troubleshooting-unable-to-unmarshall-exception-response-with-the-unmarshallers-provided-in-java

joelittlejohn commented 10 months ago

Thanks @pesterhazy. Looks like we finally have a compelling reason to upgrade.

LouiseKlodt commented 9 months ago

FYI I'm also running into nasty exceptions like this when using batch-write-item:

Execution error (AmazonDynamoDBException) at com.amazonaws.http.AmazonHttpClient$RequestExecutor/handleErrorResponse (AmazonHttpClient.java:1879).
Provided list of item keys contains duplicates (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException; Request ID: 0FNP3B7UDUESMP6MBC3U3676CJVV4KQNSO5AEMVJF66Q9ASUAAJG; Proxy: null)

even though there aren't duplicate keys. We're using Java11.

When using Cognitect's AWS API, it batch-writes the records with no issues.

joelittlejohn commented 9 months ago

This is a strange one @LouiseKlodt. I'm not aware of anything that would cause you to see duplicate keys that relates to using the AWS SDK v1.x on Java 11.

Are you able to log the data that was being written when this exception was thrown and inspect it? (And maybe post it here if it is not sensitive.)

LouiseKlodt commented 9 months ago

Hi @joelittlejohn, sorry for the slow response. I tried reproducing the issue a few days ago, but it's working OK now, so either something has resolved itself, or I repeatedly made some mistake. So please disregard my comment above. Sorry for the noise! Thanks!

felixdo commented 3 months ago

And in the long term: https://aws.amazon.com/blogs/developer/announcing-end-of-support-for-aws-sdk-for-java-v1-x-on-december-31-2025/

kevin-ewing commented 2 months ago

The AWS SDK for Java v1.x will enter maintenance mode on July 31, 2024, and reach end-of-support on December 31, 2025.

The following outlines the level of support for each phase of the SDK lifecycle.

Additional resources for upgrading to AWS SDK for Java v2.x are below:

Developer Guide – AWS SDK for Java v2.x. – Getting started guide for the AWS SDK for Java v2

Migration Guide – Explains changes between the two versions and provides instructions on migrating

joelittlejohn commented 1 month ago

This is a major change to Faraday and I think the best way to tackle this is:

Why the change of namespace? Because firstly, the Clojure way is to use a new name when things are fundamentally incompatible, and secondly, in the rare case that faraday 1.x and 2.x need to live in the same VM, they can coexist.

One area of breaking change will be client-opts that use Amazon SDK types, including client, provider, creds.

I don't expect to maintain both 1.x and 2.x going forward since the library is largely feature complete and the AWS SDK 1.x will soon be end-of-life'd. We would continue to develop and enhance only Faraday 2.x, making only critical patches to 1.x if needed.

Comments on this plan are very welcome. The change itself is quite involved, with new code around client-opts to be written and changes needed across most of the functions in the large taoensso.faraday namespace.

kipz commented 1 month ago

Makes sense to me @joelittlejohn FWIW!

ptaoussanis commented 1 month ago

@joelittlejohn Sounds good to me Joe! 👍