Error handling on fetch

bflora2 commented 5 months ago

I use aws4fetch.esm.js (v1.0.18) in a Cloudflare worker to access DynamoDB. Occasionally I see the following exception thrown from the AWSClient.fetch method:

    Error: Network connection lost.
    at async AwsClient.fetch (aws4fetch.esm.js:69:19)

The problem is that exceptions thrown by await fetched in AWSClient.fetch short-circuit the retry loop. Short-circuiting the retry loop is fine for exceptions caused by a bad request, but it's not good for exceptions thrown due to transient network errors. Perhaps network exceptions could be caught in AWSClient.fetch so the retry loop can continue in those cases?

mhart commented 5 months ago

The problem with these sorts of errors is you don't know whether they're safe to retry or not (or at least, aws4fetch wouldn't know generally). Eg, you might accidentally send duplicate messages over SNS or SQS, etc, etc.

It's possible I could introduce an opt-in option for it though. Aside from this error, are there any others you've observed?

bflora2 commented 5 months ago

I've been using aws4fetch for at least a few years and I don't recall ever seeing it throw an exception for anything other than network errors. The MDN documentation page for fetch lists the reasons an exception might thrown, and network errors appears to be the only scenario listed that would warrant a retry.

In my case, I usually only see occasional instances of a single network error log appear even though the Cloudflare worker is normally being invoked multiple times per second (and always calls to AWS). So, retrying these AWS requests resulting in network errors would almost certainly be successful.

I suspect the majority of AWS calls are reads/writes to resources where a duplicate read/write would not cause an issue, and therefore, a retry on network error would be very desirable. In the scenarios you mentioned where a duplicate AWS call would be a problem, I think most devs would want to retry and deal with potential duplication issues on the AWS side. Otherwise, the dev has to choose between either not retrying or else attempting to communicate further with AWS to determine whether a retry is safe, neither of which seems like an optimal solution. In my case, I ignored the network errors, but only because the worker is performing non-critical work where a few dropped AWS calls per month is an insignificant issue. However, I will soon begin using aws4fetch for new high traffic Cloudflare workers whose work will be more critical, which has prompted me to finally look into the network errors.

If aws4fetch was new, and not yet released, I think retrying on network errors by default, while also providing an opt-out option, would be the most desirable choice. However, I can understand that you might be concerned about potential unintended consequences of making such a change given that aws4fetch is already in use. So, as you suggested, perhaps a new opt-in option is best. I'm sure whatever solution you might consider would be appreciated by us aws4fetch fans. :-) Thank you for listening.

Manouchehri commented 4 months ago

@bflora2 What I recommend doing, is using .sign from aws4fetch, and then implementing whatever retry logic you want "normally".

mhart / aws4fetch

Error handling on fetch #69