Open bflora2 opened 5 months ago
The problem with these sorts of errors is you don't know whether they're safe to retry or not (or at least, aws4fetch wouldn't know generally). Eg, you might accidentally send duplicate messages over SNS or SQS, etc, etc.
It's possible I could introduce an opt-in option for it though. Aside from this error, are there any others you've observed?
I've been using aws4fetch for at least a few years and I don't recall ever seeing it throw an exception for anything other than network errors. The MDN documentation page for fetch
lists the reasons an exception might thrown, and network errors appears to be the only scenario listed that would warrant a retry.
In my case, I usually only see occasional instances of a single network error log appear even though the Cloudflare worker is normally being invoked multiple times per second (and always calls to AWS). So, retrying these AWS requests resulting in network errors would almost certainly be successful.
I suspect the majority of AWS calls are reads/writes to resources where a duplicate read/write would not cause an issue, and therefore, a retry on network error would be very desirable. In the scenarios you mentioned where a duplicate AWS call would be a problem, I think most devs would want to retry and deal with potential duplication issues on the AWS side. Otherwise, the dev has to choose between either not retrying or else attempting to communicate further with AWS to determine whether a retry is safe, neither of which seems like an optimal solution. In my case, I ignored the network errors, but only because the worker is performing non-critical work where a few dropped AWS calls per month is an insignificant issue. However, I will soon begin using aws4fetch for new high traffic Cloudflare workers whose work will be more critical, which has prompted me to finally look into the network errors.
If aws4fetch was new, and not yet released, I think retrying on network errors by default, while also providing an opt-out option, would be the most desirable choice. However, I can understand that you might be concerned about potential unintended consequences of making such a change given that aws4fetch is already in use. So, as you suggested, perhaps a new opt-in option is best. I'm sure whatever solution you might consider would be appreciated by us aws4fetch fans. :-) Thank you for listening.
@bflora2 What I recommend doing, is using .sign
from aws4fetch, and then implementing whatever retry logic you want "normally".
I use aws4fetch.esm.js (v1.0.18) in a Cloudflare worker to access DynamoDB. Occasionally I see the following exception thrown from the
AWSClient.fetch
method:The problem is that exceptions thrown by
await fetched
inAWSClient.fetch
short-circuit the retry loop. Short-circuiting the retry loop is fine for exceptions caused by a bad request, but it's not good for exceptions thrown due to transient network errors. Perhaps network exceptions could be caught inAWSClient.fetch
so the retry loop can continue in those cases?