Add retrying functionality

Pimm commented 1 year ago

Note: this PR is currently branched off pimm/profiles-settlements. Please merge #293 before this.

Resolves #257.

Concept

Every requests to which the Mollie API responds with a 5×× status code will be re-attempted until:

the Mollie API responds with a different status code, or
the attempt limit has been reached, or
the request has timed out (as per the timeout set in the Axios instance).

The idea is that if the Mollie API has a brief hiccup, the extra attempts may cause the request to succeed anyway.

For POST and DELETE requests, an idempotency key is added (in the form of an Idempotency-Key header). This ensures the Mollie API can distinguish a single request being re-attempted from two separate similarly-looking requests. In effect, this allows this client to safely re-attempt requests.

We agreed on an attempt limit of 3 and a (fallback) retry delay of 2 seconds.

We've decided not to retry on network issues, such as no network being available or the request timing out. This client is designed to be run on servers. Unlike on mobile phones with unreliable wireless connections, we expect the network to either work or not work. We don't expect a retry to resolve such network issues.

Limitations

Because it was really easy to implement, we added support for the Retry-After header. I haven't seen the Mollie API actually utilise this header, but it can use it to instruct the client to wait longer or shorter than two seconds.

But… because we can't imagine it ever being useful, we left out support for the HTTP date form of Retry-After. If support is ever needed, there's a library out there to help out.

Note

If we do ever decide to retry on timeouts, we should introduce a per-attempt timeout separate from the global timeout which currently exists. If a developer specifies a timeout in their call to createMollieClient, retrying after that timeout would be unexpected.

mollierobbert commented 1 year ago

We don't currently implement Retry-After indeed. I haven't heard anyone ask for this before, but it's an interesting idea.

mollierobbert commented 1 year ago

One of our test merchants is asking whether we can tag this commit so they can already try it out, e.g. if we tag it as prerelease they can install it via @mollie/api-client@prerelease. Is this possible @Pimm?

mollierobbert commented 1 year ago

@Pimm I'm getting the feedback from our test merchant that they would like to configure their own idempotency key as well. Can you make this happen?

This way, they can tie the key to their own back-end. For example if the whole container dies during the request, there is no way of retrying the request if the library only generates its own keys. They would rather generate their own key and provide this to the library explicitly, so they can protect themselves against scenarios that are outside the control of the Mollie library.

Pimm commented 1 year ago

I just had a call with @mollierobbert and @fjbender. Thanks for making some time for me on this blackest of Fridays.

The idea @janpaepke and I had which motivated this PR, is that if the Mollie API has a brief hiccup, extra attempts may cause a request to succeed anyway. The feature is designed to be invisible to users of this library. It all happens internally.

The feedback we received, is that integrators would like to have control over the idempotency keys. The current implementation helps in the scenario in which the Mollie API hiccups. However, there are other scenarios in which idempotency keys could help. For instance, if the plug is pulled right as a server does mollieClient.payments.create(…), the server could theoretically ‒ once it boots back up ‒ notice the payment creation is still in a queue and then retry with the same key. But for this to be an option, users of this library have to be able to provide their own keys (since there's no way for the server to know which random key was generated last time).

The PHP client also uses idempotency keys since recently, and gives control over the keys.

The PR in the PHP client adds a few methods, most notably:

$mollieApiClient->setIdempotencyKey(…)

and

$mollieApiClient->setIdempotencyKeyGenerator(…)

setIdempotencyKey sets the key which will be used during the next request (whatever POST, DELETE, or PATCH(?) request that may be). This would not be a good interface for the Node.js client. In JavaScript, setting "global" state like this is frowned upon, not in the least because asynchronous code is common and thus a race condition may cause the key you set with setIdempotencyKey to be snatched by another request before you do the mollieClient.payments.create(…).

Idea 1

We could add an optional idempotencyKey property in the endpoints which result in a POST or DELETE call. One could then do:

mollieClient.payments.create({
  …,  
  idempotencyKey: 'bf4a894b'
})

This is conceptually similar to $mollieApiClient->setIdempotencyKey in the PHP client, but safe.

Idea 2

We also could allow passing an idempotency key generator when creating the client, like so:

const mollieClient = createMollieClient({
  apiKey: 'test_dHar4XY7LxsDOtmnkVtjNVWXLSlXsM',
  idempotencyKeyGenerator: (type, data) => {
    return `${type}-${data.metadata.orderID}`
  }
});

This is conceptually similar to $mollieApiClient->setIdempotencyKeyGenerator in the PHP client. I do feel this function would have to be provided the data of the thing being created or deleted, as the idempotency key will likely be based on said data. If the function is not provided anything, all it could possibly do is generate a random idempotency key, which is the default behaviour anyway.

Idea 3

Finally, we could give the error thrown by the endpoints which result in a POST or DELETE call an idempotencyKey property. One could then do:

try {
  await mollieClient.payments.create(…);
} catch (error) {
  const { idempotencyKey } = error;
  // Schedule a retry 10 minutes from now with the same key.
}

Final thought

In the current implementation, we don't retry on network issues. If the network is down, we expect it to be still down five seconds from now. And we don't want to continue retrying for more than a few seconds, because a customer may be gazing at a spinner while the payment for their order is being created. However, there are processes which happen "in the background" by nature, such as performing a recurring payment. We could consider making more attempts and also retrying on network issues in those cases.

janpaepke commented 1 year ago

I think all these ideas are sensible and since they're not mutually exclusive I would suggest to implement them all.

Retrying after certain timeouts seems like a good idea but would be outside the scope of this PR.

Just to clarify: we would still keep the "invisible" feature of supplying our own idempotency key, right? We'd just allow for it to be overwritten either by supplying one or providing a generation callback.

This also means if a consumer both sets up custom key generation and provides a key with the call the latter "wins".

This does not seem counterintuitive but the hierarchy needs to be documented.

Pimm commented 1 year ago

The more I think about idea 2 ‒ the idempotencyKeyGenerator ‒ the less I like it.

Here's why:

It only adds any value if the engineer wants to derive the idempotency key from the (request) data. Using (a hash of) the internal order ID as the key is something I can envision. But I think this is an uncommon case. I think it's more common that the key is separate from the (request) data, and in that case the generator doesn't help.
If the engineer wants to derive some of the idempotency keys from the (request) data, they might still want to use random ones for other endpoints (the default behaviour). That demands some kind of escape hatch to fall back to the default behaviour, thus making it more complicated.

If we don't add this functionality, the engineer could work around it:

function createPayment(data) {
return mollieClient.payments.create({
...data,
idempotencyKey: hash(data.metadata.orderID)
});
}

janpaepke commented 1 year ago

After discussing this more I tend to agree with this assessment.

It's not that we can't think of any theoretical cases where defining a key generation function might be useful. It's that these cases are very rare and there are valid work-arounds available (see @Pimm's example).

Usually I would say "well if it doesn't hurt, let's just add it". However with this SDK we also have to consider the long term effects of exposing new API. This would have to be supported potentially indefinitely, because one engineer is using it, who could have achieved the same with slightly different code, had he been forced to.

Additionally not adding the generator function makes for a simpler API and less confusion regarding my point above.

With those points in mind, I would suggest to leave out idea 2 and only implement 1 and 3.

mollierobbert commented 1 year ago

So if I understand correctly, you want to move forward with:

Auto-generating the idempotency key if none was provided, and automatically use it in the timeout scenario.
Allowing the user to manually provide a custom key with each API call.
Exposing the key back to the user in case of an error, so the user can trigger custom retry logic if the auto-retry fails.

All of these make sense to me. 👍

janpaepke commented 1 year ago

After finishing Idea 3, I noticed that the ApiError wasn't exposed from the lib.
Additionally there seems to have been an outdated interface, which was never (externally) exposed, either. Exporting the error allows for type-safe checking like this:

try {...} catch (error) {
    if (error instanceof MollieApiError) {
        console.log(error.statusCode /* <- is a known property */);
    }
}

Consequently I included exposure of the ApiError as MollieApiError in the PR: https://github.com/mollie/mollie-api-node/pull/298/commits/aa8f4924b013a67be6bc8a0a6bff6a7f27a6a35c

How do you guys feel about that?

Pimm commented 1 year ago

I noticed that the ApiError wasn't exposed from the lib.

Someone else actually noticed that just a tad before you did. But that's good! It just confirms we're on the right track.

Additionally there seems to have been an outdated interface, which was never (externally) exposed, either.

Good catch!

mollie / mollie-api-node