ietf-wg-httpapi / idempotency

Repository for "The Idempotency-Key HTTP Header Field"

Response codes and potential app-layer semantic collisions #42

Open jmileham opened 3 months ago

jmileham commented 3 months ago

This is related to the prior threads #2 and #11, but is informed by our experience at Betterment. Admittedly, some of this may open up a can of worms, so I'd understand if any of it is deemed a bridge too far.

TL;DR: our experience leads us to:

  1. Suggest status code 409 (or 400, in alignment with the OASIS Repeatability spec) for a fingerprint mismatch.
  2. See if there's room to reopen the discussion of whether to additionally or alternatively recommend blocking as a legitimate, even preferable, alternative to returning a client error on concurrent requests, since blocking yields significant client benefits.
  3. If an implementer chooses to return an error on concurrent requests instead of blocking, suggest using a different status code (e.g. 423 Locked) for concurrent-request errors, as @wolfgang42 originally suggested in #11.

Fingerprint mismatch response code

Starting from the outside in, with developer experience first: the spec calls for status code 422 in the case of a fingerprint mismatch. This is likely to collide with the common industry practice of using 422 for business-domain validation errors. When the server returns 422 for a mutative endpoint, the client and server will often already share a protocol for describing and parsing validation errors via the response body. Adding a new meaning for 422 in this context is likely to force implementers to discriminate between responses representing the resource server's business-domain validation errors and the idempotency server's fingerprint mismatch errors.
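To illustrate the client-side cost, here is a minimal Python sketch of how a client might dispatch on responses in each case. The body field `type` and the marker string `idempotency-fingerprint-mismatch` are hypothetical and not defined by the draft; the point is only that when 422 is shared, the status code alone no longer tells the client which layer rejected the request.

```python
# Hypothetical marker an idempotency layer might place in a 422 body so that
# clients can tell it apart from a domain validation error. Not defined by any spec.
FINGERPRINT_MISMATCH_MARKER = "idempotency-fingerprint-mismatch"


def classify_other(status: int) -> str:
    """Fallback classification for status codes not discussed in this issue."""
    if 200 <= status < 300:
        return "success"
    return "other_error"


def classify_with_shared_422(status: int, body: dict) -> str:
    """Client dispatch when 422 means both validation error and fingerprint mismatch."""
    if status == 422:
        # The status code alone is ambiguous, so the client must parse the body.
        if body.get("type") == FINGERPRINT_MISMATCH_MARKER:
            return "fingerprint_mismatch"
        return "validation_error"
    return classify_other(status)


def classify_with_distinct_codes(status: int) -> str:
    """Client dispatch if fingerprint mismatch used 409, as proposed in this issue."""
    if status == 422:
        return "validation_error"
    if status == 409:
        return "fingerprint_mismatch"
    return classify_other(status)
```

With distinct codes, the existing 422 validation-error handling stays untouched and the idempotency case gets its own branch keyed on the status code alone.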

In addition, in my view, a request payload that does not match the fingerprint is not a user-recoverable condition, and so is not a great fit for 422. The fingerprint mismatch response will not express how the request differs from the expected request (and we wouldn't want it to, for security reasons). In my view, 409 would be a very appropriate response here: the request you'd like to make conflicts with a request that was already made, and that prior request is the reason you can't make this one. For comparison, though, the OASIS Repeatability spec specifies 400 in this case, in line with all other protocol errors, including failing to provide the appropriate headers. That may be sufficient, but to my mind there is value in treating a fingerprint mismatch as an expected but non-recoverable operational error, distinct from client bugs.
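A minimal sketch of the server side of this proposal, assuming a hypothetical in-memory store keyed by the Idempotency-Key value and a fingerprint computed as SHA-256 over canonicalized JSON. That fingerprint scheme is an assumption for illustration, not something taken from the draft. The sketch returns 409 on a mismatch, as proposed here, rather than the draft's 422, and the error body deliberately does not reveal what differs.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class StoredResult:
    fingerprint: str
    status: int
    body: str


# Hypothetical in-memory store keyed by Idempotency-Key value.
_store: Dict[str, StoredResult] = {}


def fingerprint(payload: dict) -> str:
    # Assumed scheme: canonicalize the JSON so key order does not change the hash.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def handle(idempotency_key: str, payload: dict) -> Tuple[int, str]:
    fp = fingerprint(payload)
    prior = _store.get(idempotency_key)
    if prior is not None:
        if prior.fingerprint != fp:
            # Same key, different request: it conflicts with the earlier request.
            # The body intentionally says nothing about what differs.
            return 409, "Idempotency-Key reused with a different request payload"
        # Same key, same payload: replay the stored response.
        return prior.status, prior.body
    # First use of this key: stand-in for the real business operation.
    status, body = 201, json.dumps({"created": payload})
    _store[idempotency_key] = StoredResult(fp, status, body)
    return status, body
```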

Concurrent request response code

This was already discussed somewhat in #11, but I would make the case that having the server block, rather than return a client error, upon concurrent requests for the same idempotency key significantly simplifies client implementation, reduces the need to retry, and improves customer experience in interactive use cases. So I would really value the spec stating that blocking is a reasonable alternative to returning a client error response code. I would also make the case that 409 is not a great fit for the response code in this situation. Notwithstanding my preference above to use 409 for a fingerprint mismatch, 409 usually represents a permanent condition, not recoverable by the client, that blocks application of a mutative change. The OP in #11 asked why not 423 for this case, and I would echo that. Alternatively, I wouldn't mind the spec recommending blocking and leaving implementers to choose their own response code if they prefer to deviate from it.
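The two behaviors for concurrent requests can be compared in a small Python sketch. `IdempotencyLayer` and its `block_on_concurrent` flag are hypothetical names, and the single-process, in-memory locking is an assumption made for brevity (a real deployment would likely need a shared store). In blocking mode, a second request with the same key waits for the first to finish and receives the same stored response; otherwise it gets 423 Locked, as suggested in #11.

```python
import threading
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]  # (status code, body)


class IdempotencyLayer:
    """Hypothetical per-key coordination for requests sharing an Idempotency-Key."""

    def __init__(self, block_on_concurrent: bool) -> None:
        self.block_on_concurrent = block_on_concurrent
        self._guard = threading.Lock()  # protects the two dicts below
        self._in_flight: Dict[str, threading.Event] = {}
        self._results: Dict[str, Response] = {}

    def handle(self, key: str, operation: Callable[[], Response]) -> Response:
        with self._guard:
            if key in self._results:
                return self._results[key]  # completed earlier: replay stored response
            event = self._in_flight.get(key)
            owner = event is None
            if owner:
                # First request for this key: mark it in flight; it runs below.
                event = threading.Event()
                self._in_flight[key] = event

        if not owner:
            if not self.block_on_concurrent:
                # Non-blocking mode: signal a transient condition with 423, not 409.
                return 423, "a request with this Idempotency-Key is in progress"
            event.wait()  # blocking mode: wait for the first request to finish
            with self._guard:
                stored = self._results.get(key)
            if stored is not None:
                return stored
            # The first attempt failed without storing a result; try again ourselves.
            return self.handle(key, operation)

        try:
            result = operation()
        except BaseException:
            with self._guard:
                del self._in_flight[key]
            event.set()  # wake any waiters so they can retry
            raise
        with self._guard:
            self._results[key] = result
            del self._in_flight[key]
        event.set()
        return result
```

From the client's point of view, the blocking variant turns a duplicate in-flight request into a slightly delayed copy of the original response, which is the retry loop the issue argues clients should not have to write. A production version would presumably also bound the wait with a timeout.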