Idempotency draft feedback

(I'm writing this on my own, and the views here aren't necessarily that of Google, nor the team as a whole. Also, I attempted to email this to the working group, but got some bounce notices, so I'm just posting it here as well).

I work on the engineering team that defines and implements Google Standard Payments.

Document feedback:

2.2: I really appreciate you calling out that idempotency-key + different-payload being bad, this is definitely something we have dealt with as well.

2.4 and 2.6: In 2.4 you state:

idempotency fingerprint MAY be used in conjunction with an idempotency key to determine the uniqueness of a request.

But then in 2.6, you state "idempotency key or fingerprint" for both initial and duplicate requests. These statements seem to conflict with one another? Or maybe it's just ambiguous wording? My reading of section 2.6 says that a request could be rejected entirely based on the fingerprint, regardless of the idempotency-key.

2.6:

The resource server MUST respond with the result of the previously completed operation, success or an error.

Assuming "error" here means a 4xx or 5xx status code, this line is probably the largest difference between how Google Standard Payments (GSP) was designed and this spec. In GSP, we effectively said that only HTTP 200 responses were allowed to be cached, and all other cases should be considered retriable until a HTTP 200 is returned.

If the resource server is having problems and starts returning 502 (Bad Gateway) response codes, should that be cached and replayed by the resource server? In those cases, the resource server may not even be able to look into it's cache to respond properly.

If the client loses connection before the server responds, the client would then need to retry the request to find the result. If it was some temporary error (say, 429 - Too Many Requests), the client would need to try again with a new idempotency-key, increasing the traffic to the server. (One retry with the original idempotency-key to get the cached result, then a second request to retry with a new idempotency-key).

This also seems like it could end up leading to a larger resource-server cache of idempotency-keys if clients have to keep generating new idempotency-keys for cases that could be retried in the same state.

My core issues here are: (1) I'm not sure this can be implemented as a "MUST" in the real world as defined today. (2) I'm not sure it's the correct design decision.

4.1: I saw that you linked to our capture implementation that talks about idempotency ( https://developers.google.com/standard-payments/payment-processor-service-api/v1/TopLevel/capture ). We also have a more detailed reasoning about how/why we do idempotency that may interest you: https://developers.google.com/standard-payments/guides/connectivity/protocol-standards#request_idempotency.

5: With clients generating the keys, it could lead to leaked data. I do like you recommending UUID (122 bits of entropy) should make it harder to attack. But if an implementation uses something with less entropy, an attacker could start trying to find existing cache entries belonging to another client and hope that they get played back (which would lead to a data leak).

Overall: I really appreciate this spec being written. Idempotency is a tough topic that we've seen companies either not implement at all, or do it poorly. Trying to set a standard here for people to follow is laudable.

Background: This isn't feedback for the document, but more philosophy differences I wanted to state so you understand where I'm coming from.

In the payments space, we tend to be fairly paranoid about the payload of our data. In GSP, we pushed for all payloads to be PGP or JWE encrypted and signed. This allows for using a HTTP load-balancer that can terminate the HTTPS connection for us, then the application server can deal with the payload (preventing accidental data leakage at the loadbalancer).

We tend to think protocol agnostic as we use both GRPC and HTTP regularly at Google, so using HTTP Headers to drive behavior that we feel is core to a request is tying us to that specific protocol. With that, separating the idempotency-key from the rest of the payload doesn't make sense to us. We also made our request_id to be dual-purpose of being a unique identifier (lookup ID) for an API request plus the idempotency key (though we landed there after having them separate).

Thanks, -James Wendel

@jwendel Thanks for the comments. Great points.

Regarding 2.4, 2.6 , Are you saying that in 2.2., anyone can read the idempotency contract as "idempotency key + Fingerprint" so 2.6 should be using the word "and", instead of "or" so the sentence now becomes "idempotency key and fingerprint".

If this is what you meant then yes we can clear the confusion in the wordings.

Regarding 2.6, on the server returning the previous processing status of the request, regardless of the outcome (2XX or 4XX/5XX), your call out is a fair point. That said, if you think of implementing the request idempotency as an edge level gateway function, you can return 502 even if the resource server is down with 502. The implementation on the provider side with an edge level service is simpler if we return the same status.

But i agree that usage of "SHOULD" or "MAY" is better here than "MUST". We can change it.

With respect to 5; Regardless how the client generates the idempotency key, from security standpoint, you would make sure on the server side you generate a key to make it unique (as clients can always send key with less entropy so there are chances of collision and you make sure that the key is not guessable). For example, you can use something like say the client sent idempotency key + OAuth2 client Id/account_number (or some other unique caller identifier known only to the server) + capability URI , as the key on the server side.

This is worth clarifying in the security section of the document

Sorry for the slow response, and thank you very much for the prompt response.

For 2.6: I can just say from experience that I have seen various companies implement idempotency in many different places in the stack. Some put it at the edge (almost near their load balances), while others do it deeper in application code. This just leads to different behavior about the corner-cases of returning a response idempotently. The point I was trying to make that is if you expect errors (especially 5xx errors) to be replayed idempotently, that seems to be forcing caching/replay of those to happen at some sort of lookup system outside of application/resource code (if a loadbalancer returns a 502 and the caching was implemented in application code, the application has no way to know to replay a 502). I worry about specifying that 5xx errors must be idempotent limit implementations that could be used here.

The advantage of only being idempotent for only 2xx (and maybe 4xx errors) is that it opens it up for edge or application/resource code to implement the caching.

Re 5 (security): 100% agree that some secondary key is needed to go along with idempotency key. OAuth2 (or something derived from it) can definitely be a good way to scope these. I think I'm just asking to have section 5 of the draft include something about scoping of request caches beyond the client provided key.

@jwendel Sorry for the long delay. For 2.6, let me see how we can make it clear in the spec so that the temporary failure error codes (429, 500,502,503,504) are not scoped in the idempotency processing status. Wrt security, yes we can add recommendations to implement a server-side idempotency key.

This is resolved in the previous merge (added the text For other 4xx/5xx errors, such as 401, 403, 500, 502, 503, 504, 429, or any other HTTP error code that is not listed here, the client SHOULD act appropriately by following the resource server's documentation. and added the security section as well to describe the construction of server side idempotency key). Resolving the issue.

ietf-wg-httpapi / idempotency

Idempotency draft feedback #3