ietf-wg-httpapi / idempotency

Repository for "The Idempotency-Key HTTP Header Field"

Make explicit the goal of being able to offer idempotency as a standalone intermediary service? #43

Open jmileham opened 6 months ago

jmileham commented 6 months ago

In discussion about the differences from OASIS Repeatability, @darrelmiller suggested that one significant difference was that the Idempotency spec can hope to be a sort of progressive enhancement, and be provided by generic connector software. The Repeatability spec is ambiguous on whether it could/should be provided by an intermediary service, so there may be an opportunity to make that differentiation explicit in the Idempotency spec.

The choice is not without tradeoffs. If this spec were to explicitly support a standalone service implementation option, I would advocate that it not rule out implementations that are deeply/transactionally integrated with the resource service. There are several potential benefits of a deeply integrated idempotency service, and clients need not be aware of which strategy is being used in order to benefit, so it would seem like a loss not to support both via this spec.

Here are some of the tradeoffs:

asbjornu commented 6 months ago

I've never even considered an independent "idempotency service". To be honest, I don't see the value in one. Without application-level domain knowledge of the resource state, and what may be considered a final response, afaict, failure states are irreconcilable.

Even if we created a set of new generic status codes to represent final states, transient failures would be unaccounted for and force the client into an ambiguous, unresolvable state. Unless it is mandated that all transient states (failure or not) are not cached, and only the specified final states are.

jmileham commented 6 months ago

The spec as drafted, to my understanding, draws a lot from the Stripe idempotency implementation which behaves as you describe - it is a standalone wrapper service that caches all outcomes, ambiguous or final. One goal for this spec would be to codify best practices and risks to mitigate for implementing such a pattern. In the other issue, I proposed, even for a standalone intermediary service, that it would significantly increase the amount of self-healing the client is able to effect without custom logic if the spec were to exclude 4xx from the set of cached responses if we can agree that they will be side-effect-free. (I'm just raising that here for clarification, and if there's more discussion to do on that point, I think having that in the other thread would keep the conversation clearer.)
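A minimal sketch of the standalone wrapper pattern described above, with the proposed 4xx exclusion applied. The class name, the `upstream` callable, and the in-memory dictionary are assumptions made for illustration; none of them come from the draft or from Stripe's implementation, and the sketch does not model concurrent requests, cache expiry, or a network partition between the wrapper and the resource service.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]  # (status code, body)


class IdempotencyIntermediary:
    """Hypothetical standalone wrapper placed in front of a resource service."""

    def __init__(self, upstream: Callable[[str], Response]) -> None:
        self.upstream = upstream
        self.cache: Dict[str, Response] = {}

    def handle(self, idempotency_key: str, body: str) -> Response:
        # Replay a stored outcome rather than executing the request again.
        if idempotency_key in self.cache:
            return self.cache[idempotency_key]

        response = self.upstream(body)
        status = response[0]

        # Assumption: 4xx responses are side-effect-free, so they are not
        # stored and the client may correct the request and reuse the key.
        if not 400 <= status < 500:
            self.cache[idempotency_key] = response
        return response
```

Under that assumption, a client that receives a 422 can fix its payload and resubmit under the same key, while any other stored outcome is replayed without reaching the resource service again.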

Anyway, you could also imagine a world where a user or business provides an idempotency service wrapper conforming with this protocol at its edge. If the end user is on a wireless connection with high risk of network partition, positioning an idempotency service on a highly reliable network close to the user could offer significant safety benefits.

jmileham commented 6 months ago

Realizing I misread your last sentence:

Unless it is mandated that all transient states (failure or not) are not cached, and only the specified final states are.

I thought you were actually supporting caching all states failure or not. While I'm personally a fan of not caching as many non-final non-ambiguous results as possible (and again, I believe deeper discussion of which result states should and should not be cached belongs on #41), I would not support automatic retry by the idempotency service of requests that resulted in ambiguous outcomes (e.g. network partition to resource server, 504 response from resource server), and I don't believe that POV is supportable given the goals of this kind of system (at-most-once execution).

To zoom out, ambiguous states from the perspective of the idempotency service are not inherently unresolvable. They will just require domain-aware resolution by the client. To borrow from Stonebraker, "Eventual consistency means 'creates garbage'." There's really no way around managing the garbage if you want to provide a guarantee of at-most-once execution in the context of a distributed system (in this case an idempotency service and resource service that are not co-transactional). The client will need to check for success via another channel, and then decide whether to cut a new idempotency key and try again, or not.
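The client-side resolution described above could look roughly like the following sketch. The `send` and `was_applied` callables are hypothetical stand-ins for the mutating request and for the domain-specific check via another channel, and treating only 504 and a timeout as ambiguous is an assumption of this example, not something the draft specifies.

```python
import uuid
from typing import Callable


def submit_at_most_once(
    send: Callable[[str], int],
    was_applied: Callable[[], bool],
    max_keys: int = 3,
) -> bool:
    """Return True once the operation is known to have taken effect."""
    for _ in range(max_keys):
        key = str(uuid.uuid4())  # a fresh idempotency key for each new attempt
        try:
            status = send(key)
        except TimeoutError:
            status = 504  # a lost response is handled like a gateway timeout

        if 200 <= status < 300:
            return True
        if status == 504:
            # Ambiguous outcome: the intermediary will not retry for us, so
            # ask the resource service through another channel.
            if was_applied():
                return True
            continue  # assumed authoritative "not applied": cut a new key
        return False  # every other status is treated as final in this sketch
    return False
```

The sketch assumes `was_applied` is authoritative at the moment it is called. If the original request could still be in flight, the client would need to wait or reconcile before cutting a new key.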

jmileham commented 6 months ago

Jumping back to the "why not both?" question about specifying this as both a standalone intermediary and a transactional service, it also occurs to me that, if both are allowed, you could find multiple idempotency-spec-implementing services in the same request chain. With a stack of compliant idempotency services in the signal chain, the optimal protocol would be for the outer layers of the onion to no-op themselves and rely on the underlying idempotency service to provide the guarantee (in the hope that the innermost is transactional, and also reducing the number of potential points of stateful failure in the chain). That would involve the underlying idempotency service signaling in its response that it is active. The Repeatability spec has a response header for that, actually, which is interesting.
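One way the "outer layers no-op themselves" idea might be sketched is shown below. The response header name `Idempotency-Handled` is purely hypothetical: the draft defines no such field, and this example does not reproduce the Repeatability spec's actual header. The `inner` callable stands in for whatever sits further down the request chain.

```python
from typing import Callable, Dict, Tuple

Headers = Dict[str, str]
Response = Tuple[int, Headers, str]  # (status, response headers, body)

# Hypothetical signal; not defined by the Idempotency-Key draft.
SIGNAL = "Idempotency-Handled"


class OuterLayer:
    """Outer idempotency layer that defers to an inner one when it exists."""

    def __init__(self, inner: Callable[[str, str], Response]) -> None:
        self.inner = inner
        self.cache: Dict[str, Response] = {}

    def handle(self, key: str, body: str) -> Response:
        if key in self.cache:
            return self.cache[key]

        response = self.inner(key, body)  # the key is always forwarded
        if SIGNAL not in response[1]:
            # Nothing further in claimed the key, so this layer stores the
            # outcome and provides the guarantee itself.
            self.cache[key] = response
        return response
```

When the inner service advertises the signal, the outer layer keeps no state for that key and every retry is passed through for the inner service to replay.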

asbjornu commented 6 months ago

Perhaps a generic idempotency service is useful, I don't know. And I'm not yet convinced. I've tried to outline the reasoning behind my skepticism in https://github.com/ietf-wg-httpapi/idempotency/issues/41#issuecomment-2022651214.

jmileham commented 6 months ago

Replied to your comment on the other thread. I agree that a non-transactional intermediary idempotency service makes some major compromises, but I don't think it is up to us to be convinced about whether it is useful - the strategy is in use at Stripe, and it offers sound at-most-once execution semantics for a naive client. It doesn't guarantee execution, and therein lies the rub.

jmileham commented 5 months ago

User stories:

asbjornu commented 5 months ago

Thank you for writing up those user stories, @jmileham. I think approaching this from the "different layers of protection" perspective is very valuable, and something we should write into the specification itself. I think it would harm the adoption of Idempotency-Key if it was (wrongly) established that the only protection it could give is the weak kind offered by Stripe's implementation.

As an end user of a financial services app…

For this second user story, I think "exactly once delivery" could be mentioned as the pattern it attempts to achieve.

As a user on the web of the future equipped with a web browser implementing the idempotency spec…

Neat idea. I can envision an attribute on the <form> element whose value is automatically stuffed into the Idempotency-Key header upon form submission, as well as explicit support on the fetch() API.

richsalz commented 5 months ago

I think "exactly once delivery" could be mentioned as the pattern it attempts to achieve.

In my non-Chair opinion, this is probably going too far for a simple HTTP header field. We cannot achieve 2PC here.

jmileham commented 5 months ago

For this second user story, I think "exactly once delivery" could be mentioned as the pattern it attempts to achieve.

Yes, agreed! Of course the client must have durable retry capability (or else there is no guarantee of delivery), and it must get the submission done and receive a success response before the server expires its idempotency-key cache entry (or else there is a risk of double delivery). I discuss here how adding a timestamp to the protocol, as OASIS Repeatability does, could eliminate the safety bug.
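To illustrate how a timestamp could close the expiry gap, here is a rough sketch. It assumes the client sends the time it first sent the request along with the key and repeats the same value on every retry, that client and server clocks are roughly in sync, and that the server retains keys for 24 hours. The `KeyStore` class and the `"execute"`, `"replay"`, and `"reject"` results are illustrative names, not part of either spec.

```python
from datetime import datetime, timedelta
from typing import Dict

RETENTION = timedelta(hours=24)  # assumed server-side retention window


class KeyStore:
    """Tracks idempotency keys by the time the client first sent them."""

    def __init__(self) -> None:
        self.seen: Dict[str, datetime] = {}

    def admit(self, key: str, first_sent: datetime, now: datetime) -> str:
        # Expire entries whose first-sent time has left the retention window.
        self.seen = {
            k: sent for k, sent in self.seen.items() if now - sent <= RETENTION
        }
        if key in self.seen:
            return "replay"
        if now - first_sent > RETENTION:
            # An entry for this key may already have expired, so executing
            # the request could deliver it a second time.
            return "reject"
        self.seen[key] = first_sent
        return "execute"
```

Because an entry expires exactly when requests carrying its first-sent time start being rejected, a late retry is refused rather than executed a second time.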

Neat idea. I can envision an attribute on the <form> element whose value is automatically stuffed into the Idempotency-Key header upon form submission, as well as explicit support on the fetch() API.

Thanks! I believe (though I haven't gotten that far down the path of thinking it through) that the semantics of HTML forms are sufficiently strong that the browser could even manage the idempotency-key header automatically (but I'm getting ahead of this draft's purpose).

jmileham commented 5 months ago

In my non-Chair opinion, this is probably going too far for a simple HTTP header field. We cannot achieve 2PC here.

Yeah, that's very fair as well. I sometimes talk about "exactly once semantic execution" as a target eventual outcome for the limited case of valid requests and well-behaved clients. But network partitions can be forever, so there's really no such thing as exactly once delivery in a distributed system.

jmileham commented 5 months ago

If it helps illustrate what I mean by a well-behaved client, here are the guarantees of Betterment's durable retryable job queue that (barring any critical bugs) would enable happy-path eventual exactly once execution of a mutative request to a 3rd party service in combination with Idempotency-key: https://github.com/Betterment/delayed?tab=readme-ov-file#operational-considerations

asbjornu commented 5 months ago

@richsalz, while Idempotency-Key can't make any guarantees about exactly-once-processing of a request, it can describe behaviors in compliant clients and servers that make it possible to achieve in most circumstances – and I think that's highly desirable.

Even in the worst-case scenario where both the server and the client are left in an ambiguous, unresolved state, neither party will be any worse off than if the spec had not attempted to achieve at-most-once-processing of requests. Regardless of which protection level you are aiming for, the state would need to be reconciled out-of-band somehow.

As long as out-of-band reconciliation is an exception to the norm (and as I have implemented transactional, distributed idempotency systems myself, I know this to be the case), that's fine. Since exactly-once-processing of requests would require deep integration with the origin server and not just a simple "cache the first response I receive" kind of service discussed in #41, it would lead to fewer out-of-band reconciliations, since the client would be able to automatically retry in a lot more situations.