Make explicit the goal of being able to offer idempotency as a standalone intermediary service?

jmileham commented 6 months ago

In discussion about the differences from OASIS Repeatability, @darrelmiller suggested that one significant difference was that the Idempotency spec can hope to be a sort of progressive enhancement, and be provided by generic connector software. The Repeatability spec is ambiguous on whether it could/should be provided by an intermediary service, so there may be an opportunity to make that differentiation explicit in the Idempotency spec.

The choice is not without tradeoffs, and if this spec were to explicitly support a standalone service implementation option, I would advocate that it not preclude implementations of the spec that are deeply/transactionally integrated with the resource service. A deeply integrated idempotency service has several potential benefits, and clients need not be aware of which strategy is being used in order to benefit, so it would seem like a loss not to support both via this spec.

Here are some of the tradeoffs:

With an intermediary service, in the event of a network partition between the idempotency service and the resource service, or of an ambiguous response code (typically 5xx), it is unclear whether a side effect happened, so the idempotency service must fail closed and cache the response (or cache its own first-party 504 response in the event of the aforementioned network partition). With an integrated transactional idempotency service, assuming the resource service is itself transactionally sound, recording the idempotency key can be co-transactional with the side effects, so there is no need to cache unresolved responses. Every failing request could then self-heal on retry once the underlying failure condition is resolved.
In a fully integrated implementation, the idempotency service can inherently live within the authenticated security perimeter, obviating the cross-account online attacks conjectured in #40.
A fully integrated implementation has the option of returning a fresh representation of the resulting resource rather than a cached response, without introducing additional retrieval HTTP endpoints on the resource service. Returning a fresh copy of the response object has better properties for distributed system consistency. Additionally, not storing a cached copy of the result in the idempotency service has security benefits: it avoids creating another copy of potentially sensitive result data, even if encrypted.
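To make the intermediary side of these tradeoffs concrete, here is a minimal TypeScript sketch of a non-transactional intermediary that fails closed. It is illustrative only: the class and type names are hypothetical, the resource service is modeled as a synchronous function that throws on a network partition, the store is an in-memory `Map`, and scoping entries by an authenticated principal id is an assumed mitigation for the cross-account concern, not something the draft specifies.

```typescript
// Minimal sketch of a NON-transactional intermediary idempotency service.
// All names are hypothetical; the store is in-memory and never expires.

interface UpstreamResponse {
  status: number;
  body: string;
}

// Stand-in for the resource service: returns a response, or throws when the
// intermediary cannot reach it (network partition).
type Upstream = (body: string) => UpstreamResponse;

class IntermediaryIdempotencyService {
  // Keyed by `${principal}:${idempotencyKey}` so one account cannot replay
  // or probe another account's keys (an assumed mitigation).
  private readonly cache = new Map<string, UpstreamResponse>();

  constructor(private readonly upstream: Upstream) {}

  handle(principal: string, key: string, body: string): UpstreamResponse {
    const cacheKey = `${principal}:${key}`;
    const cached = this.cache.get(cacheKey);
    if (cached !== undefined) {
      // Replay whatever was recorded, including ambiguous 5xx responses:
      // the intermediary cannot tell whether the side effect happened.
      return cached;
    }
    let response: UpstreamResponse;
    try {
      response = this.upstream(body);
    } catch {
      // Partition: fail closed by recording a first-party 504.
      response = { status: 504, body: "upstream outcome unknown" };
    }
    this.cache.set(cacheKey, response);
    return response;
  }
}
```

A production intermediary would also need to handle concurrent in-flight requests carrying the same key and to expire entries, both of which this sketch omits.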

asbjornu commented 6 months ago

I've never even considered an independent "idempotency service". To be honest, I don't see the value in one. Without application-level domain knowledge of the resource state, and of what may be considered a final response, failure states are, as far as I can tell, irreconcilable.

Even if we created a set of new generic status codes to represent final states, transient failures would be unaccounted for and force the client into an ambiguous, unresolvable state. Unless it is mandated that all transient states (failure or not) are not cached, and only the specified final states are.

jmileham commented 6 months ago

The spec as drafted, to my understanding, draws a lot from the Stripe idempotency implementation, which behaves as you describe: it is a standalone wrapper service that caches all outcomes, ambiguous or final. One goal for this spec would be to codify best practices for implementing such a pattern and the risks to mitigate. In the other issue, I proposed that, even for a standalone intermediary service, excluding 4xx from the set of cached responses would significantly increase the amount of self-healing the client can achieve without custom logic, provided we can agree that 4xx responses are side-effect-free. (I'm just raising that here for clarification; if there's more to discuss on that point, keeping it in the other thread would keep the conversation clearer.)
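The 4xx proposal could be expressed as a small status-code policy. This is a hypothetical sketch that assumes, as proposed above and still open for discussion in #41, that 4xx responses are side-effect-free; the function and type names are illustrative.

```typescript
// Hypothetical caching policy for an intermediary idempotency service.
// ASSUMPTION: 4xx responses are side-effect-free (proposed, not agreed).
type CacheDecision = "cache" | "do-not-cache";

function cacheDecision(status: number): CacheDecision {
  if (status >= 400 && status < 500) {
    // Nothing happened upstream, so the client may correct the request and
    // retry with the same key instead of having to mint a new one.
    return "do-not-cache";
  }
  // 2xx/3xx are final outcomes; 5xx (including a first-party 504) are
  // ambiguous, so the intermediary must fail closed and keep them.
  return "cache";
}
```

Under the cache-everything behavior described above, the policy would instead return `"cache"` unconditionally.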

Anyway, you could also imagine a world where a user or business provides an idempotency service wrapper conforming with this protocol at its edge. If the end user is on a wireless connection with high risk of network partition, positioning an idempotency service on a highly reliable network close to the user could offer significant safety benefits.

jmileham commented 6 months ago

Realizing I misread your last sentence:

Unless it is mandated that all transient states (failure or not) are not cached, and only the specified final states are.

I thought you were actually supporting caching all states, failure or not. While I'm personally a fan of not caching as many non-final, non-ambiguous results as possible (and again, I believe deeper discussion of which result states should and should not be cached belongs in #41), I would not support automatic retry by the idempotency service of requests that resulted in ambiguous outcomes (e.g. a network partition to the resource server, or a 504 response from the resource server). I don't believe that position is supportable given the goal of this kind of system (at-most-once execution).

To zoom out, ambiguous states from the perspective of the idempotency service are not inherently unresolvable. They will just require domain-aware resolution by the client. To borrow from Stonebraker, "Eventual consistency means 'creates garbage'." There's really no way around managing the garbage if you want to provide a guarantee of at-most-once execution in the context of a distributed system (in this case an idempotency service and a resource service that are not co-transactional). The client will need to check for success via another channel, and then decide whether to cut a new idempotency key and try again, or not.
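The client-side decision described here might look like the following sketch. `sideEffectExists` stands in for a hypothetical, domain-specific out-of-band check (for example, looking up an order by a client-generated reference) and `newKey` for whatever key generator the client uses; neither is defined by the draft. The check is only meaningful once the original request can no longer be in flight upstream.

```typescript
// Sketch of domain-aware client reconciliation after an ambiguous outcome.
// `sideEffectExists` and `newKey` are hypothetical caller-supplied hooks.

type Outcome =
  | { kind: "final"; status: number }
  | { kind: "ambiguous" }; // e.g. a cached 504 replayed by the intermediary

interface Reconciliation {
  action: "done" | "retry-with-new-key";
  key: string;
}

function reconcile(
  outcome: Outcome,
  currentKey: string,
  sideEffectExists: () => boolean,
  newKey: () => string,
): Reconciliation {
  if (outcome.kind === "final") {
    return { action: "done", key: currentKey };
  }
  // Retrying with the same key would only replay the cached ambiguous
  // response, so check via another channel first.
  if (sideEffectExists()) {
    return { action: "done", key: currentKey };
  }
  // The side effect did not happen: cut a fresh key and submit again.
  return { action: "retry-with-new-key", key: newKey() };
}
```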

jmileham commented 6 months ago

Jumping back to the "why not both?" question about specifying this for both standalone intermediary and transactional services: it also occurs to me that, if we do, you could find multiple idempotency-spec-implementing services in the same request chain. With such a stack of compliant idempotency services in the chain, the optimal protocol would be for the outer layers of the onion to no-op themselves and rely on the innermost idempotency service to provide the guarantee (in the hope that the innermost is transactional, and also to reduce the number of potential points of stateful failure in the chain). That would involve the underlying idempotency service signaling in its response that it is active. The Repeatability spec actually has a response header for that, which is interesting.
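One way an outer layer could no-op itself is sketched below. The response header `idempotency-enforced` is purely hypothetical (it is not defined by this draft, and the Repeatability header works differently); header names are assumed to arrive lowercased, and the inner service is modeled as a synchronous function.

```typescript
// Sketch of an outer idempotency layer deferring to an inner one.
// ASSUMPTION: "idempotency-enforced" is a hypothetical response header.

interface Res {
  status: number;
  headers: Record<string, string>;
}

type Next = (key: string) => Res;

class OuterIdempotencyLayer {
  private readonly cache = new Map<string, Res>();

  constructor(private readonly next: Next) {}

  handle(key: string): Res {
    // Replay anything this layer stored itself.
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached;

    const res = this.next(key);
    if (res.headers["idempotency-enforced"] === "true") {
      // An inner service already provides the guarantee: act as a no-op and
      // skip storing a redundant copy (one fewer point of stateful failure).
      return res;
    }
    this.cache.set(key, res);
    return res;
  }
}
```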

asbjornu commented 6 months ago

Perhaps a generic idempotency service is useful; I don't know, and I'm not yet convinced. I've tried to outline the reasoning behind my skepticism in https://github.com/ietf-wg-httpapi/idempotency/issues/41#issuecomment-2022651214.

jmileham commented 6 months ago

Replied to your comment on the other thread. I agree that a non-transactional intermediary idempotency service makes some major compromises, but I don't think it is up to us to be convinced about whether it is useful - the strategy is in use at Stripe, and it offers sound at-most-once execution semantics for a naive client. It doesn't guarantee execution, and therein lies the rub.

jmileham commented 5 months ago

User stories:

As a developer of an enterprise product line spanning many differently implemented services that did not bake idempotency into their original API design, I want to stand up an intermediary idempotency service to provide a consistent at-most-once-execution guarantee across all the HTTP APIs my clients might call. I am willing to accept that requests that yield ambiguous responses will not be automatically reconcilable by clients. (I want a non-transactional intermediary service)
As an end user of a financial services app that values both correctness and customer experience highly, I would like to be protected against accidental double-submission of a mutative request I make through the app, even if I experience a network partition and have to resubmit. I would like to be able to recover and succeed in submitting my valid request and receiving the expected response with 100% certainty once the network partition is resolved simply by continuing to submit the form. (I want a co-transactional service embedded in the application)
As a user on the web of the future equipped with a web browser implementing the idempotency spec, I want safety against accidental double submission of my form for any mutative request I make to any server that also supports the idempotency spec, whether it is transactional or an intermediary idempotency service. (I want progressive enhancement and a unified client protocol regardless of server side implementation)
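The co-transactional option in the second user story can be sketched as follows. `Db.transaction` is a toy in-memory stand-in for a real database transaction (snapshot and restore on failure), and all names are hypothetical. The point it illustrates is that the key record and the side effect commit together or not at all, so a crashed attempt leaves nothing behind and a retry with the same key self-heals, while a replay returns a fresh representation rather than a cached response body.

```typescript
// Sketch of a co-transactional idempotency check inside the resource service.
// `Db` is a toy stand-in for a transactional database (hypothetical).

interface Payment {
  id: number;
  amountCents: number;
  status: string;
}

class Db {
  payments = new Map<number, Payment>();
  keys = new Map<string, number>(); // idempotency key -> payment id
  private nextId = 1;

  transaction<T>(work: (tx: Db) => T): T {
    // Toy rollback: snapshot state, restore it if `work` throws.
    const payments = new Map(this.payments);
    const keys = new Map(this.keys);
    const nextId = this.nextId;
    try {
      return work(this);
    } catch (e) {
      this.payments = payments;
      this.keys = keys;
      this.nextId = nextId;
      throw e;
    }
  }

  allocateId(): number {
    return this.nextId++;
  }
}

function createPayment(db: Db, key: string, amountCents: number, simulateCrash = false): Payment {
  return db.transaction((tx) => {
    const existingId = tx.keys.get(key);
    if (existingId !== undefined) {
      // Replay: return the resource's current state, not a stored response.
      return { ...tx.payments.get(existingId)! };
    }
    const payment: Payment = { id: tx.allocateId(), amountCents, status: "pending" };
    tx.payments.set(payment.id, payment);
    tx.keys.set(key, payment.id); // recorded in the same transaction
    if (simulateCrash) throw new Error("simulated failure before commit");
    return { ...payment };
  });
}
```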

asbjornu commented 5 months ago

Thank you for writing up those user stories, @jmileham. I think approaching this from the "different layers of protection" perspective is very valuable, and something we should write into the specification itself. I think it would harm the adoption of Idempotency-Key if it was (wrongly) established that the only protection it could give is the weak kind offered by Stripe's implementation.

As an end user of a financial services app…

For this second user story, I think "exactly once delivery" could be mentioned as the pattern it attempts to achieve.

As a user on the web of the future equipped with a web browser implementing the idempotency spec…

Neat idea. I can envision an attribute on the <form> element whose value is automatically stuffed into the Idempotency-Key header upon form submission, as well as explicit support on the fetch() API.
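Until browsers offer such support, a client can approximate it on top of `fetch()`. The sketch below assumes a minimal `FetchLike` type (so it runs without a network) and an arbitrary `maxAttempts`; it retries only thrown network errors, and it formats the key as a quoted Structured Field string, which is how the draft's examples present the header value. The essential behavior is that the same key is reused on every attempt.

```typescript
// Sketch: reuse one Idempotency-Key across fetch() retries.
// `FetchLike` mirrors the subset of fetch() used here (an assumption).

interface FetchInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}
type FetchLike = (url: string, init: FetchInit) => Promise<{ status: number }>;

async function postWithIdempotencyKey(
  fetchFn: FetchLike,
  url: string,
  body: string,
  key: string,
  maxAttempts = 3,
): Promise<{ status: number }> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // `return await` so a rejected promise is caught below. The same key is
      // sent each time; a fresh key per attempt would defeat the protection.
      return await fetchFn(url, {
        method: "POST",
        headers: { "Content-Type": "application/json", "Idempotency-Key": `"${key}"` },
        body,
      });
    } catch (err) {
      lastError = err; // network failure: outcome unknown, retry with same key
    }
  }
  throw lastError;
}
```

In a browser this could be called as `postWithIdempotencyKey(fetch, "/orders", JSON.stringify(order), crypto.randomUUID())`, where `/orders` and `order` are placeholders.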

richsalz commented 5 months ago

I think "exactly once delivery" could be mentioned as the pattern it attempts to achieve.

In my non-Chair opinion, this is probably going too far for a simple HTTP header field. We cannot achieve 2PC here.

jmileham commented 5 months ago

For this second user story, I think "exactly once delivery" could be mentioned as the pattern it attempts to achieve.

Yes, agreed! Of course the client must have durable retry capability (or else there is no guarantee of delivery), and it must get the submission done and receive a success response before the server expires its idempotency-key cache entry (or else double delivery is possible). I discuss here how adding a timestamp to the protocol, as OASIS Repeatability does, could eliminate that safety bug.
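The timestamp idea reduces to a server-side check like the one below. This is a sketch under stated assumptions: the client sends the time it first sent the request alongside the key (OASIS Repeatability uses a header for this; the parameter names here are illustrative), the server knows its own key retention window, and `maxSkewMs` is an assumed allowance for clock skew. A request whose first attempt is older than the window is rejected, because a cache miss no longer proves the request was never executed.

```typescript
// Sketch: reject requests whose first attempt predates the key retention
// window. Parameter names are hypothetical; times are epoch milliseconds.

type Verdict = "process-or-replay" | "reject-expired";

function checkFirstSent(
  firstSentMs: number,
  nowMs: number,
  retentionMs: number,
  maxSkewMs = 0,
): Verdict {
  // Shrink the window by the skew allowance to stay on the safe side.
  return nowMs - firstSentMs > retentionMs - maxSkewMs
    ? "reject-expired"
    : "process-or-replay";
}
```

This turns a silent double execution into an explicit error that the client must reconcile out of band.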

Neat idea. I can envision an attribute on the <form> element whose value is automatically stuffed into the Idempotency-Key header upon form submission, as well as explicit support on the fetch() API.

Thanks! I believe (though I haven't gotten that far down the path of thinking it through) that the semantics of HTML forms are sufficiently strong that the browser could manage the Idempotency-Key header automatically (but I'm getting ahead of this draft's purpose).

jmileham commented 5 months ago

In my non-Chair opinion, this is probably going too far for a simple HTTP header field. We cannot achieve 2PC here.

Yeah, that's very fair as well. I sometimes talk about "exactly once semantic execution" as a target eventual outcome for the limited case of valid requests and well-behaved clients. But network partitions can be forever, so there's really no such thing as exactly once delivery in a distributed system.

jmileham commented 5 months ago

If it helps illustrate what I mean by a well-behaved client, here are the guarantees of Betterment's durable retryable job queue that (barring any critical bugs) would enable happy-path eventual exactly once execution of a mutative request to a 3rd party service in combination with Idempotency-key: https://github.com/Betterment/delayed?tab=readme-ov-file#operational-considerations
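The timing constraint such a client must respect can be sketched as a scheduling decision. Everything here is hypothetical and not taken from the linked queue's API: the job record (with its key) is assumed to be persisted before the first attempt, the server's key retention window is assumed to be known to the client, and backoff doubles from a base delay. Measuring from enqueue time is conservative, since the server cannot have seen the key earlier than that.

```typescript
// Sketch: decide when a durable job may retry with its persisted key, and
// when it must stop because the server may already have forgotten the key.

interface IdempotentJob {
  key: string; // generated once and persisted with the job
  enqueuedAtMs: number;
  attempts: number;
}

type NextStep =
  | { kind: "retry-at"; atMs: number }
  | { kind: "needs-reconciliation" };

function planNextAttempt(
  job: IdempotentJob,
  nowMs: number,
  serverRetentionMs: number,
  baseDelayMs = 1000,
): NextStep {
  const nextAtMs = nowMs + baseDelayMs * 2 ** job.attempts; // exponential backoff
  // Retrying after the key may have expired risks double execution, so hand
  // the job to out-of-band reconciliation instead.
  if (nextAtMs - job.enqueuedAtMs >= serverRetentionMs) {
    return { kind: "needs-reconciliation" };
  }
  return { kind: "retry-at", atMs: nextAtMs };
}
```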

asbjornu commented 5 months ago

@richsalz, while Idempotency-Key can't make any guarantees about exactly-once-processing of a request, it can describe behaviors in compliant clients and servers that make it possible to achieve in most circumstances – and I think that's highly desirable.

Even in the worst-case scenario where both the server and the client are left in an ambiguous, unresolved state, neither party will be any worse off than if the spec had not attempted to achieve at-most-once processing of requests. Regardless of which protection level you are aiming for, the state would need to be reconciled out-of-band somehow.

As long as out-of-band reconciliation is the exception rather than the norm (and having implemented transactional, distributed idempotency systems myself, I know this to be the case), that's fine. Exactly-once processing of requests would require deep integration with the origin server, not just a simple "cache the first response I receive" kind of service as discussed in #41, but it would lead to fewer out-of-band reconciliations, since the client would be able to retry automatically in many more situations.

ietf-wg-httpapi / idempotency

Make explicit the goal of being able to offer idempotency as a standalone intermediary service? #43