twisted / klein

werkzeug + twisted.web
Other
836 stars 120 forks source link

Why do client socket resets cause pending Deferred to be cancelled? #546

Open KentShikama opened 2 years ago

KentShikama commented 2 years ago

Imagine you have a route like the following.

    @app.route("/foo", methods=["POST"])
    @inlineCallbacks
    def doStuff(self, request):
        yield foo()
        yield bar()
        return {"message": "success"}

And then a client calls POST /foo and disconnects right after Klein finishes processing foo(). Currently, bar() would get cancelled. I understand that the socket has closed so that the {"message": "success"} cannot be written back but it would be nice if the Deferred bar() was executed. Is there some use case that I'm failing to understand here?

Or to put it differently, I believe we should stop executing the .cancel() on the Deferred if the request finishes at https://github.com/twisted/klein/blob/8964da72a7b526aed60746fe8522dadf9ae08ad6/src/klein/_resource.py#L210.

radifalco commented 2 years ago

Agree that it seems odd for a socket reset by the client to cause all the deferreds associated with the call to be canceled. I'd expect the only impact to be that I cannot write a response on the closed socket.

glyph commented 1 year ago

If your route is doing a bunch of expensive computation for a client that has gone away, it's wasting resources. The cancellation is how you find out that the place you're writing a response to is gone, so you can stop doing the work to produce the response. The default, I feel, makes sense here.

If, at the application level, you have some logic that you want to run to completion regardless of the client being available, you can shield your Deferreds from cancellation. This is tricky enough that we should probably include something in Twisted proper that functions like asyncio.shield, but for reference, here's how such a thing is implemented:

from functools import wraps
from typing import Callable, ParamSpec, TypeVar

from twisted.internet.defer import Deferred

T = TypeVar("T")
NoCancel = object()

def shield(d: Deferred[T]) -> Deferred[T]:
    currentWrapped: Deferred[T]
    def rewrap() -> Deferred[T]:
        nonlocal currentWrapped
        currentWrapped = Deferred(lambda it: it.callback(NoCancel)).addCallback(cancelBlock)
        return currentWrapped

    def cancelBlock(result: object) -> Deferred[T]:
        if result is not NoCancel:
            return result
        return rewrap()

    currentWrapped = rewrap()

    d.addBoth(lambda anything: currentWrapped.callback(anything))
    return currentWrapped

If you're using @inlineCallbacks, you can then annotate your routes with something like this:

P = ParamSpec("P")

def shielded(decoratee: Callable[P, Deferred[T]]) -> Callable[P, Deferred[T]]:
    @wraps(decoratee)
    def decorated(*args: P.args, **kwargs: P.kwargs) -> Deferred[T]:
        return shield(decoratee(*args, **kwargs))
    return decorated

i.e.

    @app.route("/foo", methods=["POST"])
    @shielded
    @inlineCallbacks
    def doStuff(self, request):
        yield foo()
        yield bar()
        return {"message": "success"}

If your entire application is routes like this (everything wants to make a best-effort attempt to run to completion, nothing cares if clients are still listening) then I would be open to having this be Klein-instance-level configurable, perhaps with a .route-level customization keyword parameter. i.e. you could say app = Klein(canceling=False) and then @app.route(..., canceling=True) if only one or two routes care about this.

If we remove the .cancel() entirely, then applications which do want to know about clients which have idled out cannot find out of them and may do lots of unnecessary work. This is particularly relevant in the case of something like a file-transfer application which may need to prepare large blobs to transfer.

KentShikama commented 1 year ago

@glyph Thanks for your detailed response! We've currently found a workaround by overriding KleinResource with our one line change, so that this non-cancelling behavior applies to all our routes. Of course, it would be nice if this override could be replaced by a canceling=False flag as you suggested, as it would be much cleaner.

I disagree that cancelling should be the default behavior. My understanding is that Klein is a library that competes with the likes of Flask or FastAPI, the former of which is not async and the latter of which is non-cancelling by default. I understand that for reads (aka GETs) that cancelling is an optimization. But for writes like POST and PUT, the main goal is often the side-effect of the deferreds and not the final response. It becomes particularly tricky to reason about a client that resets and retries a POST route. Without cancellation, either everything needs to be retried or nothing needs to be retried. With cancellation, if 2 out of 3 deferreds finish before the client resets, then if the client retries, you can't just retry all 3 deferreds as that would duplicate the first 2 (unless they are idempotent). In short, it becomes much harder to build a robust route. Depending on the route, it may be unnecessary to complete all deferreds, but it is almost never "wrong". But a naive consumer of this library that assumed non-cancelling might be surprised only after millions of requests - after all client resets are relatively unusual and client resets during "meaningful" periods can be even rarer. We've officially tallied over 40 human hours of debugging pain across our organization for related issues. Of course it depends on what kind of application you're building, but for the ones I've been involved in 99% of routes don't need to be optimized at all and even less need to be optimized for client resets. I hope you reconsider, but at the end of the day this isn't our library, and I totally understand if the applications you're dealing with work better with the cancelling default.

glyph commented 1 year ago

@KentShikama You make an interesting case here. I am not totally sure I agree with all of your reasoning here (cancellation remains important to provide a mechanism for defense against e.g. tarpit attacks), but there's clearly a use-case to be supported here, and I think my initial response was indeed too simplistic.