python-web-sig / wsgi-ng

Working group for wsgi-ng

asyncio / coroutine support #12

Open rbtcollins opened 9 years ago

rbtcollins commented 9 years ago

WSGI apps are currently defined as a blocking abstraction: the existence of the write() callable, and the input callable blocking on reads mean that threads of some sort (whether real or green) are required to wrap a WSGI app.

It would be nice to directly support nonblocking behaviour - asyncio etc - in WSGI apps / middleware.

Some questions raised by this

unbit commented 9 years ago

I am totally against dropping the "blocking way". Although the world is massively moving to non-blocking/async paradigms, there are lots of cases where the blocking one is better (for example, CPU-intensive apps with critically slow parts). And please do not forget we have a gazillion heavily-used blocking Python modules out there :)

All of the stack-switching techniques (like greenlets) allow a blocking-like (top-down) development approach, so I am pretty sure we can have both without too many problems (take into account that uWSGI and gunicorn have supported various non-blocking patterns for years despite being written with a blocking approach).

I would probably prefer a single "input gateway" that includes some way to signal EAGAIN, so reading from it could return:

So in the 'asyncio' way, reading the body would result in

    try:
        result = wsgi_ng_input.read(8192)
    except _some_form_of_eagain:
        # retry later (back to the io loop)
        yield from ...
    except Exception:
        # a real error: handle or propagate
        ...

Obviously this blocking-like approach requires coroutines (or something similar), so purely callback-based technologies (like plain Tornado) could not be adapted easily (but I could be wrong, I am not a Tornado expert)
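As a concrete illustration, here is a minimal sketch of such an input gateway, assuming BlockingIOError (Python's built-in EAGAIN exception) as the "_some_form_of_eagain", and with made-up names (NonBlockingInput, read_body) that are not part of any agreed spec:

```python
import socket

class NonBlockingInput:
    """Hypothetical input gateway: wraps a non-blocking socket and
    raises BlockingIOError (Python's EAGAIN) instead of blocking."""

    def __init__(self, sock):
        sock.setblocking(False)
        self._sock = sock

    def read(self, size):
        # Raises BlockingIOError when no bytes are ready yet; the
        # caller yields back to its I/O loop and retries later.
        return self._sock.recv(size)

def read_body(wsgi_ng_input, chunks):
    """Coroutine-style consumer: collects body chunks, yielding to
    the I/O loop whenever the gateway signals EAGAIN."""
    while True:
        try:
            data = wsgi_ng_input.read(8192)
        except BlockingIOError:
            yield  # back to the I/O loop; resume when readable
            continue
        if not data:
            return  # peer closed: body complete
        chunks.append(data)
```

Driving this from a real event loop is left out; the point is only that an exception-based EAGAIN signal lets the same top-down reading code run unchanged under any stack-switching or coroutine scheme.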

GrahamDumpleton commented 9 years ago

I believe it is a mistake to try and shoehorn an async style of doing things into the existing structure of the blocking WSGI API. The whole issue was thrashed out quite a lot on the WEB-SIG years ago, and no sensible way was found that would allow you to practically share middleware between the blocking and non-blocking worlds. If you try to have some unified model that does both, it is just going to be a mess.

It would be much better to accept this and come up with a proper async-friendly API from scratch that operates from the hand-off point of a web request. If using a completely async-based web server, the hand-off would come directly from the web server once the initial details of a web request have been read.

An alternative would be a hybrid web server, which turns things upside down from how they are normally done. That is, have a normal blocking-style web server which is used to dispatch to WSGI applications. The web server could then be configurable to allow select URLs to be bridged/switched to an async API instead of to a WSGI API based handler.

In other words, requests would still be accepted in threads and the request details read there. This would ensure that, for the case of WSGI applications, you don't greedily accept more connections than you can actually handle, stalling requests queued as pending that might otherwise have been handled by another process in a multiprocess web server.

Once the request details are read in the thread, only then would the request be handed off to a separate async based mechanism operating in a distinct common thread for async requests, to then handle and complete the processing of the request.

Such a system, which handles requests initially in a blocking world, might even allow the decision to defer the remainder of the request to an async mechanism to be made within the WSGI application itself, using a bridging mechanism as described elsewhere. Technically this may, for example, allow you to do initial request processing in a thread, including blocking operations against a database etc., before handing off to an async handler to then stream a response over a long time. Such an initial blocking handler could allow for request content to be read at that point, or not at all, with the request content instead handled in the async world. Mixing reading of input across both would be a bad idea, though. One also would not allow delegation of a request from async back to blocking.

This hybrid type of web server and the dynamic control of whether to switch to an async model for the remainder of the request could allow for better control over how resources are used in long lived connections, but still allow blocking code within the same process.
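A rough sketch of that handoff, assuming asyncio as the async side; every name here (stream_response, handle_request) is invented for illustration, and this is one possible bridging mechanism rather than a proposed API:

```python
import asyncio
import threading

# One event loop in a dedicated thread runs the async remainder of
# any request that opts in.
async_loop = asyncio.new_event_loop()
threading.Thread(target=async_loop.run_forever, daemon=True).start()

async def stream_response(parts, out):
    # Long-lived streaming happens in the async world.
    for part in parts:
        out.append(part)
        await asyncio.sleep(0)  # give other requests a turn

def handle_request(raw_request, out):
    """Runs in a blocking worker thread: do the blocking work first
    (parsing, database access), then defer the rest of the request
    to the shared async loop."""
    parts = raw_request.split()  # stand-in for blocking parsing
    future = asyncio.run_coroutine_threadsafe(
        stream_response(parts, out), async_loop)
    # A real server would return immediately and let the async side
    # finish; waiting here just keeps the sketch observable.
    return future.result()
```

The key piece is asyncio.run_coroutine_threadsafe, which is exactly the one-way blocking-to-async delegation described above; there is deliberately no path back from async to blocking.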

As already mentioned, this combining of blocking and async is generally done upside down from the way described. For example, Twisted can delegate requests to be handled as a WSGI application using a thread pool. This would suffer the greedy-accept problem in a multiprocess system though.

Whether anyone agrees with this idea or not, is something I have wanted to explore for a few years as an interesting side project.

Now either way, you really want a dedicated async API rather than trying to mix it with the existing WSGI API. In any async support you will also have to contemplate what to do with the competing camps. Can we rely on everyone migrating to asyncio, or do you have to make allowance for people who want to build off Tornado or Twisted dispatch loops?

gvanrossum commented 9 years ago

I have no problem with anything you are bringing up here. For my own education, though, I always assumed that the "greedy accept" problem could be dealt with by judicious use of the listen() syscall?
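For reference, the knob in question is the backlog argument to listen(): it caps how many fully established but not-yet-accepted connections the kernel queues for a socket, so overflow is refused or left for another process sharing the port rather than silently queued behind a busy worker. A minimal sketch (the helper name is made up):

```python
import socket

def make_listener(port=0, backlog=4):
    """Create a listening socket with a deliberately small backlog."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(('127.0.0.1', port))
    # The backlog caps fully established but not-yet-accepted
    # connections. Note it is only a hint the kernel may round up,
    # so it bounds rather than exactly limits the greedy-accept
    # window.
    srv.listen(backlog)
    return srv
```

Because the kernel treats the value as a hint (and some systems impose minimums), this bounds the problem rather than solving it precisely, which may be why Graham's answer below focuses on application-level gating instead.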


GrahamDumpleton commented 9 years ago

You certainly can gate how many socket connections are accepted, even in an async framework, against some limit on the number of active handlers, but I have never seen a Python async framework that actually did it. The ones I have looked at in detail all seem to leave it unlimited. Even if the facility to optionally provide a limit existed, the majority of people aren't likely to set it, as they wouldn't understand why they might want to.

Outside of the pure threaded systems which are generally constrained by a small fixed thread pool size used to handle requests, the only place I have seen a limit is in gunicorn with its dynamically created greenlet based gevent or eventlet modes, but that limit is so high, at 1000, that it may as well be unlimited for most people. With your typical web application I would have expected the process to become overloaded before it reached the limit.
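By way of illustration, gating accepts against a handler limit is not much code under asyncio; everything below (the serve() helper, the max_active value) is invented for this sketch:

```python
import asyncio
import socket

async def serve(sock, handle, max_active):
    """Accept a new connection only when a handler slot is free, so
    excess connections wait in the kernel backlog, where a sibling
    process could accept them instead."""
    loop = asyncio.get_running_loop()
    slots = asyncio.Semaphore(max_active)

    async def run(conn):
        try:
            await handle(conn)
        finally:
            conn.close()
            slots.release()

    while True:
        await slots.acquire()  # stop accepting at the limit
        conn, _addr = await loop.sock_accept(sock)
        asyncio.ensure_future(run(conn))
```

Here handle(conn) is any coroutine working on the raw connection; the only difference from the usual unlimited accept loop is that the semaphore is acquired before, not after, sock_accept.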

gvanrossum commented 9 years ago

My crawler is a client, not a framework (rather the opposite :-), but it does have a limit on the number of connections it opens -- I needed this for some websites that start returning errors if you hammer them too hard. (Probably because they do have a working limiting system. :-)


GrahamDumpleton commented 9 years ago

As a step in exploring the basic requirements for a new async-based API for a web application handling HTTP/1 web requests, are there any high-level event primitives required for dealing with web requests beyond what is described below?

For the web application as a whole:

For a specific web request:

Open questions:

Note that the intent here is to abstract things away enough that a web request handler has no need to know about the actual socket connection directly. Further, any API should be abstract enough that it doesn't matter which async framework or mechanisms a web request handler actually uses. The underlying web server might use the same async framework, or could use its own minimal one, so long as its operation in the same process isn't in conflict with what the web request handler uses.

There is no intent at this point to try and map this in any way to HTTP/2.

Hardtack commented 9 years ago

What about taking ideas from Python Web3 (the deferred PEP 444)?

It provides asynchronous execution support via a callable object, and simply adds a 'web3.async' key to the environ object to indicate whether asynchronous execution is supported.

def run(application):
    ...
    # Tell the application that this server will not accept an
    # asynchronous (callable) response.
    environ['web3.async'] = False
    rv = application(environ)
    if hasattr(rv, '__call__'):
        raise TypeError('This webserver does not support asynchronous responses.')
    ...

Following this idea, we could use an 'awaitable object' to provide asynchronous support.

For example,

import inspect

async def run(application):
    ...
    # WSGI server advertises asynchronous availability to the application.
    environ['wsgi.async'] = True
    rv = application(environ, start_response)
    if inspect.isawaitable(rv):
        rv = await rv
    ...

The awaitable object could be an asyncio Future, a Tornado Future, etc.

In this case, we may lose asynchronous support for Python 2.X. But for synchronous servers it would remain fully compatible with Python 2.X.
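To make the round trip concrete, here is a self-contained sketch of both sides; the 'wsgi.async' key and the application/dispatch shapes are assumptions for illustration, not anything standardised:

```python
import asyncio
import inspect

async def run(application, environ):
    """Hypothetical server-side dispatch: call the application, and
    await the result only when it hands back an awaitable."""
    environ['wsgi.async'] = True  # advertise async availability
    captured = []
    def start_response(status, headers):
        captured.append((status, headers))
    rv = application(environ, start_response)
    if inspect.isawaitable(rv):
        rv = await rv
    return rv

def sync_app(environ, start_response):
    # Plain WSGI-style application: returns the body directly.
    start_response('200 OK', [])
    return [b'sync']

def async_app(environ, start_response):
    async def respond():
        await asyncio.sleep(0)  # stand-in for awaiting a backend
        start_response('200 OK', [])
        return [b'async']
    if environ.get('wsgi.async'):
        return respond()  # hand back an awaitable
    # Fall back to blocking behaviour on servers without support.
    start_response('200 OK', [])
    return [b'sync fallback']
```

Since inspect.isawaitable decides at runtime, the same server loop runs both kinds of application, and an application can stay compatible with servers that never set the key.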

GrahamDumpleton commented 9 years ago

If you are keen on PEP 444, I would suggest you go and read the WEB-SIG mailing list archives.