python-web-sig / wsgi-ng

Working group for wsgi-ng

servers are prohibited from transmitting headers until body is available #4

Open rbtcollins opened 9 years ago

rbtcollins commented 9 years ago

This doesn't make sense for websockets over HTTP/2, nor even for regular HTTP/2 responses, since the headers are in separate frames regardless (and we don't have to worry about head-of-line blocking). We need to either limit this requirement to HTTP/1.x or possibly remove it altogether.

"However, the start_response callable must not actually transmit the response headers. Instead, it must store them for the server or gateway to transmit only after the first iteration of the application return value that yields a non-empty bytestring, or upon the application's first invocation of the write() callable. In other words, response headers must not be sent until there is actual body data available, or until the application's returned iterable is exhausted. (The only possible exception to this rule is if the response headers explicitly include a Content-Length of zero.) "

rbtcollins commented 9 years ago

The goal appears to be to ensure that apps can return an error page successfully, but since streaming bodies will often error after they begin transmitting, this isn't a guarantee we can make.

Some options: We could remove the constraint and advise applications not to call start_response until they know they won't fail - e.g. by generating their content first.

We could remove the 'non-empty' clause on bytestrings, allowing applications that just want to wait for client content to yield b'' to flush the server headers, establishing a bidirectional connection. This would preserve the ease of use for app authors ('call start_response, then call it again on error') without needing special handling, but would also allow e.g. websockets - at the cost of some cognitive overhead.
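As a hypothetical illustration of that second option (this is not valid under the current spec), an application wanting a bidirectional connection might flush the headers by yielding an empty bytestring before any real data exists:

```python
import time

def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    # Hypothetical: under the relaxed rule, yielding b'' would force the
    # buffered headers onto the wire now, before any real body data exists,
    # so the client sees the response has started and can begin sending to us.
    yield b''
    # ... later, once data does become available, stream it as usual.
    for i in range(3):
        time.sleep(1)  # stand-in for waiting on client data or another event
        yield ('tick %d\n' % i).encode('ascii')
```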

Lukasa commented 9 years ago

The problem seems to me to be in mapping the HTTP/2 and HTTP/1.1 semantics. In HTTP/2 it's perfectly reasonable to send no response body without wanting to set Content-Length: 0: this maps to a HEADERS-CONTINUATION sequence with the HEADERS carrying the END_STREAM flag and the last HEADERS/CONTINUATION frame carrying END_HEADERS.

However, in HTTP/1.1 such a situation is extremely unreasonable.

We could map around it by saying that if you do this in HTTP/1.1 then it automatically applies Transfer-Encoding: chunked and sends a body of one zero-length terminating chunk. We could even have the mapping be direct (instead of using the above HEADERS/CONTINUATION mess we could say that you send HEADERS+CONTINUATION, followed by one empty DATA frame bearing END_STREAM).

I don't really love either of those proposals. For me, the crux of the issue is that I don't believe you can meaningfully guarantee you won't fail while returning the response body: for example, you could throw an exception. If you do, HTTP/2 allows for sending RST_STREAM with an error code in it. HTTP/1.1 has no option but to tear the connection down. I don't know that we should bend over backwards for this case.
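For concreteness, the empty-DATA-frame mapping described above might look something like this on a server driving the h2 library directly (a sketch under that assumption, not a proposal for the WSGI-side API):

```python
# Sketch assuming the h2 library; `conn` is an h2.connection.H2Connection in
# server mode, and `stream_id` carries a request we have already received.
def respond_with_empty_data_frame(conn, stream_id):
    response_headers = [(':status', '204'), ('server', 'example')]
    # Headers go out without END_STREAM ...
    conn.send_headers(stream_id, response_headers, end_stream=False)
    # ... and the stream is closed by a single empty DATA frame instead.
    conn.send_data(stream_id, b'', end_stream=True)
    return conn.data_to_send()  # raw bytes to write back to the socket
```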

rbtcollins commented 9 years ago

I may be misunderstanding something. In HTTP/1.1 chunking is required and preferred for streaming content (it's obviously not preferred for transferring things where the length truly is known in advance). Sending an empty body via a chunked response is entirely reasonable in the HTTP/1.1 spec, at least as I read it.

That said, I don't think it matters whether it's reasonable or not in HTTP/1.1: I agree that the crux of the issue is that we may fail at any point in streaming responses, and that we should be encouraging streaming responses because of their better HTTP/2 behaviour. The putative benefit of having buffered the headers is that appservers / frameworks can include a pretty error message if the error is trapped before the headers are put on the wire. That can be accomplished in one of two ways.

A) We require the app to buffer the headers itself. B) We buffer the headers after being given them, until we have content to send.

Since we don't offer an API to build the headers up incrementally, we are already requiring applications to buffer the headers. So it seems to me that we don't make anything materially harder for app writers if we make start_response required to flush headers immediately (the same 'no buffering' model the body handling has) - which would fix this issue.
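Under option A, with start_response flushing immediately, an application that wants its own pretty error page simply does all of its fallible work before calling start_response; roughly (render_page is a hypothetical stand-in for the app's real work):

```python
def render_page(environ):
    # Hypothetical application logic; anything that might fail goes here.
    return b'<h1>Hello</h1>'

def app(environ, start_response):
    try:
        body = render_page(environ)
        status = '200 OK'
    except Exception:
        body = b'<h1>Something went wrong</h1>'
        status = '500 Internal Server Error'
    headers = [('Content-Type', 'text/html'),
               ('Content-Length', str(len(body)))]
    # Only now are the headers handed over; under the proposed rule the server
    # may put them on the wire immediately instead of buffering them.
    start_response(status, headers)
    return [body]
```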

Lukasa commented 9 years ago

I tentatively agree that we can flush headers immediately. I'd like to get input from app developers if possible, and we should certainly highlight that it's a divergence from the current behaviour.

GrahamDumpleton commented 9 years ago

Ignoring HTTP/2 for the time being, in all my experience with WSGI servers and applications I don't recall seeing any actual framework or application which explicitly took advantage of the ability to call start_response() more than once in order to override the HTTP response. I wouldn't be surprised if this existed solely to support some legacy behaviour in a long-gone application from before WSGI existed, and if it is totally unused now.

The only thing, therefore, that delaying the sending of the response headers until the first non-empty block achieves is to give the WSGI server the opportunity to return an HTTP 500 response if an unexpected exception is raised between start_response() being called and the first non-empty block of data being yielded. Even then it is going to be a generic, nondescript HTTP 500 response, such as might be generated by Apache, unless Apache happens to have an ErrorDocument directive mapping to a custom error page.

For this narrow window, we need to look at what the consequences would be of not delaying the sending of the response headers.

In the case of a response where a non-zero content length was specified, an HTTP client would see an empty response, but one with a content length. A browser would just show a blank page if it was the primary page, but an HTTP client which takes more notice of the response can detect that a problem occurred, even though the HTTP status is 200, by virtue of the fact that the specified amount of content wasn't returned.
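For example, such a client can detect the truncation with nothing more than the Python standard library (illustrative host and path):

```python
import http.client

conn = http.client.HTTPConnection('example.org')
conn.request('GET', '/some/page')  # illustrative URL
resp = conn.getresponse()
try:
    body = resp.read()
except http.client.IncompleteRead as exc:
    # Fewer bytes arrived than Content-Length promised: the status was 200,
    # but we can tell the server gave up part way through the response.
    body = exc.partial
```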

The problem is where the content length was set but was 0. A client in this case would surmise that the request had been successful, when in fact a crash between start_response() being called and the end of the iterable being reached may mean the requested operation never happened. Realistically though, if you weren't returning any content, it is more likely that all the work had already been done by the WSGI application and you are generating a 20X response other than 200, or a 30X response, where no response content was being generated anyway.

Now, in the case where there is no content length response header, HTTP/1.1 is being used, and the client accepts chunked transfer encoding for responses, the correct behaviour would be for the server to simply close the connection without returning anything more. That is, it should not even generate a 0 closing chunk.
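On the wire the difference is just the final zero-length chunk, roughly:

```python
# Complete chunked body: each chunk is "<hex length>\r\n<data>\r\n", terminated
# by a zero-length chunk.
complete = b'5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n'

# Where the application failed mid-stream and the server (correctly) just
# closed the connection: no terminating zero chunk, so the client can tell
# the response was cut short.
truncated = b'5\r\nhello\r\n'
```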

Whether HTTP clients always provide a way of knowing that the chunked response was well formed and complete I have no idea. Technically though, if there is no 0 closing chunk, a client can detect that the request likely failed in some way before the full response was returned. My understanding of proxies is that they are meant to preserve the fact that a downstream system didn't provide a 0 closing chunk, so the original HTTP client still knows.

This latter case in practice falls apart for various reasons though and can't be relied upon, but this is a general problem with chunked response content even where some data has already been returned at the point a problem occurs.

The mod_wsgi module in particular suffers from this problem: if an exception occurs in the WSGI application while returning a response which is being chunked, Apache will still send a 0 closing chunk even though the response is incomplete. This is a bug, but it is fixed for mod_wsgi embedded mode in a yet-to-be-released version.

The case of mod_wsgi daemon mode is more problematic though, as is any system which involves proxying such as CGI or SCGI, where there is no proper framing on the response content when no content length is specified. This is because a proxying process can't reliably know if the complete response was received from the backend process when no content length is set and the backend process happened to crash for some reason or the connection was otherwise dropped prematurely, such as when an exception occurs in the WSGI application.

To solve this in mod_wsgi daemon mode, framing of the response data between the Apache child worker (proxy) process and the mod_wsgi daemon process, using chunked transfer encoding or some other mechanism, will be required. There is an intent to fix this and I have been looking at it lately, but I haven't had a chance to complete it.

So the WSGI server itself does lose a small additional opportunity to flag that the request actually failed during that window, but this doesn't help with the case where an exception occurs after the first non-empty block is yielded anyway. For that case you potentially aren't going to be able to reliably detect that a failure occurred where the content length was set to 0, or where no content length was set at all and a chunked response was being returned, due to limitations in how some of the protocols for bridging between a web server and a dynamic web application work.

Given that the ability to call start_response() a second time is not something I have ever seen used in practice, this small loss in the ability of a WSGI server to flag an HTTP 500 response for an unexpected exception in that small window is probably acceptable and not going to cause any real noticeable problems. You also have to remember that the expectation is that people are writing reliable web applications, and so the occurrence of an exception should be a rare event in the first place.

Now how all this relates to running a legacy WSGI 1.0 application on top of some future HTTP/2 API using an adapter I have no idea.

What I don't expect to see, though, is being able to ride HTTP/2 or web sockets on top of the WSGI 1.0 API, or even some updated form of it, so I am a bit confused as to why this is even an issue.

So as far as I am concerned, this whole discussion is only really relevant in the context of some updated variant of WSGI done simply to eliminate some of the odd corner cases that have to be dealt with in middleware - of which there are many more besides this one if you want to try to simplify things while keeping existing code compatible. I can't see how this has anything to do with HTTP/2 or web sockets, as I can't see those operating through the existing WSGI API anyway; it doesn't seem possible within the current confines of the WSGI API and how an underlying WSGI server implements it.