hitrust / modwsgi

Automatically exported from code.google.com/p/modwsgi

Transparent "Expect: 100-continue" handling #52

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The Expect: 100-continue mechanism is explained in RFC 2616 section 8.2.3
(http://www.rfc.net/rfc2616.html#s8.2.3) and the WSGI requirements for it
are described in PEP 333
(http://www.python.org/dev/peps/pep-0333/#http-1-1-expect-continue).

The use cases for Expect: 100-continue are primarily web services, where
the client is sending a large request entity to the server, and/or where
the connection between the client and the server is bad, and the client
doesn't want to waste resources by sending an entity when the server can
reject the request without looking at the request entity. Many web
service clients now have 100-continue turned on by default, and developers
are often already using this mechanism unknowingly.

Ideally, mod_wsgi would not send a "100 Continue" response until
wsgi.input is read. If the WSGI application returns a response before
wsgi.input is read, then that response should be returned *instead of* 100
Continue.

PEP 333 says that Expect: 100-continue processing must be transparent to
the application. Does this mean that the Expect header needs to be modified
to remove the "100-continue" token?

I have a WSGI application that benefits from this optimization, but in
reality the optimized case will be pretty rare compared to other cases.
However, the removal of the "100-continue" token is potentially a WSGI
compliance issue.

I will submit some automated tests for this once I have time to write them.

Original issue reported on code.google.com by brianlsm...@gmail.com on 15 Jan 2008 at 8:24

GoogleCodeExporter commented 9 years ago
FWIW, mod_wsgi doesn't do anything about '100-continue'. Whatever behaviour one
sees is a result of how Apache's lower levels implement it, and possibly at
handler level one doesn't have much control over it.

My understanding up to now was that Apache doesn't send the '100 Continue'
response until the first attempt by the handler to read data. The problem is
that no browser implements '100-continue' how it is meant to; they just send
any content immediately after the headers anyway.

So, as far as I know Apache does it correctly now.

Original comment by Graham.Dumpleton@gmail.com on 15 Jan 2008 at 11:07

GoogleCodeExporter commented 9 years ago
As a follow-up, mod_wsgi only really honours 100-continue properly when run in
embedded mode. In daemon mode the content is always sent across to the daemon
before the WSGI application even gets a chance to process the request. As such,
the 100 status response is sent by Apache back to the client before wsgi.input
in the WSGI application running in daemon mode has been used.

Will bring this whole issue up on mailing list, as my knowledge on 100-continue
details isn't excellent and nothing I have found yet on the net explains some
things I want to know. :-(

Original comment by Graham.Dumpleton@gmail.com on 29 Jan 2008 at 4:44

GoogleCodeExporter commented 9 years ago
Actually, in terms of what the WSGI specification says:

"""
Servers and gateways that implement HTTP 1.1 must provide transparent support
for HTTP 1.1's "expect/continue" mechanism. This may be done in any of several
ways:

   1. Respond to requests containing an Expect: 100-continue request with an
      immediate "100 Continue" response, and proceed normally.
   2. Proceed with the request normally, but provide the application with a
      wsgi.input stream that will send the "100 Continue" response if/when the
      application first attempts to read from the input stream. The read
      request must then remain blocked until the client responds.
   3. Wait until the client decides that the server does not support
      expect/continue, and sends the request body on its own. (This is
      suboptimal, and is not recommended.)
"""

When using embedded mode it does 2. When using daemon mode it effectively does
1.

To my mind doing 2 is the best thing, as it can avoid the need to send content
at all. Thus need to get daemon mode doing 2 as well.
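
As a rough illustration of option 2, a gateway could wrap the request body
stream so that the interim response goes out only on the first read. This is a
minimal sketch and not mod_wsgi's implementation: `ContinueOnReadInput` and the
`send_continue` callback are hypothetical names, and the blocking wait for the
client is left to the underlying stream.

```python
import io


class ContinueOnReadInput:
    """Wrap a request body stream; emit the interim response on first read.

    `send_continue` is a hypothetical callback standing in for whatever the
    server uses to write "HTTP/1.1 100 Continue" to the client.
    """

    def __init__(self, stream, send_continue, expects_continue):
        self._stream = stream
        self._send_continue = send_continue
        self._pending = expects_continue

    def _before_read(self):
        # Only the first read attempt triggers the interim response.
        if self._pending:
            self._pending = False
            self._send_continue()

    def read(self, size=-1):
        self._before_read()
        return self._stream.read(size)

    def readline(self, size=-1):
        self._before_read()
        return self._stream.readline(size)


# Stand-in objects for illustration only.
sent = []
body = ContinueOnReadInput(io.BytesIO(b"A=B"), lambda: sent.append("100 Continue"), True)
```

If the application never touches `body`, `send_continue` is never called and the
final response can go out instead of the interim one.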

Original comment by Graham.Dumpleton@gmail.com on 29 Jan 2008 at 4:58

GoogleCodeExporter commented 9 years ago
Hmmm, more digging. There are problems with 100-continue with how mod_wsgi
avoids buffering output to enforce WSGI requirement to flush between yields or
iterables.

With the example:

def application(environ, start_response):
    length = int(environ.get('CONTENT_LENGTH', '0'))
    prefix = str(environ) + '\n'

    status = '200 OK'
    response_headers = [('Content-Type', 'text/plain'),
                        ('Content-Length', str(length+len(prefix)))]
    start_response(status, response_headers)

    # Yield the prefix first so that the response headers are flushed
    # before wsgi.input is touched.
    yield prefix

    # Echo the request body back in blocks of at most 128 bytes.
    block = min(128, length)
    output = environ['wsgi.input'].read(block)
    length -= block
    while output:
        yield output
        output = environ['wsgi.input'].read(block)
        length -= block

In this example it deliberately yields a value before making first attempt to
use wsgi.input. The point of this was to test whether the remote client would
still send content where final response headers were received before 100 status
was returned and actual response status was 200.

What happens is that the 100 status response which is generated by Apache output
filters (and not mod_wsgi), gets inserted into output stream. Ie., for:

grahamd$ curl -vF A=B http://localhost:8224/wsgi/scripts/stream.py

one sees:

* About to connect() to localhost port 8224
*   Trying ::1... * connected
* Connected to localhost (::1) port 8224
> POST /wsgi/scripts/stream.py HTTP/1.1
User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7i
zlib/1.2.3
Host: localhost:8224
Pragma: no-cache
Accept: */*
Content-Length: 137
Expect: 100-continue
Content-Type: multipart/form-data; 
boundary=----------------------------607fea63a6ca

< HTTP/1.1 200 OK
< Date: Tue, 29 Jan 2008 05:32:16 GMT
< Server: Apache/2.2.4 (Unix) mod_wsgi/2.0c4 Python/2.3.5
< Content-Length: 1799
< Content-Type: text/plain
{'mod_wsgi.reload_mechanism': '0', 'mod_wsgi.listener_port': '8224',
'SERVER_SOFTWARE': 'Apache/2.2.4 (Unix) mod_wsgi/2.0c4 Python/2.3.5', 
'SCRIPT_NAME':
'/wsgi/scripts/stream.py', 'mod_wsgi.handler_script': '', 'SERVER_SIGNATURE':
'<address>Apache/2.2.4 (Unix) mod_wsgi/2.0c4 Python/2.3.5 Server at localhost 
Port
8224</address>\n', 'REQUEST_METHOD': 'POST', 'PATH_INFO': '', 'SERVER_PROTOCOL':
'HTTP/1.1', 'QUERY_STRING': '', 'PATH':
'/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/ose/bin:/usr/local/bin:/Users/grahamd/
bin',
'CONTENT_LENGTH': '137', 'HTTP_USER_AGENT': 'curl/7.13.1 
(powerpc-apple-darwin8.0)
libcurl/7.13.1 OpenSSL/0.9.7i zlib/1.2.3', 'SERVER_NAME': 'localhost', 
'REMOTE_ADDR':
'::1', 'wsgi.url_scheme': 'http', 'mod_wsgi.output_buffering': '0',
'mod_wsgi.callable_object': 'application', 'SERVER_PORT': '8224',
'wsgi.multiprocess': True, 'SERVER_ADDR': '::1', 'DOCUMENT_ROOT':
'/usr/local/apache-2.2.4/htdocs', 'mod_wsgi.process_group': '', 'HTTP_PRAGMA':
'no-cache', 'SCRIPT_FILENAME': '/usr/local/wsgi/scripts/stream.py', 
'SERVER_ADMIN':
'you@example.com', 'wsgi.input': <mod_wsgi.Input object at 0x4de480>, 
'HTTP_HOST':
'localhost:8224', 'wsgi.multithread': True, 'HTTP_EXPECT': '100-continue',
'REQUEST_URI': '/wsgi/scripts/stream.py', 'HTTP_ACCEPT': '*/*', 'wsgi.version': 
(1,
0), 'GATEWAY_INTERFACE': 'CGI/1.1', 'wsgi.run_once': False, 'wsgi.errors':
<mod_wsgi.Log object at 0x489170>, 'REMOTE_PORT': '56725', 
'mod_wsgi.listener_host':
'', 'CONTENT_TYPE': 'multipart/form-data;
boundary=----------------------------607fea63a6ca', 
'mod_wsgi.application_group':
'kundalini.local:8224|/wsgi/scripts/stream.py', 'mod_wsgi.script_reloading': 
'1'}
HTTP/1.1 100 Continue

------------------------------607fea63a6ca
Content-Disposition: form-data; name="A"

B
* Connection #0 to host localhost left intact
* Closing connection #0

Note the presence of 'HTTP/1.1 100 Continue' intermixed in response. Apache
should be realising that headers have been sent and not generating this. Not
sure if Apache is wrong or whether how mod_wsgi uses it is wrong. If one turns
on WSGIOutputBuffering in mod_wsgi the problem doesn't exist.

Original comment by Graham.Dumpleton@gmail.com on 29 Jan 2008 at 5:35

GoogleCodeExporter commented 9 years ago
mod_wsgi has to send the response headers once the application yields its first
non-empty string. When you send the headers before calling
ap_should_client_block, the meaning is "do not continue"; that is, "do not send
a request body". If the client sends a request body anyway, then the server
should ignore it, according to HTTP 1.1 section 8.2.3
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.2.3).

Effectively, the way WSGI is defined, an application must call
environ["wsgi.input"].read() or .readline() at least once before yielding any
output, if it wants to read the input at all when an Expect: 100-continue is
provided. If there is a "100-continue" token in the "Expect" header, and the
application yields a non-empty string before reading from wsgi.input, mod_wsgi
should (seemingly must) disable wsgi.input, preferably by raising an exception
whenever the user tries to read from it.
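
The proposal above could look roughly like the following. It is only a sketch
under assumptions: `GuardedInput`, `InputDisabledError`, and the
`headers_flushed()` hook are invented names for illustration, and a real
gateway would call the hook at the point where it writes the final response
headers.

```python
import io


class InputDisabledError(Exception):
    """Hypothetical exception type; the thread does not name one."""


class GuardedInput:
    """Refuse reads once headers went out before any read was attempted."""

    def __init__(self, stream, expects_continue):
        self._stream = stream
        self._expects_continue = expects_continue
        self._read_attempted = False
        self._disabled = False

    def headers_flushed(self):
        # Hypothetical hook: called when the final response headers are sent.
        # With Expect: 100-continue and no prior read, the client has been
        # told "do not continue", so the body must not be consumed.
        if self._expects_continue and not self._read_attempted:
            self._disabled = True

    def read(self, size=-1):
        if self._disabled:
            raise InputDisabledError(
                "response headers were sent before wsgi.input was read")
        self._read_attempted = True
        return self._stream.read(size)


# Stand-in request where the application yields output before reading.
late_reader = GuardedInput(io.BytesIO(b"A=B"), expects_continue=True)
late_reader.headers_flushed()
```

A request without the Expect header, or one where the application reads first,
is unaffected by the guard.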

I will bring it up on Web-SIG.

Original comment by brian@briansmith.org on 29 Jan 2008 at 6:17

GoogleCodeExporter commented 9 years ago
Issue in comment 4 about 100 Continue being returned in response was fixed for
2.0, but still nothing done about comment 5 and whether to force generate 100
Continue before headers if no attempt made to read input before response is
generated, this being what was discussed on Web-SIG.

Interestingly, if using Apache 2.2.6, Apache seems to do this automatically.
Ie.,

 curl -v -F xxx=yyy http://127.0.0.1/~grahamd/echo.wsgi
* About to connect() to 127.0.0.1 port 80 (#0)
*   Trying 127.0.0.1... connected
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> POST /~grahamd/echo.wsgi HTTP/1.1
> User-Agent: curl/7.16.3 (powerpc-apple-darwin9.0) libcurl/7.16.3 
OpenSSL/0.9.7l zlib/1.2.3
> Host: 127.0.0.1
> Accept: */*
> Content-Length: 141
> Expect: 100-continue
> Content-Type: multipart/form-data; 
boundary=----------------------------6ff45c500399
> 
< HTTP/1.1 100 Continue
< HTTP/1.1 200 OK
< Date: Mon, 18 Feb 2008 09:22:29 GMT
< Server: Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.7l DAV/2 
mod_wsgi/2.0c5-TRUNK Python/2.5.1
< Content-Length: 1695
< Content-Type: text/plain
< 
< .....

Where echo.wsgi was:

import StringIO

def application(environ, start_response):
    headers = []
    headers.append(('Content-type', 'text/plain'))

    print >> environ['wsgi.errors'], environ

    #environ['wsgi.input'].read(0)

    start_response('200 OK', headers)

    input = environ['wsgi.input']
    output = StringIO.StringIO()

    keys = environ.keys()
    keys.sort()
    for key in keys:
        print >> output, '%s: %s' % (key, repr(environ[key]))
    print >> output 

    length = int(environ.get('CONTENT_LENGTH', '0'))
    output.write(input.read(length))

    return [output.getvalue()] 

Haven't found in the code yet for Apache 2.2.6 where it is doing this and how
code is different to Apache 2.2.4 where it doesn't do it. Other possibility is
my Apache 2.2.4 configuration is somehow different. Doesn't do it for Apache 1.3
either though, so maybe Apache was changed.

Original comment by Graham.Dumpleton@gmail.com on 18 Feb 2008 at 9:28

GoogleCodeExporter commented 9 years ago

Original comment by Graham.Dumpleton@gmail.com on 18 Feb 2008 at 9:33

GoogleCodeExporter commented 9 years ago
Graham, your test application doesn't do what you think it does; it needs to
yield a non-empty string to flush the headers before attempting to read from
wsgi.input. If Apache had already sent the headers before you read from
wsgi.input then it would not be able to set the Content-Length header in the
response.

Original comment by brianlsm...@gmail.com on 19 Feb 2008 at 4:49

GoogleCodeExporter commented 9 years ago
See also http://issues.apache.org/bugzilla/show_bug.cgi?id=38014. The handling
of "100 Continue" has improved in very recent versions; in particular, the fixed
versions of httpd will no longer incorrectly send a "100 Continue" if headers
have already been sent.

Original comment by brianlsm...@gmail.com on 19 Feb 2008 at 4:52

GoogleCodeExporter commented 9 years ago
Okay, that would help. I thought I was going mad. Don't understand then why I
was seeing Apache 2.2.4 do something different. Only thing I can think of is
that curl on the older MacOS X version where I am running Apache 2.2.4 does
something different. That or my test program was different, given they were on
different boxes. Anyway, know now to go back and do the test from scratch.

Are we still more or less agreed though, as per Web-SIG discussions as I
understood them, that for 2xx and 3xx responses we should force the 100 continue
response if no input was read before start_response()? Presume that it would be
forced only at the time that headers are flushed, i.e., first call to write() or
first non-empty value from the iterable?

Original comment by Graham.Dumpleton@gmail.com on 19 Feb 2008 at 4:56

GoogleCodeExporter commented 9 years ago
As to:

Index: modules/http/http_filters.c
===================================================================
--- modules/http/http_filters.c (revision 512953)
+++ modules/http/http_filters.c (working copy)
@@ -185,7 +185,8 @@
          * Only valid on chunked and C-L bodies where the C-L is > 0. */
         if ((ctx->state == BODY_CHUNK ||
             (ctx->state == BODY_LENGTH && ctx->remaining > 0)) &&
-            f->r->expecting_100 && f->r->proto_num >= HTTP_VERSION(1,1)) {
+            f->r->expecting_100 && f->r->proto_num >= HTTP_VERSION(1,1) &&
+            !(f->r->eos_sent || f->r->bytes_sent)) {
             char *tmp;
             apr_bucket_brigade *bb;

That would indeed fix Apache.

The question is, will my simply setting expecting_100 to false, as a way of
making it work properly for older versions of Apache as well, cause issues with
output filters? Frankly can't think of any reason why any other output filter
would at that point be interested in expecting_100.

Original comment by Graham.Dumpleton@gmail.com on 19 Feb 2008 at 5:00

GoogleCodeExporter commented 9 years ago
I would send a 100 continue for any status code that isn't 4xx or 5xx, except I
would send a 500 Internal Server Error if the status code is 100. That is
basically what you said, except that it allows for codes > 599 and less than
200.

Original comment by brianlsm...@gmail.com on 19 Feb 2008 at 5:01

GoogleCodeExporter commented 9 years ago
I think it would be better to find out which released version of Apache contains
the fix above, and then only do the workaround for versions before that.

Original comment by brianlsm...@gmail.com on 19 Feb 2008 at 5:05

GoogleCodeExporter commented 9 years ago
Send a 500 error just for 100, or for any 1xx values? Can a WSGI application
validly generate any 1xx status values?

Original comment by Graham.Dumpleton@gmail.com on 19 Feb 2008 at 5:14

GoogleCodeExporter commented 9 years ago
Other 1xx codes are reasonable (WebDAV's "102 Processing" might even be useful
for quite a few WSGI applications), but there is no way for a WSGI application
to generate the next status code (no way to send "102 Processing" and then "200
OK"), so I guess it is reasonable to prevent the WSGI application from sending
them. This is an issue that should be brought up on Web-SIG.

Original comment by brianlsm...@gmail.com on 19 Feb 2008 at 5:22

GoogleCodeExporter commented 9 years ago
The question in my mind at this point though is whether it really should be
generating this 100 continue if no content read before headers sent.

The problem as I see it is that even if mod_wsgi does this, other WSGI hosting
mechanisms don't, so no one would be able to rely on it. To make their
application portable they would have to try a zero length read of wsgi.input
anyway, and hope the WSGI hosting solution doesn't optimise away the zero length
read and thus fail to pass it down to the lower layers.
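
For reference, the portable workaround mentioned here amounts to a single call
at the top of the application. The sketch below is illustrative only: it uses
bytes as modern Python requires (the thread predates that), the `environ` and
`start_response` stand-ins at the bottom are made up for the demonstration, and
whether the zero length read actually triggers a "100 Continue" depends entirely
on the hosting mechanism.

```python
import io


def application(environ, start_response):
    # Zero length read: requests no data, but gives a server that defers
    # "100 Continue" until the first read a chance to send it now.
    environ['wsgi.input'].read(0)

    start_response('200 OK', [('Content-Type', 'text/plain')])
    yield b'headers sent\n'

    # Only now read the real request body.
    length = int(environ.get('CONTENT_LENGTH') or 0)
    yield environ['wsgi.input'].read(length)


# Exercise the application with stand-in objects.
statuses = []
environ = {'wsgi.input': io.BytesIO(b'A=B'), 'CONTENT_LENGTH': '3'}
body = b''.join(
    application(environ, lambda status, headers: statuses.append(status)))
```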

Is there any evidence of any other web framework system, be it Python or some
other language, taking this stance and automatically generating a 100 continue
even if no content read before data sent in response?

Original comment by Graham.Dumpleton@gmail.com on 20 Feb 2008 at 11:42

GoogleCodeExporter commented 9 years ago
I would be surprised if you could find a WSGI gateway that *doesn't* send "100
continue" when working behind Apache. mod_cgi, mod_fastcgi, mod_fcgi,
mod_proxy_*, and even most versions of mod_wsgi will always send a "100
continue" because they read the request body unconditionally before the
application even has a chance to send any headers.

What you would actually be doing is providing a useful optimization where you
*don't* send an unnecessary "100 continue" for 4xx and 5xx responses, instead of
sending it every time. Keep in mind that, technically, you can *always* send a
100-continue, even without a "Expect: 100-continue" in the request. 1xx
responses are always allowed.

Original comment by brian@briansmith.org on 20 Feb 2008 at 11:58

GoogleCodeExporter commented 9 years ago
Paste server appears to try to implement:

"""2. Proceed with the request normally, but provide the application with a
wsgi.input stream that will send the "100 Continue" response if/when the 
application
first attempts to read from the input stream. The read request must then remain
blocked until the client responds."""

I agree that all the others that I have seen do:

"""   1. Respond to requests containing an Expect: 100-continue request with an
immediate "100 Continue" response, and proceed normally."""

Original comment by Graham.Dumpleton@gmail.com on 21 Feb 2008 at 12:06

GoogleCodeExporter commented 9 years ago
If you proxy Paste Server behind Apache, the 100 continue gets sent immediately,
as far as I remember.

Original comment by brian@briansmith.org on 21 Feb 2008 at 12:37

GoogleCodeExporter commented 9 years ago
Yes, if mod_proxy is used that is always true no matter what the back end does.

And since Pylons people would most likely say that running Pylons with mod_wsgi,
rather than behind mod_proxy, is evil and therefore forbidden, then one could
say that that is the default behaviour for Pylons. ;-)

Original comment by Graham.Dumpleton@gmail.com on 21 Feb 2008 at 12:44

GoogleCodeExporter commented 9 years ago
Am thinking that, to make this easier to implement, I remove support for
WSGIOutputBuffering. If people want buffering they should do it themselves in
the WSGI application anyway.

Original comment by Graham.Dumpleton@gmail.com on 21 Feb 2008 at 1:01

GoogleCodeExporter commented 9 years ago
I think that is a good idea. I already had to disable support for
WSGIOutputBuffering a long time ago in my modified version, in order to support
the file-descriptor-passing file_wrapper.

Original comment by brian@briansmith.org on 21 Feb 2008 at 1:05

GoogleCodeExporter commented 9 years ago
I didn't strictly need to get rid of output buffering, as in the end it wouldn't
have caused a problem, but have got rid of it anyway. Have updated code to flush
'100 Continue' response before headers if required, but just check for 2xx and
3xx rather than doing the inverse of 1xx, 4xx and 5xx. If HTTP changes in the
future then will worry about additional status response ranges then, but it is
probably highly unlikely that will happen.
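
The difference between the two rules discussed in this thread can be shown with
a small sketch. The function names are hypothetical and this is not the code in
mod_wsgi; it only contrasts "2xx and 3xx" with "anything that is not 4xx or
5xx". The separate suggestion to answer an application-supplied 100 status with
a 500 error is left out here.

```python
def parse_status_code(status):
    """Return the integer code from a WSGI status string such as '200 OK'."""
    return int(status.split(None, 1)[0])


def force_continue_2xx_3xx(status):
    """Rule described in this comment: force only for 2xx and 3xx."""
    return 200 <= parse_status_code(status) <= 399


def force_continue_not_4xx_5xx(status):
    """Alternative rule from earlier in the thread: anything but 4xx/5xx."""
    code = parse_status_code(status)
    return not (400 <= code <= 599)
```

The two rules agree for ordinary responses and differ only for codes outside
the 200 to 599 range, such as a private status of 600 or above.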

Remaining point on this issue is improving on how data is sent across to daemon
process so that can implement 100-continue across that gap and skip sending
content to daemon when not required. Although this is an enhancement rather than
bug, have left this flagged as bug report for time being. Have flagged the issue
now as being for mod_wsgi version 3.0.

Original comment by Graham.Dumpleton@gmail.com on 22 Feb 2008 at 3:20

GoogleCodeExporter commented 9 years ago
That doesn't work if an application is using its own status codes >=600, but I
guess anybody that does will be getting what they deserve.

Original comment by brian@briansmith.org on 22 Feb 2008 at 3:59

GoogleCodeExporter commented 9 years ago
Now not likely to be fully addressed in version 3.0.

Original comment by Graham.Dumpleton@gmail.com on 11 Apr 2008 at 5:26

GoogleCodeExporter commented 9 years ago
As discussed in:

  http://groups.google.com/group/modwsgi/browse_frm/thread/815fd4da49951e72

the workaround to trigger a zero length read before generating response headers,
to force the '100 Continue' header to be sent, causes an assertion failure when
Apache is compiled in maintainer mode.

To limit where this can occur, and because the '100 Continue' bug is fixed in
Apache 2.2.7 (2.2.8 really, as 2.2.7 was never an official release), now only
force the zero length read for buggy versions of Apache.
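
The version gate itself lives in mod_wsgi's C code, which is not reproduced
here. Purely as an illustration of the idea, the sketch below parses a
SERVER_SOFTWARE string like the ones shown earlier in this thread and compares
it against the release that carries the fix. The helper names are invented, and
treating an unrecognised server as "not buggy" is an arbitrary choice made for
this example.

```python
import re

# Release in which the "100 Continue" fix landed, per the comment above.
FIXED_IN = (2, 2, 7)


def apache_version(server_software):
    """Extract (major, minor, patch) from e.g. 'Apache/2.2.4 (Unix) ...'."""
    match = re.search(r'Apache/(\d+)\.(\d+)\.(\d+)', server_software)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def needs_zero_length_read(server_software):
    """True when the server predates the fix and needs the workaround."""
    version = apache_version(server_software)
    # Unknown servers are assumed not to need the workaround (assumption).
    return version is not None and version < FIXED_IN
```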

This change was done at revision 1090 in trunk for mod_wsgi 3.0.

Note that a WSGI application could itself still perform a zero length read and
cause the assertion failure, as mod_wsgi doesn't yet ignore a zero length read.
Still looking at whether it should ignore a zero length read, and whether it is
reasonable that a WSGI application could expect to do a zero length read and see
a '100 Continue' response be generated when no non-zero length read has already
been done. Could perhaps limit this and ignore a zero length read if a non-zero
length read has already been done, as there is certainly no point in that case.

Original comment by Graham.Dumpleton@gmail.com on 14 Oct 2008 at 11:36

GoogleCodeExporter commented 9 years ago
Now also only allow zero length read to propagate down to input filter stack if
that zero length read is first read from input. Change made in revision 1093.
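
In outline, the rule is "forward a zero length read only when it is the very
first read". The following sketch shows that rule on a plain Python stream; it
is not the revision 1093 code, and `ZeroReadFilter` and its `passed_down` list
are hypothetical names used only to make the behaviour observable.

```python
import io


class ZeroReadFilter:
    """Forward a zero length read only if it is the first read on the input."""

    def __init__(self, stream):
        self._stream = stream
        self._seen_read = False
        self.passed_down = []  # sizes actually forwarded, for illustration

    def read(self, size=-1):
        first = not self._seen_read
        self._seen_read = True
        if size == 0 and not first:
            # Later zero length reads are answered locally with no data.
            return b''
        self.passed_down.append(size)
        return self._stream.read(size)


# Stand-in stream for illustration.
filtered = ZeroReadFilter(io.BytesIO(b'abc'))
```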

Original comment by Graham.Dumpleton@gmail.com on 20 Oct 2008 at 7:05