mengzurui / dpkt

Automatically exported from code.google.com/p/dpkt

HTTP responses with no body cause other responses to be consumed #50

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Put more clearly: when an HTTP response (say, 304 Not Modified) has no body but 
still has a content-type header, dpkt reads all remaining data in the stream 
as that response's body.

What steps will reproduce the problem?
1. Unpack attached zip file
2. Run dpkt_bug.py, which attempts to construct dpkt.http.Response objects from 
the data in the included file stream.txt.

This program prints the number of responses parsed. There are two responses in 
the file, but only one is detected, with the other response as its body. You 
can see this if you print the responses instead of just the length of the list.

This test was run on Windows Vista with dpkt 1.7.

Original issue reported on code.google.com by andrewf...@gmail.com on 27 Sep 2010 at 10:40


GoogleCodeExporter commented 9 years ago
Please note that 204 No Content and several other responses have no body either.

http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.3

"For response messages, whether or not a message-body is included with
a message is dependent on both the request method and the response
status code (section 6.1.1). All responses to the HEAD request method
MUST NOT include a message-body, even though the presence of entity-
header fields might lead one to believe they do. All 1xx
(informational), 204 (no content), and 304 (not modified) responses
MUST NOT include a message-body. All other responses do include a
message-body, although it MAY be of zero length."
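
The rule quoted above can be written as a small predicate. This is only a sketch: `response_has_body` and its parameters are hypothetical names, not part of dpkt, and the request method has to be supplied by the caller because the response alone does not carry it.

```python
def response_has_body(status_code, request_method="GET"):
    """Return True if RFC 2616 section 4.3 allows a message-body."""
    if request_method.upper() == "HEAD":
        return False  # responses to HEAD never carry a body
    if 100 <= status_code < 200:
        return False  # 1xx (informational)
    if status_code in (204, 304):
        return False  # 204 (no content), 304 (not modified)
    return True  # a body is allowed, although it may be zero length


print(response_has_body(304))          # False
print(response_has_body(200))          # True
print(response_has_body(200, "HEAD"))  # False
```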

Original comment by ls...@google.com on 28 Sep 2010 at 2:32

GoogleCodeExporter commented 9 years ago
OK. The current issue is caused by dpkt reading the rest of the data into the 
body whenever the response has a content-type header, which is wrong according 
to the RFC. A simple fix is to remove those two lines.

Of course, this doesn't solve the larger issue. Making the response 
intelligently decide whether to parse a body is tricky with dpkt's 
architecture, because the parsing of the body and headers is done in the 
http.Message superclass.
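
One possible way around this is a hook that the shared parser calls before it reads a body, which the response subclass overrides. The classes below are simplified stand-ins written for illustration, not dpkt's actual `http.Message` or `http.Response`, and `_expects_body` and `_parse_start_line` are hypothetical method names. The sketch uses Python 3 bytes and only handles Content-Length.

```python
class Message:
    """Stand-in for a shared header/body parser (not dpkt's real class)."""

    def __init__(self, data):
        self.unpack(data)

    def _parse_start_line(self, line):
        """Subclasses read the request line or status line here."""

    def _expects_body(self):
        """Hook: subclasses decide whether a body may follow the headers."""
        return True

    def unpack(self, data):
        head, _, rest = data.partition(b"\r\n\r\n")
        lines = head.split(b"\r\n")
        self._parse_start_line(lines[0])
        self.headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(b":")
            self.headers[name.strip().lower()] = value.strip()
        if self._expects_body():
            n = int(self.headers.get(b"content-length", 0))
            self.body, self.data = rest[:n], rest[n:]
        else:
            # Leave everything after the headers for the next message.
            self.body, self.data = b"", rest


class Response(Message):
    def _parse_start_line(self, line):
        self.status = int(line.split(None, 2)[1])

    def _expects_body(self):
        return not (100 <= self.status < 200 or self.status in (204, 304))
```

With this layout the superclass still owns header and body parsing, and the status-code decision stays in the subclass, where the status is known.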

Original comment by andrewf...@gmail.com on 28 Sep 2010 at 5:45

GoogleCodeExporter commented 9 years ago
Attached is a patch to fix the issue:
1. Ignore content-type, as it does not say anything about the existence of a 
body.
2. Do not attempt to read body if response status code is 1xx, 204 or 304.
3. Here comes the ugly part: ignore the body if the parameter head_response=True 
has been passed to the Response object. HEAD responses are identical to GET 
responses (including Content-Length), except for the missing body. dpkt cannot 
determine automatically whether this is a HEAD response.
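
The three rules can be sketched without dpkt as follows. This is not the attached patch: `split_response` is a hypothetical helper, only `head_response` is a name taken from the patch description, and the sketch ignores chunked encoding and read-until-close bodies. It uses Python 3 bytes.

```python
def split_response(data, head_response=False):
    """Parse one response from bytes; return (status, headers, body, rest)."""
    head, sep, rest = data.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete headers")
    lines = head.split(b"\r\n")
    status = int(lines[0].split(None, 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()

    # Rules 2 and 3: these responses never carry a body.
    bodiless = head_response or 100 <= status < 200 or status in (204, 304)
    # Rule 1: Content-Type is never consulted, only Content-Length.
    length = 0 if bodiless else int(headers.get(b"content-length", 0))
    return status, headers, rest[:length], rest[length:]


stream = (
    b"HTTP/1.1 304 Not Modified\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)

responses = []
while stream:
    status, headers, body, stream = split_response(stream)
    responses.append((status, body))

print(responses)  # [(304, b''), (200, b'hello')]
```

Both responses are recovered here because the 304 leaves the rest of the stream untouched. Passing `head_response=True` makes the helper skip the body even when Content-Length is present.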

Also attached is a modified dpkt_http_bug2.zip:
- dpkt_bug.py: call dpkt.http.Response(data, head_response=True)
- stream.txt: file ends with '\r\n\r\n' (instead of '\r\n')

Original comment by matthaeu...@gmail.com on 10 Jan 2014 at 5:11


GoogleCodeExporter commented 9 years ago
Slight improvement: set body consistently to '' (empty string), avoid None.

Original comment by matthaeu...@gmail.com on 12 Jan 2014 at 12:32
