pwsm / httplib2

Automatically exported from code.google.com/p/httplib2
0 stars 0 forks source link

Request that returns 304 is treated as 200, but cached as 304 #97

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Steps to reproduce:

1. I invoke an HTTP request for /foo/, which goes over the wire. The
server serves a representation that includes ETag, Cache-Control, and
Date. (What follows is an edited, simplified version of the httplib2
debug dump.)

send: 'GET /foo/ HTTP/1.1'

reply: 'HTTP/1.0 200 Ok\n'
header: Etag: "etag1"
header: Cache-Control: max-age=2
header: Date: Thu, 08 Apr 2010 18:26:14 -0000
...

httplib2 caches a representation that begins like this:

status: 200
etag: "etag1"
cache-control: max-age=2
date: Thu, 08 Apr 2010 18:26:14 -0000
...

My code that invoked the request gets a response object with .status
== 200.

2. The Cache-Control setting that says the cache is good for 2
seconds. Before that two seconds is up, I make a second request for
/foo/. httplib2 notices that the cached representation is fresh and,
rather than sending a request to the server, serves me a
representation directly from the cache. Again, the status code is 200,
even though no HTTP request was made.

3. Now I wait 3 seconds, and make a third request for /foo/. httplib2
notices that the cached representation is stale and makes a
conditional request.

send: 'GET /foo/ HTTP/1.1
       if-none-match: "etag1"'

reply: 'HTTP/1.0 304 Not Modified\n'
header: Etag: "etag1"
header: Cache-Control: max-age=2
header: Date: Thu, 08 Apr 2010 18:26:17 -0000
...

The ETags match, and so server sends a response code of 304 and no new
representation. Before handing the response object back to the calling
code, httplib2 sets response.status = 200. This way, my calling code
doesn't have to distinguish between the 304 and 200 response
codes--all it cares about is that it got its representation.

This is the point at which I believe there is a bug. Before httplib
sets response.status = 200, it calls _updateCache to update any
headers that changed between the last request and this one. In this
example, the Date header changed, and it's important to keep it
updated so that if another request happens in the next two seconds the
cache will not seem stale. That means that the cache now looks like this:

status: 304
etag: "etag1"
cache-control: max-age=2
date: Thu, 08 Apr 2010 18:26:17 -0000
...

4. Finally, I make a fourth request for /foo/, less than two seconds
after the third request. Again, the cached representation is
determined to be fresh, and instead of an HTTP request going to the
server httplib2 creates a synthetic response object from the
cache. But this response object is different from the previous three.

The first response object had a status code of 200 because that's what
the server said. The second response object had a status code of 200
because that's what the cache said. The third response object had a
status code of 200, even though the cache was stale and the server
said 304, because for the sake of consistency, httplib2 modified the
response code before returning it. But httplib2 didn't modify the
cache for the sake of consistency, and so the fourth response object
has a status code of 304--even though from the end-user's perspective
it's no different from the other three requests.

Analysis:

I don't grok HTTP caching behavior well enough to know if this is
technically correct, but it does seem that if you silently change a 304
response to a 200 when it comes direct from the server, you should do the
same when the 304 response comes from a cache. Setting the response code to
200 *before* the _updateCache call seems to accomplish this: the 304
response code is cached as 200 and subsequent requests treat it as such.

Original issue reported on code.google.com by leonard....@gmail.com on 8 Apr 2010 at 6:59

GoogleCodeExporter commented 8 years ago
Looking at the code in both .4 and .6 I see that the _updateCache() has the 
following:

            status = response_headers.status
            if status == 304:
                status = 200

            status_header = 'status: %d\r\n' % response_headers.status

            header_str = info.as_string()

            header_str = re.sub("\r(?!\n)|(?<!\r)\n", "\r\n", header_str)
            text = "".join([status_header, header_str, content])

            cache.set(cachekey, text)

Notice that after the check of the value of "status", that variable is not 
used. The "status_header" variable string value is generated using the original 
response_header status value.

The simple fix would be to just change that string to be generated using 
"status".

Original comment by peter.a....@gmail.com on 17 Jul 2010 at 8:12

GoogleCodeExporter commented 8 years ago
This issue was closed by revision 46fcb773eb.

Original comment by joe.gregorio@gmail.com on 14 Feb 2011 at 8:36