mysociety / popit

DEPRECATED - Development on PopIt has stopped and it is no longer being maintained
https://goo.gl/Vvej4Q

images from the image proxy aren't being cached #813

Closed mhl closed 9 years ago

mhl commented 9 years ago

If you make a request to the image proxy, e.g.:

curl -v -O https://yournextmp.popit.mysociety.org/image-proxy/http%3A%2F%2Fyournextmp.popit.mysociety.org%2Fpersons%2F4493%2Fimage%2F5481e8f6b150e238702c061b/0/64

And do it again shortly afterwards, the requests to the image proxy are all varnish misses:

x.x.x.x - - [30/Mar/2015:22:07:55 +0100] "GET http://yournextmp.popit.mysociety.org/persons/4493/image/5481e8f6b150e238702c061b HTTP/1.1" 302 116 "-" "image-proxy/0.0.4" yournextmp.popit.mysociety.org miss
x.x.x.x - - [30/Mar/2015:22:07:55 +0100] "GET http://yournextmp.popit.mysociety.org/persons/4493/image/5481e8f6b150e238702c061b HTTP/1.0" 200 334049 "-" "image-proxy/0.0.4" yournextmp.popit.mysociety.org miss
x.x.x.x - - [30/Mar/2015:22:07:52 +0100] "GET http://yournextmp.popit.mysociety.org/image-proxy/http%3A%2F%2Fyournextmp.popit.mysociety.org%2Fpersons%2F4493%2Fimage%2F5481e8f6b150e238702c061b/0/64 HTTP/1.0" 200 11627 "-" "curl/7.35.0" yournextmp.popit.mysociety.org miss
x.x.x.x - - [30/Mar/2015:22:08:43 +0100] "GET http://yournextmp.popit.mysociety.org/persons/4493/image/5481e8f6b150e238702c061b HTTP/1.1" 302 116 "-" "image-proxy/0.0.4" yournextmp.popit.mysociety.org miss
x.x.x.x - - [30/Mar/2015:22:08:43 +0100] "GET http://yournextmp.popit.mysociety.org/persons/4493/image/5481e8f6b150e238702c061b HTTP/1.0" 200 334049 "-" "image-proxy/0.0.4" yournextmp.popit.mysociety.org miss
x.x.x.x - - [30/Mar/2015:22:08:41 +0100] "GET http://yournextmp.popit.mysociety.org/image-proxy/http%3A%2F%2Fyournextmp.popit.mysociety.org%2Fpersons%2F4493%2Fimage%2F5481e8f6b150e238702c061b/0/64 HTTP/1.0" 200 11627 "-" "curl/7.35.0" yournextmp.popit.mysociety.org miss

The implications of this are pretty horrible for pluto: the image-proxy does no caching of resized images itself (as @chrismytton explained in IRC), it just sets a max-age of 1 year via Cache-Control and relies on Varnish to cache the results. Since these requests are all Varnish misses, I assume this means that each image is being resized anew on every request. (!)
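
To see how widespread the misses are across a longer log, the cache status could be tallied per request. This is only a minimal sketch: it assumes the log format shown above (request line in the first quoted field, Varnish status as the final whitespace-separated token), and the function name `tally_image_proxy_status` is hypothetical.

```python
from collections import Counter


def tally_image_proxy_status(log_lines):
    """Count the final (cache status) field for requests to /image-proxy/ URLs."""
    counts = Counter()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 3:
            continue  # line is not in the assumed access-log format
        request = parts[1].split()  # e.g. ["GET", "<url>", "HTTP/1.0"]
        if len(request) < 2 or "/image-proxy/" not in request[1]:
            continue  # skip the proxy's own upstream fetches of the original image
        counts[line.split()[-1]] += 1  # last token is assumed to be hit/miss
    return counts
```

Feeding it the six lines above would count only the two curl requests, since the other four are the proxy fetching the original image.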

pluto has been very loaded recently (responding particularly slowly on 2015-03-30) and I suspect that this may be a large part of the problem.

One issue may be that any request for an image from the image proxy sets a cookie for .popit.mysociety.org:

Set-Cookie: connect.sess=XXXXX; Domain=.popit.mysociety.org; Path=/; HttpOnly; Secure

... which I think comes from Express's cookieSession middleware.

So I guess this might be causing the cache misses? (However, I would have thought that would only be the case if the server said Vary: Cookie, and in fact this response just has Vary: X-HTTP-Method-Override. Maybe our Varnish configuration doesn't cache any request with a non-empty Cookie header after the Google Analytics and other known cookies have been stripped?)

dracos commented 9 years ago

Our Varnish config will pass any request with a non-empty cookie header, correct. (This is also default Varnish behaviour: https://www.varnish-cache.org/docs/3.0/tutorial/cookies.html).

Note, in case you didn't know, if the image URLs had ended in a normal extension (jpg, png, etc), our Varnish config would have ignored cookies on them (caching or looking up) and cached them automatically.
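
The behaviour described here can be modelled roughly in Python. This is a simplified sketch, not the actual VCL: the extension list and the function name `would_be_cached` are assumptions, and it also folds in stock Varnish 3 behaviour of not caching a response that carries Set-Cookie, which is what bites the image proxy even when the client sends no cookie.

```python
from urllib.parse import urlsplit

# Assumed list; the real config's set of "normal" extensions is not shown in this thread.
STATIC_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def would_be_cached(url, request_cookie="", response_set_cookie=""):
    """Simplified model of whether Varnish would store and serve this response."""
    path = urlsplit(url).path.lower()
    if path.endswith(STATIC_EXTENSIONS):
        return True  # cookies are ignored for static-looking URLs
    if request_cookie.strip():
        return False  # a non-empty Cookie header makes Varnish pass the request
    if response_set_cookie.strip():
        return False  # a Set-Cookie on the response stops it being cached
    return True
```

Under this model an image-proxy URL ending in `/0/64` with a `connect.sess` Set-Cookie on the response is never cached, while the same response with the Set-Cookie stripped, or a URL ending in an image extension, is.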

mhl commented 9 years ago

@dracos Ah, OK, thanks - coincidentally, James added support for adding an image extension to the URL to image-proxy: https://github.com/jpmckinney/image-proxy/issues/12

chrismytton commented 9 years ago

Fixed at the server end by adding the following rule to our Varnish config:

# If hitting PopIt image-proxy, ignore cookies
if (req.http.host ~ "popit.mysociety.org" && req.url ~ "^/image-proxy") {
    remove beresp.http.set-Cookie;
    return (deliver);
}