jech / polipo

The Polipo caching HTTP proxy
http://www.pps.jussieu.fr/~jch/software/polipo/
MIT License
1.8k stars 354 forks source link

Resource stored with wrong headers (and gzipped) in diskcache / force-reload in Firefox does not help (always) #42

Closed blueyed closed 9 years ago

blueyed commented 9 years ago

I've started seeing the following error with SourceForge.net pages since switching to polipo yesterday:

> Error: CSS did not load.
This may happen on the first request due to CSS mimetype issues. Try clearing your browser cache and refreshing.

The log (with logLevel=0xff):

Vary header present (Accept-Encoding, User-Agent).
Uncacheable object http://sourceforge.net/p/tmux/tmux-code/ci/master/tree/ (2562)
Superseding object http://sourceforge.net/p/tmux/tmux-code/ci/master/tree/ (200 22492 -1 (none) -> 200 22487 -1 (none))
Superseding object http://fonts.googleapis.com/css?family=Ubuntu:regular (200 192 -1 (none) -> 200 192 -1 (none))
Vary header present (Accept-Encoding).
Superseding object http://consent-st.truste.com/get?name=notice.js&domain=slashdot.org&c=teconsent&text=true (200 4433 -1 (none) -> 200 -1 -1 (none))
Vary header present (Accept-Encoding).
Superseding object http://www.google-analytics.com/ga.js (200 16063 1412297322 (none) -> 200 16063 1412297322 (none))
Vary header present (Accept-Encoding).
Superseding object http://consent-st.truste.com/get?name=cmapi.module.js (200 5415 -1 (none) -> 200 -1 -1 (none))
Unsupported Cache-Control directive post-check -- ignored.
Unsupported Cache-Control directive pre-check -- ignored.
Vary header present (Accept-Encoding).
Uncacheable object http://consent.truste.com/notice?js=1&name=notice.js&domain=slashdot.org&c=teconsent&text=true (606)
Superseding object http://consent.truste.com/notice?js=1&name=notice.js&domain=slashdot.org&c=teconsent&text=true (200 1128 -1 (none) -> 200 1129 -1 (none))
Uncacheable object http://www.google-analytics.com/… (138)
Uncacheable object http://www.google-analytics.com/… (138)
Vary header present (Accept-Encoding).
Superseding object http://consent-st.truste.com/get?name=notice2.js (200 6266 -1 (none) -> 200 -1 -1 (none))
Unsupported Cache-Control directive post-check -- ignored.
Unsupported Cache-Control directive pre-check -- ignored.
Uncacheable object http://consent.truste.com/noticemsg?action=returns&domain=slashdot.org&behavior=expressed&country=de&language=en&rand=0.8105680911821955 (94)
Uncacheable object http://sourceforge.net/log/webtracker/?event_id=…&project=tmux&action_type=git&url=http%3A%2F%2Fsourceforge.net%2Fp%2Ftmux%2Ftmux-code%2Fci%2Fmaster%2Ftree%2F&referer= (2)

My config:

logFile = /var/log/polipo/polipo.log
diskCacheRoot = /var/spool/polipo
proxyPort = 3128
logLevel = 0xFF

disableIndexing = false
disableServersList = false

# max. open file handles
maxDiskEntries = 64

I've tried adding these settings (and restarted polipo), but without luck:

dontTrustVaryETag = true
dontCacheCookies = true
dontCacheRedirects = true

Restarting polipo, purging its disk cache (polipo -x) and forcefully reloading does not help (in Firefox). With Google Chrome forcefully reloading helps however, and fixes the cache?!

This may be related to the order in which the browsers fetch the resources, and how polipo handles them (regarding pipelining etc).

Wiping polipo's diskcache altogether (after stopping it) helps.

It appears to be related/caused by a JavaScript error, because some resource is gzipped, but the headers do not say so:

% wget http://a.fsdn.com/allura/nf/1415138103/_ew_/theme/sftheme/js/sftheme/header.js -S
--2014-11-06 17:42:14--  http://a.fsdn.com/allura/nf/1415138103/_ew_/theme/sftheme/js/sftheme/header.js
Resolving localhost (localhost)... 127.0.0.1
Connecting to localhost (localhost)|127.0.0.1|:3128... connected.
Proxy request sent, awaiting response...
  HTTP/1.1 200 OK
  Content-Length: 706
  ETag: "1349976945.13-2193"
  Date: Thu, 06 Nov 2014 16:39:51 GMT
  Last-Modified: Thu, 11 Oct 2012 17:35:45 GMT
  Expires: Wed, 04 Nov 2015 21:59:31 GMT
  Cache-Control: public, max-age=31382380
  Content-Type: application/x-javascript
  Age: 143
  Connection: keep-alive
Length: 706 [application/x-javascript]
Saving to: ‘header.js.1’

2014-11-06 17:42:14 (42,9 MB/s) - ‘header.js.1’ saved [706/706]

% http_proxy= wget http://a.fsdn.com/allura/nf/1415138103/_ew_/theme/sftheme/js/sftheme/header.js -S
--2014-11-06 17:42:36--  http://a.fsdn.com/allura/nf/1415138103/_ew_/theme/sftheme/js/sftheme/header.js
Resolving a.fsdn.com (a.fsdn.com)... 23.63.115.172
Connecting to a.fsdn.com (a.fsdn.com)|23.63.115.172|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Server: nginx
  Content-Type: application/x-javascript
  Access-Control-Allow-Origin: *
  ETag: "1360348568.16-2193"
  Last-Modified: Fri, 08 Feb 2013 18:36:08 GMT
  X-Frame-Options: SAMEORIGIN
  Cache-Control: public, max-age=31382270
  Expires: Wed, 04 Nov 2015 22:00:26 GMT
  Date: Thu, 06 Nov 2014 16:42:36 GMT
  Content-Length: 2193
  Connection: keep-alive
Length: 2193 (2,1K) [application/x-javascript]
Saving to: ‘header.js.2’

% file header.js.*
header.js.1: gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT)
header.js.2: ASCII text

The file in polipo's disk cache (a.fsdn.com/+pdEifTwTC6+549yOVD+3g==) looks as follows:

HTTP/1.1 200 OK
Content-Length: 706
ETag: "1349976945.13-2193"
Date: Thu, 06 Nov 2014 16:39:51 GMT
Last-Modified: Thu, 11 Oct 2012 17:35:45 GMT
Expires: Wed, 04 Nov 2015 21:59:31 GMT
Cache-Control: public, max-age=31382380
Content-Type: application/x-javascript
X-Polipo-Location: http://a.fsdn.com/allura/nf/1415138103/_ew_/theme/sftheme/js/sftheme/header.js
X-Polipo-Access: Thu, 06 Nov 2014 16:42:14 GMT

^_<8B>^H^@^@^@^@^@^@^@<9D…

See issue #43 for more investigation into this particular resource.

I am using polipo from Git (@e40fbbe7).

Could this be causes by pipelining or something similar, where polipo receives the content gzipped, although it's not when requested alone?!

jech commented 9 years ago

This looks like a server bug — the resource is marked as cachable and there's no Vary header.