perusio / drupal-with-nginx

Running Drupal using nginx: an idiosyncratically crafted bleeding edge configuration.

'expires epoch' directive causes upstream cache headers to be ignored #198

Open durist opened 9 years ago

durist commented 9 years ago

apps/drupal/microcache_fcgi.conf has the following:

## The Cache-Control and Expires headers should be delivered untouched
## from the upstream to the client.
fastcgi_ignore_headers Cache-Control Expires;
## Bypass the cache.
fastcgi_cache_bypass $no_cache;
fastcgi_no_cache $no_cache;
## Add a cache miss/hit status header.
add_header X-Micro-Cache $upstream_cache_status;
## To avoid any interaction with the cache control headers we expire
## everything on this location immediately.
expires epoch;

However, the expires epoch line apparently takes precedence over fastcgi_ignore_headers Cache-Control Expires;

Headers with expires epoch;:

HTTP/1.1 200 OK
Server: nginx
Date: Thu, 18 Dec 2014 20:43:48 GMT
Content-Type: application/xml
Connection: keep-alive
Keep-Alive: timeout=10
Vary: Accept-Encoding
Etag: "1418928287-0"
Cache-Control: no-cache
Expires: Thu, 01 Jan 1970 00:00:01 GMT
X-Microcachable: 0
Content-Language: en
Last-Modified: Thu, 18 Dec 2014 18:44:47 GMT
Vary: Cookie
Vary: Accept-Encoding
X-Micro-Cache: EXPIRED
X-Content-Options: nosniff

Identical configuration without expires epoch;:

HTTP/1.1 200 OK
Server: nginx
Date: Thu, 18 Dec 2014 20:44:37 GMT
Content-Type: application/xml
Connection: keep-alive
Keep-Alive: timeout=10
Vary: Accept-Encoding
Etag: "1418928287-0"
Cache-Control: public, max-age=2592000
Expires: Sat, 1 Jan 2015 06:44:48 GMT
X-Microcachable: 0
Content-Language: en
Last-Modified: Thu, 18 Dec 2014 18:44:47 GMT
Vary: Cookie
Vary: Accept-Encoding
X-Micro-Cache: MISS
X-Content-Options: nosniff

It looks like the expires epoch directive is redundant and should be removed.
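
For reference, the change proposed here can be sketched as the same fragment with the expires line commented out. This is only an illustration built from the directives quoted above; it is not a tested patch, and the comments are editorial rather than the project's own. With `expires epoch;` active, nginx sets `Expires` to the epoch and `Cache-Control: no-cache` on the response it sends to the client, which matches the first header dump.

```nginx
## Keep nginx's own cache from acting on the upstream caching headers.
fastcgi_ignore_headers Cache-Control Expires;
## Bypass the cache.
fastcgi_cache_bypass $no_cache;
fastcgi_no_cache $no_cache;
## Add a cache miss/hit status header.
add_header X-Micro-Cache $upstream_cache_status;
## Commented out so that nginx no longer rewrites Expires and
## Cache-Control on the response sent to the client.
# expires epoch;
```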

cclafferty commented 9 years ago

I can confirm that removing 'expires epoch' allows the Cache-Control header specified by Drupal to come through correctly.

HMoen commented 9 years ago

After reading the following, having expires epoch here makes sense for revalidating the actual page using ETags:

In your example, I see expires epoch does affect the headers for content type xml for you, but for me it doesn't. XML files are served directly in drupal.conf by the following location directive: location ~* ^.+\.(?:css|cur|js|jpe?g|gif|htc|ico|png|html|xml|otf|ttf|eot|woff|svg)$ {

luxpir commented 9 years ago

I'm wondering about this myself now... have commented out expires epoch and sure enough my cache-control headers went from no-cache to public, max-age=x etc.

I've just patched cache_warmer to create a microcache on a single thread as per https://www.drupal.org/node/2362927 and despite a 'headers already sent' error I'm getting from that in drush (deleting microcache manually hasn't resolved that particular issue yet), I'm not at all sure if microcache is being used at this point :)

Any further advice on this issue welcome.

derekjhunt commented 9 years ago

Oddly enough, I'm unsure if microcache is working at all. I've tried it both ways, however, I am always seeing an Expired value on the microcache header.

luxpir commented 8 years ago

I managed to get a cache hit to show using Ctrl+F5, refreshing the page approx. 10 times. It's crude, but it got the cache to show itself in the headers. I knew it must have been working, as the blitz.io rush tests were showing that nearly 700 hits/sec were not causing any errors or timeouts (with the keepalive option on); it was just a case of seeing it for myself. Emulating the rush tests on a mini-scale did it.

cclafferty commented 8 years ago

@luxpir am I right in saying you've just confirmed microcache is working? Is this with or without the expires epoch; option?

I still believe that the expires epoch; option is causing problems and should be removed. Specifically, we are looking at its effect on Cache-Control headers. Has anyone else experienced this?

luxpir commented 8 years ago

@cclafferty Well, I've confirmed I get cache HIT reported in the headers when I hammer the site. This is consistent with how microcaching should work. That's as much as I can say for sure.

This is with expires epoch; enabled.

How's it looking at your end?

cclafferty commented 8 years ago

Thanks for this. This confirms that microcache is working, which I suspected it already was, but your confirmation clears this up. Thank you.

The cache problems we are now faced with are those sent onward to the browser (or proxy server). Drupal is sending correct Cache-Control headers that the expires epoch directive is seemingly overriding. If someone else can confirm this, that would be great. I'll post my request/responses for both cases, with and without expires epoch, so we can confirm.

luxpir commented 8 years ago

I tested a few things because I read in a few places that:

But then I wondered how useful this is, given that in the marek.sapota.org link posted above, it says about the no-cache response:

"Despite the name it does not stop browser caching but instead it forces browsers to revalidate the cache on each request."

Isn't that exactly what we want from microcaching? Isn't no-cache the correct response in our case?

cclafferty commented 8 years ago

Thanks for this @luxpir. I think the issue here is that we shouldn't be changing Cache-Control headers at all. The problem is still that Drupal is setting them and Nginx is overriding them which is confusing behaviour and shouldn't happen. Even if we were to decide that no-cache was a better response, this change would apply to the Drupal codebase rather than our Nginx config. Do you see what I'm getting at here? It's not our responsibility to mess with Cache-Control headers.

I first noticed this problem whilst using CloudFlare, which will not cache any content that specifies no-cache as its Cache-Control header, so as you can imagine this bug was quite frustrating to find. https://support.cloudflare.com/hc/en-us/articles/202775670-How-Do-I-Tell-CloudFlare-What-to-Cache-

perusio commented 8 years ago

@cclafferty the idea of micro caching is to have a small TTL cache so that even if your content changes frequently your site will be able to withstand a great load. If OTOH you want nginx to obey the cache headers you need to comment out the line that sets the expiration to the UNIX Epoch

expires epoch;

Now if you want to force nginx to obey the upstream Cache-Control or Expires headers you need to comment out the line that tells it to ignore those headers:

## The Cache-Control and Expires headers should be delivered untouched
## from the upstream to the client.
fastcgi_ignore_headers Cache-Control Expires;

cclafferty commented 8 years ago

Hi @perusio, great project. Let me just say this before you close the issue: the problem here is not microcache. Microcache is working great, it works with a TTL independent of the Cache-Control or Expires headers! It works! The only issue here is that the upstream and upstream only Cache-Control header is being set to 'no-cache'. To fix it you must remove the incorrect expires epoch. This is a bug, as you are not affecting microcache but ARE affecting upstream cache, e.g. browser cache or proxy server cache. Let me repeat this: microcache is fine and the only problem is that the Cache-Control header isn't getting sent correctly to places like your browser.

Now, to close this once and for all, please observe a Drupal site which I have working. The only thing I had to do to fix it was remove the expires epoch. Really it's a confusing bug with a simple fix:

Please check out: https://www.tyremen.co.uk

You will observe a correct Cache-Control header which Drupal has sent (I have 'Expiration of cached pages' set to 15 minutes). You will also likely see an X-Microcache: MISS. I have a very low TTL on microcache, so just refresh a few times quickly and you'll see that the microcache eventually becomes a HIT. Now refresh again (after a few seconds) and you'll once more get MISS. This is microcache doing its job. It's protecting the site from floods of requests, it's saving CPU by not calling any Drupal code, and it doesn't care about my Cache-Control headers; they slip through without being looked at or acted on. This to me is correct and expected.

perusio commented 8 years ago

@cclafferty I think you're mixing things a little bit. nginx cannot set the upstream headers. nginx can only set client headers. What in fact happens is that by setting the expiration date to the UNIX Epoch we say to the client that the page has already expired, hence the client has to request it all the time, i.e., for all requests.

The idea of microcaching is protecting the backend/upstream. Yes, you get another request for a page reload, but it goes to the cache. Only after the microcache validity expires will you get a fresh page from the upstream. Since the cache manager uses a lock to prevent stampeding, you'll never get more than a single upstream request during the validity time of the lock.
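
A minimal sketch of the short-TTL cache plus lock described above, using standard nginx FastCGI cache directives. The zone name `microcache`, the cache path, the sizes, and the timings are illustrative assumptions and are not taken from this repository's files.

```nginx
## http context: where cached responses live (hypothetical path and sizes).
fastcgi_cache_path /var/cache/nginx/microcache levels=1:2
                   keys_zone=microcache:5m max_size=1g;

## Inside the location that passes requests to PHP:
fastcgi_cache microcache;
fastcgi_cache_key $scheme$request_method$host$request_uri;
## Short validity: a cached page is served for at most 15 seconds
## (illustrative value).
fastcgi_cache_valid 200 301 15s;
## Only one request per cache key populates the cache; concurrent
## requests for the same key wait for it instead of hitting the upstream.
fastcgi_cache_lock on;
## How long waiting requests hold on before being passed upstream anyway.
fastcgi_cache_lock_timeout 8s;
```

The lock is what limits a burst of identical requests to a single upstream fetch while the entry is being refreshed.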

If you comment out the expires epoch line then the upstream caching headers take effect on the client. For the server it is different. That's what you're seeing. I suggest you also comment out the fastcgi_ignore_headers directive. It becomes much easier to reason about caches and TTLs. This way the upstream is the master. What you set there will be taken into account in the client and the server.
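
The "upstream is the master" variant suggested above could look roughly like this. It is a sketch based on the fragment quoted at the top of the thread, not a tested configuration, and the comments are editorial.

```nginx
## Commented out: nginx's cache now honours the upstream Cache-Control
## and Expires headers when deciding how long to keep a response.
# fastcgi_ignore_headers Cache-Control Expires;
## Bypass the cache.
fastcgi_cache_bypass $no_cache;
fastcgi_no_cache $no_cache;
## Add a cache miss/hit status header.
add_header X-Micro-Cache $upstream_cache_status;
## Commented out: the upstream caching headers reach the client unchanged.
# expires epoch;
```

One consequence to keep in mind: with the ignore line removed, the lifetime set by the upstream headers takes priority over `fastcgi_cache_valid`, so a Drupal response carrying a long max-age would also stay in nginx's cache for that long.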

cclafferty commented 8 years ago

Hi @perusio thanks for your patience on this. I believe we've had a misunderstanding regarding what we'd call upstream. To me I'm talking about the client. You clearly understand what is going on, can you explain why you choose to deliberately expire the cache on the client? Surely the upstream Cache-Control headers from Drupal are more suitable?

perusio commented 8 years ago

@durist the reason is simplicity. It's easier to reason about the caching in your application if you use only one authority: the server or the upstream. By setting it to the UNIX Epoch there's no interaction between the client and the server. Yes, it makes the client hit the cache on the server, but if you decide to change the TTL, or if the freshness of the content is important to you, then you know exactly when a client will get an updated version of a page: as soon as the cache expires and a new version is fetched from the upstream.

tl;dr - to avoid interactions between client cache and server cache - having a single authority for TTL

cclafferty commented 8 years ago

Hi @perusio just wanted to say thanks for clearing this up. Throughout this entire thread we thought this was a bug or at least some unwanted side effect. Although I may not agree with Nginx being the authority on cache at least this was done by design. I would recommend leaving a comment surrounding the tags expires epoch; to suggest that's what it's doing. Cheers.

blaoch commented 8 years ago

@perusio

Thanks for providing clarification on this - you have an excellent implementation of Nginx.

I just have couple of questions for you.

1) How does microcaching work with Drupal caching? Let's say Drupal caching for anonymous users is turned on, a page is cached for 2 hours, and the microcaching TTL is 10m. Would that mean Nginx would receive the cached version from Drupal every 10 minutes until it receives the updated page after the 2-hour Drupal internal cache expires?

2) Is it possible to expire microcache cache bins at the same time internal caches are expired? I am thinking of setting up some PHP code that could be triggered with rules module.

Thanks for your help in advance.