keycdn / cache-enabler

A lightweight caching plugin for WordPress that makes your website faster by generating static HTML files.
https://wordpress.org/plugins/cache-enabler/
123 stars 46 forks source link

Query string cache policy #148

Closed nlemoine closed 3 years ago

nlemoine commented 3 years ago

Hi!

I noticed that cache policy regarding query strings changed in 1.5 (#129).

This change was made because in most cases the cache should not be bypassed if a query string is present

I totally disagree with that statement and I think this is a terrible decision.

I really hope I could change your mind on this.

coreykn commented 3 years ago

While I appreciate your feedback for the most part, I completely disagree. 🙂 The exact same argument can be made for not bypassing query strings... There are so many unique query strings that are set, like through social media platforms or systems to be used by analytic platforms. It's nearly impossible to know all of the query strings that should not bypass the cache. Does the majority of WordPress installation have dynamic page content based on a query string? Not that I'm aware of. Does WordPress have core actions that rely on query strings in the same way (that we're not already covering)? Not that I'm aware of. Please correct me if I'm wrong. For cache busting with query strings I don't think that's applicable in this case as I'm not aware of this being used for pages (only for CSS, JavaScript, etc.).

As having worked with thousands of KeyCDN customers over the last few years on web performance (with a lot of page caching configurations) I don't believe a page should be bypassed by default if it has a query string. This also helped align Cache Enabler with KeyCDN's defaults, a progress being made for exciting things to come in the future. The exact same behavior as before can easily introduced through a simple regular expression like the following:

/^(?!(fbclid|utm_(source|medium|campaign|term|content))).+$/

Or, even one step further with /.+/ to exclude all query strings.

We can add a list of known query strings and/or parameters, but a better idea would be encouraging plugins (if not already done) to make use of known ways to force page caching plugins to bypass the page cache (e.g. like DONOTCACHEPAGE constant or the Cache-Control request header, which I plan to get to here shortly). In my opinion a plugin should be responsible (or at least try) to make a request that will reach the server. I'm of course not fully aware of all the plugins and what they do, but I'd be surprised if this is not already done. I'll do my best to find what may needs to be done on our end though.

I stand by my decision that has been made with the current information at hand and believe this will benefit the majority of users (especially with increasing the cache hit ratio). However, if I end up being wrong we can easily make the default query string exclusion be the expression above or something similar to achieve the old default behavior again.

BWBama85 commented 3 years ago

What is the status of 1.5? With Cloudflares APO release and updated plugin, it is becoming tedious to use both as Cloudflare clears the edge cache when you update a post but you have to manually clear the Cache Enabler cache with each page update so that Cloudflare doesn't just recache stale content again.

nlemoine commented 3 years ago

You're absolutely right about WordPress core, it doesn't rely on query strings. But some plugins do.

I'm of course not fully aware of all the plugins and what they do, but I'd be surprised if this is not already done

I've been working for more than 10 years in the WordPress world. Although it's (very slowly) getting better, you can't rely on plugins developers skills. Some do a fantastic job, some really don't. Counting on this is a bit dangerous IMO.

The exact same argument can be made for not bypassing query strings...

Sorry, it's not the same for a crucial point. If you bypass "functional" query strings (e.g. that are meant to show a different content), you are showing the wrong page to the user and it's a bug. If you don't bypass the cache for some unknown "tracking" query strings, sure, you don't hit the cache but you're still serving the right page (a bit slower) to the user.

I guess the main reason why you changed your policy regarding query strings is about social/analytics query strings. Those query strings have indeed no effect on the page content but you're taking the only example where query strings are not used to show dynamic content, "functional" query strings do.

Even if bugs affect a small amount of users, I think this a pretty risky way to "only" increase the cache hit ratio. Content integrity should have a higher priority than cache hit ratio for tracking matters.

Anyway, thanks for taking the time to debate this and provide some workaround regex! It's very much appreciated 😉

centminmod commented 3 years ago

Well I guess Wordfence is one of the ones incompatible with cache all query strings https://wordpress.org/support/topic/problem-with-js-wordfence/#post-13531083. Maybe if there's a way to group query strings on front end user requested versus query strings backend/wp-admin requested so that the latter is excluded from cache for backend/wp-admin operations ?

centminmod commented 3 years ago

Another plugin which will have issues is WPML with queries like https://domain.com/?lang=en&s=s and https://domain.com/?lang=de&s=s

centminmod commented 3 years ago

@coreykn actually this breaks Wordpress native searches too with ?s=searchterms as it now only returns the cached Wordpress index page

domain=domain.com
curl -s https://$domain/ | tail -1
curl -s https://$domain/?s=first | tail -1

both curl entries return same

/body> </html> <!-- Cache Enabler by KeyCDN @ 14.10.2020 16:40:04 (https gzip) -->
coreykn commented 3 years ago

A list of known query strings like that are being put together for this. 🙂

centminmod commented 3 years ago

Yeah first one would be search exclusion but that defeats attempts at https://github.com/keycdn/cache-enabler/issues/83 heh

/^s$/

edit: though with your 1.5.1 changes no need to edit class PHP file and still allow pretty search URL redirects to work for advanced nginx caching at https://github.com/centminmod/pretty-search-url so ?s=oct will redirect to /search/oct but Cache Enabler doesn't cache /search/oct right now

coreykn commented 3 years ago

With time I knew we'd be able to get some good data in to make this handling more efficient. So far it's going well. Anyone please feel free to add known examples below for us to consider in this upcoming improvement.

centminmod commented 3 years ago

well tried to exclude search with /^s$/ and doesn't seem to always work?

centminmod commented 3 years ago

With time I knew we'd be able to get some good data in to make this handling more efficient. So far it's going well. Anyone please feel free to add known examples below for us to consider in this upcoming improvement.

seems like it would be easier to compile a list of query strings to include in cache <1.5.1 behaviour versus compiling a list of query strings to exclude from cache >1.5.1

for my Cache Enabler write up at pre 1.5.1 https://servermanager.guide/203/wordpress-cache-enabler-advanced-full-page-caching-guide/ query strings to include in cache was easier

/^fbclid|ref|mc_(cid|eid)|utm_(source|medium|campaign|term|content|expid)|gclid|fb_(action_ids|action_types|source)|age-verified|ao_noptimize|usqp|cn-reloaded|_ga|_ke$/

coreykn commented 3 years ago

It's checking the entire query string and not the query parameter, so /^s=.+$/ should do the trick.

I disagree, at least with the data I have available so far. If you wish to have the old behavior it's still there, just adjust your regular expression with a negative lookahead, like in the example I posted above.

Anyone feel free to provide examples that should exclude the cache and if possible, the plugin(s) that use them. That would be helpful for me in gathering more data to improve this. I'm not necessarily trying to come up with a default list to add as the default, but trying to gather more data where query strings are used to deliver unique objects. That will help in decisions made around this.

centminmod commented 3 years ago

Maybe allow either method to be selectable, cache all query strings from 1.5.1 or select exclude select query strings from <1.4.9. Right now there's to much troubleshooting for 1.5.1 to be production ready as I am auto installing and configuring Cache Enabler in my Centmin Mod LEMP's stack's Wordpress auto installer and I can't expect new Centmin Mod users to know they have to set specific query strings to be excluded from cache in 1.5.1. I have 3,000+ new Centmin Mod installs every month so that is a lot of folks who will run into this issue ! Only a small fraction of Centmin Mod users frequent the support forums https://community.centminmod.com/ so a lot of new users will just strike it up as Cache Enabler in Centmin Mod not working and move on.

nlemoine commented 3 years ago

Maybe allow either method to be selectable

That would be a great workaround to allow users to choose. With the pre 1.5 policy by default and a warning message on the post 1.5 policy.

You can't expect regular users (some probably don't even know what a query string is) to know how to exclude query strings. Further more, you can't expect them to write regex.

nlemoine commented 3 years ago

I had trouble with https://fr.wordpress.org/plugins/goodbye-captcha/

I went for excluding all query strings for now.

coreykn commented 3 years ago

@nlemoine Thank you for your feedback, it's sincerely appreciated. All of it will be considered and the more data I get the better. I've worked on moving away from all blanketed approaches, though. This takes a little time and data to get tuned but is worth it in the long run. As mentioned above, the exact same behavior as before can be achieved with the regular expression posted above. Cache Enabler will be able to be configured with WP-CLI here shortly, so if it's part of your process hopefully that would help make any automatic configurations.

@nlemoine I don't expect regular users to do that... 😉 It's something I'm fully aware of and is actually on my list of todos to improve.

centminmod commented 3 years ago

For now I am just rolling back to 1.4.9 for my users as they're having other issues I haven't been able to diagnose as yet with update to 1.5.1 https://community.centminmod.com/threads/cache-is-not-created-or-served.20542/page-2#post-86659

centminmod commented 3 years ago

It's checking the entire query string and not the query parameter, so /^s=.+$/ should do the trick.

FYI, the 1.5.1 settings example query string regex for /^nocache$/ suggests otherwise though so might need updating the example

coreykn commented 3 years ago

@centminmod The example would cover https://www.example.com/about-us/?nocache. It's just an example and of course can be adjusted if ever needed. Thanks for the regular expression provided earlier! That was really helpful. A new default query string exclusion based on that has been added in the PR above. 🙂