Hi @coreykn,
Just tested this last PR. The new approach to generating cache files is really brilliant and more concise.
However, it has a major downside when used with advanced rewrite rules (#176). Take the following scenario:
/page/
-> /page/https-index.html file is created in the cache directory

Thank you! That's great you've been able to test this PR, @nlemoine. That is always helpful and appreciated, as it allows further testing and feedback before a new version is released.
I agree that is a current downside. However, the advanced configuration that allows the server to deliver the cached files, bypassing PHP entirely, can simply be updated to account for this change. I plan on covering this in the docs and eventually automatically generating the snippet based on the settings, as requested in #176. The idea is that the new server configuration would be quite similar to how the plugin itself generates the cache file path, where certain environment variables are set as cache keys depending on what the request accepts. If the server doesn't find the file, it would continue to PHP, allowing WordPress to start, which would then allow Cache Enabler to generate the requested page.
Thanks for your feedback, @coreykn.
I'm also using a cache preloader that crawls URLs to generate the cache files for all versions (html, gz). This is now impossible in a single run; I have to run it 3 times with different Accept-Encoding headers.
Would you be open to setting up some filters/actions to give developers the option to generate all files if desired? I think that's still something that might be needed.
You're welcome! That was a consideration when making this change:
This will impact anyone using cache warming techniques because a request for each variant to cache will now have to be made.
Always happy to consider any request. Did you have something specific in mind for that type of support? Off the top of my head I'm unsure how it'd be implemented by just adding one or more hooks. My preference is page caching based on the request as the traffic to a website handles this process quite efficiently. If you want to simulate that traffic to have pages cached, then the simulated traffic would have to be adjusted to account for whichever variants you want cached.
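For example (a minimal sketch, not part of the plugin; the URL and the variant list are hypothetical, and it assumes a WordPress context where wp_remote_get() is available), a warm-up run would now need one request per accepted encoding:

```php
// Hypothetical warm-up sketch: one request per encoding variant, so each
// request lets the plugin create only the cache file for that variant.
$url       = 'https://example.com/page/';   // hypothetical URL
$encodings = array( '', 'gzip' );           // uncompressed and Gzip variants

foreach ( $encodings as $encoding ) {
    $args = array( 'headers' => array() );
    if ( '' !== $encoding ) {
        $args['headers']['Accept-Encoding'] = $encoding;
    }
    wp_remote_get( $url, $args );
}
```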
My preference is page caching based on the request as the traffic to a website handles this process quite efficiently.
I also think it's a smart approach, but it does not fit all use cases, and I believe it would be great to leave developers the freedom to handle cache file generation another way.
On sites with a large number of pages, some pages might not be hit often enough to keep the cache warm. As you probably know, performance is more and more important for SEO. Having an always warmed-up cache is sometimes needed.
I'm also generating a Brotli version, which leads to 3 potential variants. I wonder if this approach still brings a benefit. Take the following comparison.
On-demand scenario:

/page/
-> /page/https-index.html
(cache miss, "slow" because served from PHP/MySQL)

/page/
-> /page/https-index.html.gz
(cache miss, "slow" because served from PHP/MySQL)

/page/
-> /page/https-index.html.br
(cache miss, "slow" because served from PHP/MySQL)

To cover every variant, we hit the server 3 times, and 3 different visitors had a "slow" experience (it's of course a relative "slow"; I mean not as fast as a cache hit, especially in the WordPress world).
All-variants scenario:

/page/
-> /page/https-index.html{.gz,.br}
(cache miss, "slow" because served from WordPress, maybe a bit slower than in the "on-demand" scenario but probably not much slower)

In this case, only one visitor hit the server and got a "slow" experience. Subsequent visitors, whatever variant they support, will be served a cached page.
Which is more performant: hitting the server 2/3 times to generate one cache file per request, or hitting the server once to generate 2/3 cache files? That's something that would be interesting to benchmark.
Did you have something specific in mind for that type of support? Off the top of my head I'm unsure how it'd be implemented by just adding one or more hooks
It's not that easy indeed. I tried to tackle it with only one hook and without changing too many lines of code. Here's my first shot at this: https://github.com/keycdn/cache-enabler/compare/master...nlemoine:create-cache-pages
I had to refactor a bit. I split the create_cache_file method to extract a "pure" file creation method.
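Roughly, that extracted step just writes a map of file paths to contents; a hypothetical sketch (names are illustrative, not taken from the branch):

```php
// Illustrative only: a "pure" creation step writes whatever path => contents
// pairs it is given, leaving the decision of which files to create to the caller.
function create_files( array $files ) {
    foreach ( $files as $file_path => $file_contents ) {
        file_put_contents( $file_path, $file_contents );
    }
}
```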
A filter example:
add_filter('cache_enabler_cache_files', function($files, $page_contents_raw, $cache_path) {
    $files[$cache_path . '/https-index.html'] = $page_contents_raw;
    return $files;
}, 10, 3);
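For instance (assuming this proposed cache_enabler_cache_files filter and the PECL brotli extension), a Brotli variant could be added the same way:

```php
// Hypothetical: also write a Brotli-compressed variant when cache files are created.
// brotli_compress() comes from the PECL brotli extension.
add_filter('cache_enabler_cache_files', function($files, $page_contents_raw, $cache_path) {
    if (function_exists('brotli_compress')) {
        $files[$cache_path . '/https-index.html.br'] = brotli_compress($page_contents_raw);
    }
    return $files;
}, 10, 3);
```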
Let me know what you think.
Thank you for your detailed feedback, @nlemoine. That is very helpful. You're definitely right, the default handling will not fit all use cases. Let me review your approach in more detail and consider how it could be introduced. If we don't allow all variants to be created at once, I believe we'll likely at least add a way for both the compressed and uncompressed versions of a variant to be created, as some server configurations expect this. Updates to follow.
Just realized while setting some options in the admin that what you call a "variant" is webp, gzip, or mobile.
In my mind, a "variant" is only a different file encoding: uncompressed, gzip, brotli.
Update cache handling to get the cache file based on the request, mostly looking at the Cache Enabler settings and request headers. This updates the cache handling for both new and potentially cached pages by generating the required cache file through cache keys. This will be much better going forward as it simplifies handling current cache variants and adding additional cache variants in the future. The previous way of handling this would have meant many methods and convoluted logic being added for each additional cache variant, which would have only led to more unnecessary complexity.
The cache keys are obtained in the following ways:

scheme: Scheme-based caching was introduced in PR #94 and released in version 1.4.0. It was then later updated in PR #98, PR #109, and PR #141. Change this to use what the WordPress is_ssl() function uses, with the added ability to also check the X-Forwarded-Proto or X-Forwarded-Scheme request headers.

device: Device-based caching will be introduced in this PR and released in version 1.7.0. This will allow a separate mobile cache to be made based on the User-Agent request header. The values checked have been taken from the WordPress wp_is_mobile() function (#160).

webp: WebP-based caching was released in version 1.0.2. This still checks the Accept request header, but will now look for image/webp instead of just webp.

compression: Compression-based caching for Gzip has been supported since the initial release. This still checks the Accept-Encoding request header for gzip. This setup will make it easy for other compression algorithms to be introduced in the future.

This change means that only the requested cache file is created if it does not exist yet, instead of all variants at once. This will reduce unnecessary memory usage, especially if many cache variants are used, and will reduce storing unnecessary cache variants that may never even be requested. (This will impact anyone using cache warming techniques because a request for each variant to cache will now have to be made.) This also ensures the client gets what it has requested instead of falling back to the default HTML file when the requested variant does not exist yet but the default HTML file does. (That was a pretty rare edge case, as it would only previously occur if a cache variant setting was enabled after cached pages were already generated and the cache was not cleared.)
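To make the cache keys above concrete, here is a simplified sketch (it reads $_SERVER directly and skips the plugin's settings checks, so treat it as an approximation rather than the actual implementation):

```php
// Simplified approximation of how the cache keys map a request to a file name.
// The real handling also honors the plugin settings, uses is_ssl()/wp_is_mobile()
// style checks, and reads headers with the fallback described below.
$scheme      = ( isset( $_SERVER['HTTPS'] ) && 'off' !== $_SERVER['HTTPS'] ) ? 'https' : 'http';
$webp        = ( isset( $_SERVER['HTTP_ACCEPT'] ) && false !== strpos( $_SERVER['HTTP_ACCEPT'], 'image/webp' ) ) ? '-webp' : '';
$compression = ( isset( $_SERVER['HTTP_ACCEPT_ENCODING'] ) && false !== strpos( $_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip' ) ) ? '.gz' : '';

// e.g. https-index-webp.html.gz for an HTTPS request that accepts WebP and Gzip
// (the device key is omitted here since its position in the file name is not shown above).
$cache_file_name = $scheme . '-index' . $webp . '.html' . $compression;
```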
Getting the request headers has been updated, as in some cases apache_request_headers() was a function that existed but returned an empty array or false. This led to some cache variants not being delivered even when accepted by the client. This new way will try to get the request header from apache_request_headers() and, if it is not set, fall back to the $_SERVER variable. If neither is set, an empty string is used to prevent having to always check whether it is set. Only checking $_SERVER variables was not used because some environments do not provide all of these. This way should get the specific request header values we need across many different configurations, if they are available.
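A minimal sketch of that fallback order (the helper name is hypothetical, not the actual method name):

```php
// Hypothetical helper illustrating the fallback: apache_request_headers() first,
// then the matching $_SERVER variable, then an empty string.
function get_request_header( $name ) {
    if ( function_exists( 'apache_request_headers' ) ) {
        $headers = apache_request_headers();
        if ( is_array( $headers ) && ! empty( $headers[ $name ] ) ) {
            return $headers[ $name ];
        }
    }

    $server_key = 'HTTP_' . str_replace( '-', '_', strtoupper( $name ) );
    if ( ! empty( $_SERVER[ $server_key ] ) ) {
        return $_SERVER[ $server_key ];
    }

    return '';
}
```

For example, get_request_header( 'Accept-Encoding' ) would fall back to $_SERVER['HTTP_ACCEPT_ENCODING'].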
The cache signature will now include the file name, such as https-index-webp.html.gz, instead of https webp gzip, because of this new handling. We can filter this and update it to the old format if preferred. (I do not have a preference; this way just reduces the overhead.) The cache signature will now also include the generated date in HTTP-date format (e.g. D, d M Y H:i:s GMT) instead of d.m.Y H:i:s.
Update Cache_Enabler_Disk::get_settings_file_name() to parse the $_SERVER['REQUEST_URI'] value and get the PHP_URL_PATH when in the subdirectory network fallback, to prevent a warning from occurring in the following preg_grep() function when the path contained a query string. The ? from the query string would cause a regular expression error if the path had a trailing slash (preceding token is not quantifiable, like in \.(path|to|page|?param=value)).

For a future note, error handling when creating directories and compressing page contents needs to be improved. wp_die() for failed cache file directory creation has been removed as it did not work and would have caused an output buffer error instead. As far as I can tell, this would have occurred in all past versions. It would not occur when trying to create the settings file, but it has been removed there as well since it is not a graceful way to handle this type of error. If Gzip compression fails, the page will not be created. These all rarely occur, if they ever do, as no issues have been opened around these scenarios, but in the event they do, the user should receive additional insight into what is occurring so it can be resolved.

Closes #160