poweredcache / powered-cache

The most powerful caching and performance suite for WordPress. https://wordpress.org/plugins/powered-cache/
https://poweredcache.com/
GNU General Public License v2.0
42 stars 3 forks source link

Related URL selection for cache preloading should be more intelligent #130

Closed AndisGrossteins closed 8 months ago

AndisGrossteins commented 8 months ago

I was troubleshooting a caching issue and decided to turn on logging using the POWERED_CACHE_LOG_FILE constant. That did help solve the original issue but I discovered a more troubling behaviour.

A snippet from the log file after editing a non-public post type – customize_changeset (post ID 58243 with GUID 2e290c6c-6b0b-4902-8cf5-45b6038fa2d9 as post slug) which is updated via theme customizer:

14:33:34 64588 94.237.0.0 Call: delete_page_cache
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/author/andis/
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/author/andis/feed/
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/2e290c6c-6b0b-4902-8cf5-45b6038fa2d9/
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/2e290c6c-6b0b-4902-8cf5-45b6038fa2d9/amp/
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/feed/rdf/
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/feed/rss/
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/feed/
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/feed/atom/
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/comments/feed/
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/2e290c6c-6b0b-4902-8cf5-45b6038fa2d9/feed/
14:33:34 64588 94.237.0.0 Call: delete_page_cache
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/author/andis/
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/author/andis/feed/
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/2e290c6c-6b0b-4902-8cf5-45b6038fa2d9/
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/2e290c6c-6b0b-4902-8cf5-45b6038fa2d9/amp/
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/feed/rdf/
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/feed/rss/
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/feed/
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/feed/atom/
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/comments/feed/
14:33:34 64588 94.237.0.0 delete_page_cache - URL: https://plume.dev/2e290c6c-6b0b-4902-8cf5-45b6038fa2d9/feed/
...
14:33:34 64588 94.237.0.0 Async cache purge has been completed!
14:33:34 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/author/andis/
14:33:34 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/author/andis/feed/
14:33:34 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/2e290c6c-6b0b-4902-8cf5-45b6038fa2d9/
14:33:34 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/2e290c6c-6b0b-4902-8cf5-45b6038fa2d9/amp/
14:33:34 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/feed/rdf/
14:33:34 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/feed/rss/
14:33:34 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/feed/
14:33:34 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/feed/atom/
14:33:34 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/comments/feed/
14:33:34 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/2e290c6c-6b0b-4902-8cf5-45b6038fa2d9/feed/
14:33:34 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/
... (snip) ...
14:33:36 88845 94.237.0.0 Async cache purge has been completed!
14:33:36 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/author/andis/
14:33:36 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/author/andis/feed/
14:33:36 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/?p=58243
14:33:36 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/?p=58243
14:33:36 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/?p=58243feed/
14:33:36 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/?p=58243amp/
14:33:36 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/feed/rdf/
14:33:36 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/feed/rss/
14:33:36 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/feed/
14:33:36 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/feed/atom/
14:33:36 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/comments/feed/
14:33:36 86971 83.177.0.0 Pushed to preload queue [add_purged_urls_to_preload_queue]: https://plume.dev/?p=58243/feed/

First, the "feed/" and "amp/" are appended to the query string without regard to the URL structure.

Second, there seem to be no checks to make sure the post type is public/viewable using is_post_type_viewable($postType) or if the post is a revision (wp_is_post_revision($post_id)) or an autosave (wp_is_post_autosave($post_id)) or if post comments are open (comments_open($post)).

Third, there should be filters for disabling cache preloading of feeds, author archive, author feed, and comment feeds. I know that there is the powered_cache_post_related_urls filter but it would be better to add filters and corresponding (advanced) options to control what related URLs are added to the cache preload queue.

mustafauysal commented 8 months ago

Hi @AndisGrossteins,

First, the "feed/" and "amp/" are appended to the query string without regard to the URL structure.

Thanks for bringing this to my attention. Well, these logs don't exactly mean to delete the URL; it's like a lookup; if the cache doesn't exist, it doesn't delete anything.

Second, there seem to be no checks to make sure the post type is public/viewable using is_post_type_viewable($postType) or if the post is a revision (wp_is_post_revision($post_id)) or an autosave (wp_is_post_autosave($post_id)) or if post comments are open (comments_open($post)).

It makes sense, but we should consider adding an exception for customize_changeset. Since customizer changes can impact the entire site, it's crucial to flush the existing cache to ensure that these changes are effectively implemented. Would you mind sending a PR for this?

Third, there should be filters for disabling cache preloading of feeds, author archive, author feed, and comment feeds. I know that there is the powered_cache_post_related_urls filter but it would be better to add filters and corresponding (advanced) options to control what related URLs are added to the cache preload queue.

You can use populate_preload_queue_urls and powered_cache_advanced_cache_purge_urls (which is essentially an alias for powered_cache_post_related_urls) for this purpose. Perhaps we could introduce an additional option in the Preload settings to populate more URLs. However, I'm not inclined to add options for controlling every aspect of EP_MASK URLs.

AndisGrossteins commented 8 months ago

I'm sorry if my tone was a bit rude or too direct.

Thanks for bringing this to my attention. Well, these logs don't exactly mean to delete the URL; it's like a lookup; if the cache doesn't exist, it doesn't delete anything.

I see that the URL paths are used to look up the file paths but even non-existent URLs are re-added to the preload queue and then requested from the server leading to 404s. It seems that there are also no checks to make sure the URLs correspond to existing content after a complete cache purge which leads to unnecessary requests for content that has been changed or removed entirely.

IMO, the logic of the purging and preloading needs a different approach. That would require some major architectural changes which I don't feel in a position to decide.

Some ideas for purging might be lifted from inspired by the cloudflare/Cloudflare-WordPress plugin. I'd like to point out their approach to purging cache on post update/trash/publish using a single hook – transition_post_status action hook. Also, the filter hooks for controlling which action hooks trigger the plugin actions seem useful.

It makes sense, but we should consider adding an exception for customize_changeset. Since customizer changes can impact the entire site, it's crucial to flush the existing cache to ensure that these changes are effectively implemented. Would you mind sending a PR for this?

Flushing the entire cache after the theme customization update should be handled using the customize_save_after hook. Adding exceptions just for the customize_changeset post type would not be enough. We should rather use the is_post_type_viewable() function to check if the post type is public and add a filter to override that decision per post type if somebody needs it.

I can implement both changes – the complete purge after theme customization and CPT visibility checks, and send a PR but that will be no earlier than next week.

You can use populate_preload_queue_urls and powered_cache_advanced_cache_purge_urls (which is essentially an alias for powered_cache_post_related_urls) for this purpose. Perhaps we could introduce an additional option in the Preload settings to populate more URLs. However, I'm not inclined to add options for controlling every aspect of EP_MASK URLs.

That is exactly what I ended up doing suing preg_grep() to catch the mentioned cases. I'll monitor the logs to see if any additions are needed:

/**
 * Filter the Powered Cache preload URLs to remove author archives, feeds, and comment feeds
 *
 * @param array $related_urls
 * @param int|null $post_id
 *
 * @return array|false
 */
function pluu_powered_cache_post_related_urls( $related_urls, $post_id = null) {
    $related_urls = preg_grep('#/author/.+/$#', $related_urls, PREG_GREP_INVERT);
    $related_urls = preg_grep('#amp/$#', $related_urls, PREG_GREP_INVERT);
    return preg_grep('#/(comments/)?feed/(rss|rdf|atom)?#', $related_urls, PREG_GREP_INVERT);
}
add_filter('populate_preload_queue_urls', 'pluu_powered_cache_post_related_urls', 20, 1);
add_filter('powered_cache_post_related_urls', 'pluu_powered_cache_post_related_urls', 20, 1);
add_filter('powered_cache_advanced_cache_purge_urls', 'pluu_powered_cache_post_related_urls', 20, 2);
mustafauysal commented 8 months ago

Certainly, you have a valid point. It seems like the preloader could benefit from some refactoring. I'll make sure to address this in the upcoming version. Thank you for your feedback 👍🏼

mustafauysal commented 8 months ago

@AndisGrossteins

Feel free to ping me if I missed something.

mustafauysal commented 8 months ago

A single filter will be enough to control preloader behavior in version 3.4

add_filter( 'powered_cache_preload_add_url_to_queue', function ( $true, $url ) {
    if ( false !== stripos( $url, '/feed' ) ) {
        return false;
    }

    return $true;
}, 10, 2 );
mustafauysal commented 8 months ago

These problems were fixed in version 3.4. FYI @AndisGrossteins