statamic / cms

The core Laravel CMS Composer package
https://statamic.com

Non-existent urls are statically cached #10863

Open stuartcusackie opened 4 days ago

stuartcusackie commented 4 days ago

Bug description

I'm noticing that a lot of bad URLs, such as legacy URLs from WordPress and non-existent image paths, are being statically cached.

For example:
https://mysite.ie/app/uploads/2019/04/competitions-at-club-dublin-690x460.jpg
https://mysite.ie/media/good_foods.jpg
https://mysite.ie/wp-content/uploads/2018/06/Leopardstown_10-2.jpg
https://mysite.ie/sitemap.xml.gz
https://mysite.ie/swimming/wp-content/dir/erin1.PhP7
https://mysite.ie/index.php/index.php

It became a small problem recently, when I started listening to the UrlInvalidated event to automatically trigger re-caching, as described here: https://github.com/statamic/cms/pull/8902

My site only has about 250 entries but nearly 3500 UrlInvalidated events are caught by my listeners when the static cache is cleared by my static caching rules. It puts a lot of unnecessary strain on the server through queued jobs.

Can non-existent urls somehow be ignored by the static cache? All of the above urls return a 404 error. I assume they are old links from the original site on Google or other indexes.
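
As a stopgap, a listener could filter URLs before queueing any work. The following is only a rough sketch in plain PHP, not anything Statamic provides: the function name `shouldRecache` and the pattern list are assumptions drawn from the example URLs above, and `https://mysite.ie/about` is a made-up page used for illustration.

```php
<?php

// Hypothetical helper (not part of Statamic): decides whether an invalidated
// URL looks like a real page worth re-requesting. The heuristics are
// assumptions based on the example URLs in this issue.
function shouldRecache(string $url): bool
{
    $path = parse_url($url, PHP_URL_PATH);

    // parse_url() returns null when there is no path and false on malformed
    // URLs; treat both as the site root.
    if (! is_string($path) || $path === '') {
        $path = '/';
    }

    $path = strtolower($path);

    // Skip anything that ends in a file extension (.jpg, .gz, .php, .php7, ...).
    if (pathinfo($path, PATHINFO_EXTENSION) !== '') {
        return false;
    }

    // Skip legacy WordPress-style directories.
    foreach (['/wp-content/', '/wp-admin/', '/app/uploads/'] as $needle) {
        if (str_contains($path, $needle)) {
            return false;
        }
    }

    return true;
}
```

A listener would call this with the event's URL and return early when it gives `false`. The extension check is deliberately blunt: it would also skip legitimate URLs that contain a dot in the last segment (a real `/sitemap.xml`, for instance), so the rules would need adjusting per site.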

Thanks.

How to reproduce

Add a listener to handle the UrlInvalidated event, as described here: https://github.com/statamic/cms/pull/8902

Non-existent urls will gather in the static cache over time on a live website.

Logs

No response

Environment

Environment
Laravel Version: 11.25.0
PHP Version: 8.2.18
Composer Version: 2.7.4
Environment: local
Debug Mode: ENABLED
Maintenance Mode: OFF
Timezone: Europe/Dublin
Locale: en

Cache
Config: NOT CACHED
Events: NOT CACHED
Routes: NOT CACHED
Views: CACHED

Drivers
Broadcasting: log
Cache: file
Database: mysql
Logs: single
Mail: smtp
Queue: sync
Session: file

Livewire
Livewire: v3.5.8

Statamic
Addons: 7
Sites: 1
Stache Watcher: Enabled
Static Caching: Disabled
Version: 5.27.0 PRO

Statamic Addons
jonassiewertsen/statamic-live-search: 2.1.1
jonassiewertsen/statamic-livewire: 3.8.0
rias/statamic-redirect: 3.8.1
spatie/statamic-responsive-images: 5.0.1
statamic/seo-pro: 6.1.2
stuartcusackie/statamic-cache-requester: 1.2.1
thoughtco/statamic-cache-tracker: 0.9.2

Installation

Fresh statamic/statamic site via CLI

Additional details

No response

duncanmcclean commented 3 days ago

Are you able to provide the full output of php please support:details?

stuartcusackie commented 17 hours ago

Sorry, updated above.

jasonvarga commented 3 hours ago

We should be able to pass along to the UrlInvalidated event whether it was a 404 or not. Then you can avoid refetching those URLs.

stuartcusackie commented 2 hours ago

@jasonvarga That would be perfect. Thanks!

stuartcusackie commented 2 hours ago

Actually... I'm just wondering if this would still cause unnecessary processing. The UrlInvalidated event would still be fired thousands of times, and so would my listener, even though it would perform no actions. It seems to me that these urls shouldn't be cached in the first place.

Maybe it's fine. Just a thought.

jasonvarga commented 2 hours ago

They intentionally get cached since #10294.

If your 404 page is heavy (it might be because of a nav or who knows what else), someone could easily make a site struggle by hitting lots of different 404 URLs.