wilr / silverstripe-googlesitemaps

Google Sitemaps module for the SilverStripe CMS
BSD 3-Clause "New" or "Revised" License
74 stars 95 forks source link

Add caching to optimize calls to ->canIncludeInGoogleSitemap() #180

Open patricknelson opened 3 years ago

patricknelson commented 3 years ago

This isn't a typical scenario, however: I manage a site with a little over 4.3k pages. I found that with the proper caching applied, you can dramatically reduce server-side load. For example, some areas that could use improvement:

The performance boost is particularly noticeable if you happen to have any logic placed inside of ->canIncludeInGoogleSitemap() on your pages. Even if not, caching these calls can help a lot due to the overhead of going through the call stack to hit the extension and then the overhead of the call in the extension version of that method itself. The extension's own ->canIncludeInGoogleSitemap() then calls ->hasPublishedParent() recursively down the page hierarchy, which is an expensive operation when done 4.3k times (or maybe 500 times per page, for example), particularly if the site content has a lot of organizational depth.

On my site, after some quick hacking, I found that I could reduce load time from ~4s 100% of the time down to only ~4s on the first hit, then ~300ms on subsequent loads as long no SiteTree instances listed in the current sitetree.xml page have been modified (since you want to spoil the cache if there were any modifications that could have impacted the result of that function call). This would be especially beneficial in a distributed cluster where the cache layer is utilizing a distributed k/v cache like Memcached or Redis or whatever.

Two main approaches:

Just some ideas. I would have submitted a PR, however my site is still stuck on SilverStripe 3.x so there's no point. I'm also just going to leave my modified version of the extension (mentioned above in my testing) out of production code since we have a pretty performant cluster, send proper HTTP cache headers (via an extension to GoogleSitemapController) have a CDN that will force clients to use the cached version (in case a bot sees this weakness and uses it to pound our server, for example).

wilr commented 3 years ago

@patricknelson happy to see any improvements you've got. If you want to email anything through, I can adapt for v4.

Even though caching logic can be quite site specific, for this usage 95% of the time we can rely on the LastEdited timestamp on the record. Would need some logical API's to override whatever the system does out of the box but seems an easy improvement.

Not sure if you're running an old version of this module but everything is paginated to max X records per view or prevent overwhelming the server (https://github.com/wilr/silverstripe-googlesitemaps/blob/main/src/GoogleSitemap.php#L51). If your can* check is slow then set that accordingly to a smaller figure (i.e 100) to reduce load.