Open patricknelson opened 3 years ago
@patricknelson happy to see any improvements you've got. If you want to email anything through, I can adapt for v4.
Even though caching logic can be quite site specific, for this usage 95% of the time we can rely on the LastEdited
timestamp on the record. Would need some logical API's to override whatever the system does out of the box but seems an easy improvement.
Not sure if you're running an old version of this module but everything is paginated to max X records per view or prevent overwhelming the server (https://github.com/wilr/silverstripe-googlesitemaps/blob/main/src/GoogleSitemap.php#L51). If your can* check is slow then set that accordingly to a smaller figure (i.e 100) to reduce load.
This isn't a typical scenario, however: I manage a site with a little over 4.3k pages. I found that with the proper caching applied, you can dramatically reduce server-side load. For example, some areas that could use improvement:
->canIncludeInGoogleSitemap()
somehow<% cached ... %>
blocks for partial template caching to save overhead in generating XMLsitemap.xml
is popular)The performance boost is particularly noticeable if you happen to have any logic placed inside of
->canIncludeInGoogleSitemap()
on your pages. Even if not, caching these calls can help a lot due to the overhead of going through the call stack to hit the extension and then the overhead of the call in the extension version of that method itself. The extension's own->canIncludeInGoogleSitemap()
then calls->hasPublishedParent()
recursively down the page hierarchy, which is an expensive operation when done 4.3k times (or maybe 500 times per page, for example), particularly if the site content has a lot of organizational depth.On my site, after some quick hacking, I found that I could reduce load time from ~4s 100% of the time down to only ~4s on the first hit, then ~300ms on subsequent loads as long no
SiteTree
instances listed in the currentsitetree.xml
page have been modified (since you want to spoil the cache if there were any modifications that could have impacted the result of that function call). This would be especially beneficial in a distributed cluster where the cache layer is utilizing a distributed k/v cache likeMemcached
orRedis
or whatever.Two main approaches:
->column(...)
on the list to get certain fields to spoil/vary the caching.->canIncludeinGoogleSitemap()
on->write()
in a special column on the rootSiteTree
object itself instead of calling it all the time. The benefit here is you get extremely fast performance by just filtering on the result of that column. The downside is that you lose the benefit of any complicated side-effects from->canIncludeGoogleSitemap()
that don't rely purely on the state of thatDataObject
(e.g. say it relies on the state of some other unrelated object or a global value). So, doing this would be a breaking change, I'd immagine.Just some ideas. I would have submitted a PR, however my site is still stuck on SilverStripe 3.x so there's no point. I'm also just going to leave my modified version of the extension (mentioned above in my testing) out of production code since we have a pretty performant cluster, send proper HTTP cache headers (via an extension to
GoogleSitemapController
) have a CDN that will force clients to use the cached version (in case a bot sees this weakness and uses it to pound our server, for example).