Closed aheadley closed 11 years ago
Could use the Google sitemap.xml and a cron script that uses php-curl, sending the output to /dev/null. That would at least let you cache pages and global ESI blocks.
I've actually already added a script to do that (see warm-cache.sh); this is more about an integrated solution. Like a button you can click in the Magento admin to start the warm-cache process. The part I'm not sure about is how to handle long-running tasks like this.
You could make it add an entry to the cron.php in Magento. Then when you click the cache-warming start button it would schedule the process to run. On top of that you could also allow people to specify how many pages to warm per cron.php run, maybe the times it is allowed to run, etc...
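For reference, a cron job like that would follow the standard Magento 1 crontab layout in a module's config.xml. This is only a sketch; the job name `turpentine_warm_cache` and the model callback `turpentine/observer_cron::warmCache` are hypothetical placeholders, not the extension's actual identifiers:

```xml
<!-- Hypothetical crontab entry inside <config> in a module's config.xml -->
<crontab>
    <jobs>
        <turpentine_warm_cache>
            <schedule>
                <!-- run every 10 minutes; adjust to taste -->
                <cron_expr>*/10 * * * *</cron_expr>
            </schedule>
            <run>
                <!-- placeholder model::method that would warm a batch of pages -->
                <model>turpentine/observer_cron::warmCache</model>
            </run>
        </turpentine_warm_cache>
    </jobs>
</crontab>
```

The pages-per-run limit could then be a system config value the callback reads before deciding how many sitemap URLs to fetch in that run.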
Initial work on this has started in the feature-auto-warm-cache branch. After giving it some more thought, I'm not sure I want this in Turpentine; I might move it to a separate extension, since the functionality is useful even without Turpentine and there doesn't seem to be a FLOSS cache-warming extension.
This is implemented and will be in RELEASE-0.3.0
I run varnish across multiple servers and warm the cache with a simple script like this:
#!/bin/bash
# Pull every URL out of the sitemap and request each one to warm the cache.
URL='www.domain.com'
wget --quiet "http://$URL/sitemap.xml" --no-cache --output-document - \
    | egrep -o "http://$URL[^<]+" \
    | while read -r line; do
        # Fetch the page and discard the body; `time` shows how long each took.
        time curl -A 'Cache Warmer' -s -L "$line" > /dev/null 2>&1
        echo "$line"
done
Here's a simple extension I made to solve this:
@ryan-inverseparadox :)
Is there any way to restrict the cache warmer/crawler to only cache a certain number of products per minute? I am finding that on websites with large catalogs this is bringing down the server due to high load.
I have done a bit of digging around in the extension and have found this file:
/app/code/community/Nexcessnet/Turpentine/etc/config.xml
Specifically line #411:
<cron_expr>0,10,20,30,40,50 * * * *</cron_expr>
and have changed it to:
<cron_expr>720 * * * *</cron_expr>
Is this the correct way to change the crawl schedule for turpentine_crawl_urls to crawl every 12 hours (i.e. 720 minutes)? I have had it running for the past 4 hours on a 5,000+ product catalog and it seems to be ticking along OK, but I'm not sure if I have set it up correctly.
The minutes field of a cron expression only accepts values 0-59, so `720 * * * *` isn't valid. Every 12 hours would be:
0 6,18 * * *
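On the rate-limiting question: the cron schedule only controls when a crawl starts, not how hard it hits the server while running. As a stopgap outside the extension, you can throttle an external warming script so it never exceeds a fixed number of requests per minute. A minimal sketch based on the sitemap script above; the URL and RATE values are placeholders:

```shell
#!/bin/bash
# Hypothetical throttled cache warmer: caps requests at RATE per minute.
URL='www.domain.com'   # placeholder domain
RATE=30                # max requests per minute (placeholder)
DELAY=$((60 / RATE))   # seconds to sleep between requests

wget --quiet "http://$URL/sitemap.xml" --no-cache --output-document - \
    | egrep -o "http://$URL[^<]+" \
    | while read -r line; do
        curl -A 'Cache Warmer' -s -L "$line" > /dev/null 2>&1
        echo "$line"
        sleep "$DELAY"   # throttle to stay under the per-minute cap
done
```

Integer division means RATE values above 60 collapse to a zero-second delay, so this only makes sense for modest rates.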
Not sure how to implement this but would be nice to have.