szepeviktor / w3-total-cache-fixed

A community driven build of W3 Total Cache. The aim is to continuously incorporate fixes, improvements, and enhancements over the official WordPress release of W3 Total Cache.
https://github.com/szepeviktor/w3-total-cache-fixed/releases
MIT License
237 stars 47 forks

BWP Sitemaps Not Clearing Cache on New Posts/Updates #187

Closed casbboy closed 7 years ago

casbboy commented 7 years ago

Weird issue. Just noticed that our BWP Sitemaps are not getting cleared from cache when we add a new article. Not sure why fix-w3tc is not clearing the cache on BWP with new updates. I've been having to manually clear disk cache for every new post.

Any ideas?

Cheers Ryan

nigrosimone commented 7 years ago

Please post here your w3tc-config (uses the export setting feature in the admin dashboard).

nigrosimone commented 7 years ago

And some info: 1) WordPress version (4.6?) 2) web service technology (Apache...?) 3) fix-w3tc version (latest?)

amiga-500 commented 7 years ago

Hi Ryan ( @casbboy )...Just installed BWP (Better WordPress Google XML) Sitemaps plugin and played around. Question:

In my tests i too noticed the sitemap cache (handled by BWP's Sitemap Cache feature) is never regenerated.

Interestingly, when i disable fix-w3tc BWP Sitemaps still does not clear the cache on new article posts.

If that is the case it appears it may be a bug in BWP and not an issue with fix-w3tc. Maybe it stopped working because Wordpress updated to v4.7 yesterday?

Of course, this assumes BWP actually regenerates a new sitemap upon new posts and doesn't specifically rely on its cache expire time to run out first. I am still trying to figure that out.

casbboy commented 7 years ago

Hello!

Let me get you that info. I had the issue with 4.6, and discovered it with that version. I updated to 4.7 today. I would go to the Dashboard and manually click Flush Disk Cache.

I do not have BWP Sitemap Cache checked.

  1. wordpress version( 4.6, now 4.7)
  2. Web service (nginx, disk cache)
  3. fix-w3tc (.9.4.5)

I'm not sure how to export the settings. "Performance >> ?"

Cheers! Ryan

amiga-500 commented 7 years ago

Export is located under fix-w3tc's Performance > General Settings > Import/Export Settings (located at the very bottom of the page). Just click the Download button

casbboy commented 7 years ago

Here you go :)

amiga-500 commented 7 years ago

Located under: Performance > Page Cache > Purge Policy:Page Cache there is a field labelled: Purge Sitemaps

In that field replace its present content with this:

[a-z0-9_\-]*sitemap[a-z0-9_\-]*\.(xml|xsl|html?)(\.gz)?|post\.xml|site\.xml|page\.xml|taxonomy_category\.xml

Let me know if that solves your problem.
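A quick way to sanity-check the pattern above is to run it against the sitemap file names mentioned in this thread. This is only a minimal Python sketch: it assumes the pattern is applied to the file name with ordinary regex semantics and must match the whole name, which is an assumption about how fix-w3tc uses the field, not something confirmed here. The helper name `is_purged` and the sample name `feed.xml` are made up for illustration.

```python
import re

# Pattern copied from the suggested Purge Sitemaps value above.
PURGE_SITEMAPS = (
    r"[a-z0-9_\-]*sitemap[a-z0-9_\-]*\.(xml|xsl|html?)(\.gz)?"
    r"|post\.xml|site\.xml|page\.xml|taxonomy_category\.xml"
)

def is_purged(filename: str) -> bool:
    # Assumption: the whole file name has to match the pattern.
    return re.fullmatch(PURGE_SITEMAPS, filename) is not None

for name in ["sitemapindex.xml", "post.xml", "page.xml",
             "taxonomy_category.xml", "post_part2.xml", "feed.xml"]:
    print(f"{name:24} {'purged' if is_purged(name) else 'kept'}")
```

Under that assumption, sitemapindex.xml and post.xml are both covered, while a split sitemap such as post_part2.xml is not.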

casbboy commented 7 years ago

Thanks! I'm testing right now....

casbboy commented 7 years ago

Worked on post.xml sitemap! But it didn't on this: sitemapindex.xml

amiga-500 commented 7 years ago

Hi @casbboy that's strange about sitemapindex.xml since the line above would have covered it -- even my tests were able to purge it correctly. But nonetheless, let's make things very specific then. So replace the entire line (that i gave you above) with the following instead:

(sitemapindex|post|site|page|taxonomy_category)(_part[0-9]+)?\.xml

casbboy commented 7 years ago

Ok, so replace the entire line with just what you added? I noticed post.xml was gone. Would we need a wildcard for post.xml, since that becomes post_part2.xml when BWP starts breaking the posts sitemap into multiple pages?

amiga-500 commented 7 years ago

Oh sorry my mistake.....i just noticed i did forget about post.xml...let me rewrite things for you. I will post a new reply.

amiga-500 commented 7 years ago

Hi @casbboy Ok you can try this (and yes you replace the entire line of Purge Sitemaps with it):

(sitemapindex|post|site|page|taxonomy_category)(_part[0-9]+)?\.xml

Notice i included "post" and i also included the potential for "_part2" (or _part3,... etc)
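The narrower pattern can be checked the same way. Again a minimal Python sketch, assuming a whole-name regex match (not confirmed to be how fix-w3tc applies the field); `matches_narrow` is a hypothetical helper name.

```python
import re

# Pattern copied from the reply above.
NARROW = r"(sitemapindex|post|site|page|taxonomy_category)(_part[0-9]+)?\.xml"

def matches_narrow(filename: str) -> bool:
    # Assumption: the whole file name has to match the pattern.
    return re.fullmatch(NARROW, filename) is not None

for name in ["sitemapindex.xml", "post.xml", "post_part2.xml",
             "post_part13.xml", "sitemap.xml"]:
    print(f"{name:20} {'purged' if matches_narrow(name) else 'kept'}")
```

The trade-off of being this specific is that a generic name such as sitemap.xml, which the earlier broader pattern accepted, no longer matches.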

casbboy commented 7 years ago

Sweet! Going to try now.

casbboy commented 7 years ago

So weird, I changed the rules and they once again worked for the post.xml, but the sitemapindex.xml was left cached. Not sure why that is happening, as the line is right.

amiga-500 commented 7 years ago

Sounds like it could be related to either you having a server-side reverse proxy like Varnish existing (e.g. for mine i need to tell varnish to purge or i wont see the latest sitemapindex.xml), or it could possibly be a browser-side cache thing which is temporarily showing you the old results despite fix-w3tc deleting the cache from disk.

Inside sitemapindex.xml the only thing i see changing are just the dates for the other xml files -- more specifically, post.xml in my case.

If you do have a varnish server lurking you'd want to enable Performance > General Settings > Reverse Proxy > Enable varnish cache purging and provide the ip address (possibly: 127.0.0.1).

casbboy commented 7 years ago

Yeah, it will show the update time on the sitemap for post.xml updated, telling Google to index it again. I don't use Varnish, and my browser is set to no-cache. I can keep refreshing and it shows the same. But if I clear all disk cache, immediately shows the file updated in browser.

So strange. It's definitely not clearing it.

amiga-500 commented 7 years ago

Strange indeed. I am even using your same settings (and BWP) to simulate and its all working fine (sitemapindex.xml is being deleted fine). The only thing i haven't tested it against is by having CloudFlare enabled which you do and i don't. I will go create a CloudFlare account so at least config wise we will be identical, and maybe i will get lucky and the sitemapindex.xml will now not delete making it then easier for me to know what is wrong. Not sure why it wouldn't delete still in this context but hey might as well try since everything else is working fine so i need something.

casbboy commented 7 years ago

I actually have Cloudflare set to bypass *.xml files to ensure that isn't the case. :(

It has to be something local. Grrrrrrrrrr

amiga-500 commented 7 years ago

Thanks for the info. I will keep roaming for some answers. Odd that the other files get removed except for sitemapindex.xml ... so close 😿 ... at least we're almost there!

In the mean time, if you learn anything new about a potential cause or resolve it please let me know asap.

charlesLF commented 7 years ago

@casbboy: I am assuming you are using "Disk Cache: Enhanced", correct? Have you tried looking in your cache directory (usually in /wp-content/cache/page_enhanced) after you publish a new post?

Is there a file/folder related to sitemapindex.xml anywhere there?

casbboy commented 7 years ago

Let me check, but yes, I am using that. For some reason the memcache option kept crashing my memcache.

casbboy commented 7 years ago

Ok, when I go into that folder, i see the sitemapindex.xml folder being created when I publish a new post, but no post.xml folder is being created upon posting. Then, when I go to the post.xml sitemap, I then see "post.xml" folder being created. But only after I visit the sitemap, while sitemapindex.xml is there before needing to be opened.

casbboy commented 7 years ago

Another weird error worth noticing. Is that before clearing cache manually, my browser reads the sitemapindex.xml as bad format (like the xml header didn't come through). But when I clear disk cache and open again the format is considered correct.

casbboy commented 7 years ago

Another thing I'm seeing is that it immediately pings the search engines with the sitemapindex.xml on publishing. Not sure what issue this might cause.

amiga-500 commented 7 years ago

Hi @casbboy

Not sure if you will be around. Here is some info for ya:

> Another thing I'm seeing is that it immediately pings the search engines with the sitemapindex.xml on publishing. Not sure what issue this might cause.

Under BWP Sitemaps > XML Sitemaps (tab) > Ping Search Engines you probably have those checked on. When you make a new post they will auto-ping.

> Another weird error worth noticing. Is that before clearing cache manually, my browser reads the sitemapindex.xml as bad format (like the xml header didn't come through). But when I clear disk cache and open again the format is considered correct.

Go to: Performance > Page Cache > Advanced > Handle XML mime Type (checkmark/enable this on). What you describe seems to indicate that the file is being sent back with the incorrect mime type. When you clear disk cache and re-open the file it's fine because it is recaching it and returning it back as an xml type. But once cached it is returning it back as html. I fixed this issue on Apache servers a long time ago. I noticed the w3tc author had this already available for nginx. What left me scratching my head was he purposely prevents using this feature for nginx under a certain condition -- it's been a while so i forget what that condition was.

Also just make sure you don't have BWP Sitemaps > Advanced Options > Compress Sitemaps checked on (so disable it), since this will end up gzipping the contents twice (w3tc already gzips) and result in an "XML Parsing Error: not well-formed" error message (when using Firefox; not sure what it shows in other browsers).
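To illustrate why double gzipping breaks the sitemap, here is a small Python sketch using only the standard `gzip` module. It does not reproduce what BWP or w3tc do internally; the sample XML is made up, and the sketch only assumes that the browser undoes a single layer of gzip.

```python
import gzip

# Made-up sitemap body for illustration.
xml = b'<?xml version="1.0" encoding="UTF-8"?><sitemapindex></sitemapindex>'

once = gzip.compress(xml)    # one layer: what the browser expects
twice = gzip.compress(once)  # plugin compresses, then the cache compresses again

# Assumption: the browser removes exactly one gzip layer.
seen_by_browser = gzip.decompress(twice)

# What is left still starts with the gzip magic bytes, not with "<?xml".
print(seen_by_browser[:2] == b"\x1f\x8b")
print(seen_by_browser.startswith(b"<?xml"))
```

After one round of decompression the payload is still binary gzip data, which an XML parser would reject as not well-formed.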

> Ok, when I go into that folder, i see the sitemapindex.xml folder being created when I publish a new post, but no post.xml folder is being created upon posting. Then, when I go to the post.xml sitemap, I then see "post.xml" folder being created. But only after I visit the sitemap, while sitemapindex.xml is there before needing to be opened.

What is strange is why the sitemapindex.xml is being generated automatically when you publish. On my side nothing gets generated until someone visits the sitemap. It seems someone is attempting to access it right away on publish. It could be one of the ping services immediately pinging back the sitemapindex.xml on publish since you seem to have the pinging service enabled. That being said it seems sitemapindex.xml, ping.xml, and any other related xml files are indeed getting removed on your side correctly when published. It's just that the sitemap cache is being generated in the wrong order:

When the sitemapindex.xml file is generated so quickly it is done before post.xml is created (or any of the other xml files held inside sitemapindex.xml), this results in sitemapindex.xml showing the old timestamps for the other xml files. In short, it's like we want to auto-generate the other xml files upon a new publish and then generate sitemapindex.xml last (or not generate sitemapindex.xml automatically at all, relying on someone hitting the file to generate while the other xml files are auto-gen). I will have to play around to see what we can do.

Update

I was originally thinking we might have to shift your sitemap caching back to BWP but it's funny when checking BWP's settings i notice they have no feature to regenerate the sitemap upon new posts (when it handles caching). It can only regenerate after a timed interval. huh? Seems like an oversight by the author. Maybe this is why you are having fix-w3tc cache your sitemap.

Playing with BWP s'more and i find the author's logic a bit strange. Even if you remove w3tc out of the equation the sitemapindex.xml file can still be updated before the post.xml file resulting in one thinking the sitemapindex.xml is still old. It's because he uses a log file stored in the dbase indicating the last modified date of post.xml. So if one uses a cache (including its own) you are going to have date mismatches. It's not the approach i would have gone.
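The ordering problem described above can be modelled with a toy Python class. This is purely an illustration of the reasoning, not BWP's or w3tc's actual code: the class `ToySitemaps`, its fields, and the times are all invented, and it assumes the index reads a stored "last modified" log that only gets refreshed when post.xml itself is generated.

```python
class ToySitemaps:
    """Toy model only; not BWP's or w3tc's real logic."""

    def __init__(self):
        self.log = {"post.xml": "16:44"}  # stored "last modified" log
        self.latest_post = "16:44"        # time of the newest post
        self.cache = {}                   # page cache: name -> rendered body

    def publish(self, when):
        self.latest_post = when
        self.cache.clear()                # cache is purged on publish

    def fetch(self, name):
        if name not in self.cache:
            self.cache[name] = self._render(name)
        return self.cache[name]

    def _render(self, name):
        if name == "post.xml":
            # Generating post.xml is what refreshes the log entry.
            self.log["post.xml"] = self.latest_post
            return f"posts up to {self.latest_post}"
        # The index only reads the log; it never looks at the posts.
        return f"post.xml lastmod {self.log['post.xml']}"


site = ToySitemaps()
site.publish("19:45")
print(site.fetch("sitemapindex.xml"))  # index requested first (e.g. by a ping)
print(site.fetch("post.xml"))          # log is refreshed only now
print(site.fetch("sitemapindex.xml"))  # cached copy still has the old time
site.cache.clear()                     # manual flush
print(site.fetch("sitemapindex.xml"))  # now shows the new time
```

In this model the purge works every time, yet the index still shows the old date until it is regenerated after post.xml.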

charlesLF commented 7 years ago

I have to agree with @amiga-500's detailed analysis.

@casbboy: One thing you can do to help us narrow it down:

  1. Go to the cache directory /wp-content/cache/page_enhanced
  2. Check the creation date of the sitemapindex.xml folder
  3. Publish a new post
  4. Re-check the creation date of the sitemapindex.xml folder
  5. Is the date newer? If so, then w3tc is correctly clearing the cache, and the problem may well be from how BWP pings the sitemap when a new post is published.

Let us know what you find out :)
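The before/after check in the steps above could also be scripted. A minimal Python sketch, with some assumptions: it compares modification times (a true creation date is not reliably available on every filesystem), it searches the whole cache directory because the exact subfolder layout is not known here, and the function names are hypothetical.

```python
import os
from datetime import datetime

def find_sitemap_entries(cache_root, name="sitemapindex.xml"):
    """Map each file or folder called `name` under cache_root to its mtime."""
    found = {}
    for dirpath, dirnames, filenames in os.walk(cache_root):
        for entry in dirnames + filenames:
            if entry == name:
                full = os.path.join(dirpath, entry)
                found[full] = datetime.fromtimestamp(os.path.getmtime(full))
    return found

def refreshed(before, after):
    """Paths that are new, or have a newer mtime, in the second snapshot."""
    return [p for p, t in after.items() if p not in before or t > before[p]]
```

Call `find_sitemap_entries` on the page_enhanced directory once before publishing and once after, then pass both results to `refreshed`. A non-empty list suggests the cached entry was removed and recreated.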

casbboy commented 7 years ago

First, if I go here: Performance > Page Cache > Advanced > Handle XML mime Type

It won't let me add a checkmark, it looks disabled.

I added "sitemapindex.xml" to Additional Pages in page cache.

Second, I checked and folder existed:

sitemapindex.xml 20161208 19:32

I added a post and checked again

sitemapindex.xml 20161208 19:45

So it updated on post. Then I opened the page and the formatting was right. So everything looked good, but it showed the last modified date for post.xml as 16:44

I manually cleared disk cache and opened sitemapindex.xml again and it showed modified date for post.xml as 19:45

So still something a bit off. But closer!

Cheers Ryan

casbboy commented 7 years ago

Actually, made another discovery.

So even with W3TC not caching sitemapindex.xml at all, it still does not update the Modified Time on the file with a new post, so it is not a cache issue.

The problem seems to be BWP itself, as it does not regenerate the modified time in sitemapindex.xml until one of the listed sitemaps is opened again.

So, it will not update until I open a purged post.xml. When I open that, it then tells sitemapindex.xml to rebuild on its most recent modified/added post.

Cheers! Ryan

amiga-500 commented 7 years ago

Precisely...that is what i discovered this morning. Glad you made that discovery too! It's a design flaw within BWP. I ended up attempting to contact the author earlier this morning when i realized it was a problem with his code. He hasn't replied yet.