wpsharks / comet-cache

An advanced WordPress® caching plugin inspired by simplicity.
https://cometcache.com
GNU General Public License v3.0

HTML Compressor + antispambot() with umlauts results in cached blank page #871

Open code-flow opened 7 years ago

code-flow commented 7 years ago

Hey guys, I hope you're doing well today. I have an interesting bug report for you.

It seems that the latest version (I'm using CometCache Pro version 161227) has problems when a website uses WordPress' internal antispambot() function and when an e-mail address has umlauts in it.

Here's some example code you can use in your theme's functions.php:

add_action('init', function(){
    add_shortcode('testmail', function($atts, $content, $name){
        return sprintf('<a href="mailto:%s">Testmail</a>', antispambot('info@hägar.de'));
    });
});

Then enter the shortcode [testmail] into any blogpost. You will see that CometCache will return an empty page. Unfortunately this empty page gets cached.

The problem seems to be in the tokenizeGlobalExclusions() function of the \WebSharks\HtmlCompressor\Core class, specifically in its preg_replace_callback() call, which has used the "u" modifier since the last plugin update.

The preg_replace_callback() returns an empty string. I'm not sure if the lite version has the same issue.
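
The failure mode described above can be reproduced in isolation. The sketch below is not the HTML Compressor's code: the pattern and the `tokenize_sketch()` name are made up for illustration. It only shows that a PCRE function with the "u" modifier returns null (an empty string once cast) when the subject contains invalid UTF-8.

```php
<?php
// Hypothetical stand-in for a tokenizing step; pattern and name are illustrative only.
function tokenize_sketch(string $html): ?string
{
    // With the "u" modifier, PCRE validates the subject as UTF-8 before matching.
    return preg_replace_callback(
        '/<pre\b.*?<\/pre>/isu',
        function (array $m): string {
            return '[token]';
        },
        $html
    );
}

// "ä" written as its two UTF-8 bytes (0xC3 0xA4).
$valid = "<a href=\"mailto:info@h\xC3\xA4gar.de\">Testmail</a>";
// A lone 0xC3 byte followed by an entity is not valid UTF-8.
$invalid = "<a href=\"mailto:info@h\xC3&#164;gar.de\">Testmail</a>";

var_dump(tokenize_sketch($valid) === $valid);        // true: no match, subject unchanged
var_dump(tokenize_sketch($invalid));                 // NULL
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // true
var_dump((string) tokenize_sketch($invalid) === ''); // true: null cast to string is empty
```

Checking preg_last_error() after such a call is one way to tell a genuine empty result apart from a failed match.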

Hope this helps make CometCache even better! ;)

Greetings

raamdev commented 7 years ago

@code-flow Thank you for the report! :-) However, I haven't been able to reproduce this issue following your instructions above. I'm using Comet Cache Pro v161227 + HTML Compressor, on WP v4.7.2 running the Twenty Fifteen theme on PHP v7.0.12 and Nginx v1.11.3.

The page with the shortcode gets cached properly and subsequent visits to that cached page do not show up as blank, but rather the cached page gets loaded as expected:

[Screenshot: 2017-01-26_21-37-15]

Any other ideas how I can reproduce this problem?

code-flow commented 7 years ago

Hey @raamdev. You're welcome. Thanks for having a look into that. I have this issue on a customer website as well as on my local machine.

I've tested again with the following:

Same issue again. Blank site. No PHP-Errors. The only message I get on this blank site when I look at the sourcecode is: <!-- Comet Cache HTML Compressor took 0.00088 seconds (overall). -->

raamdev commented 7 years ago

@code-flow Can you confirm that disabling the HTML Compressor resolves the issue? I.e., that this seems specifically related to the HTML Compressor?

code-flow commented 7 years ago

Yes, when HTML-Compressor is off, everything works just fine.

code-flow commented 7 years ago

I'm not sure if the following helps:

When using

antispambot('info@hägar.de', 0) // --> blank page

the output of the above shortcode is:

<a href="mailto:in&#102;&#111;&#64;hä&#103;&#97;&#114;.&#100;e">Testmail</a>

And I get a blank page.

The following works (hex-encoded). However the E-Mail would then be sent to "info@hägar.de".

antispambot('info@hägar.de', 1) // --> works

The output:

<a href="mailto:%69n%66&#111;%40%68�&#164;g%61%72%2ed%65">Testmail</a>

I guess that the umlaut is the problem. Unfortunately I don't know why and if it's a problem of the compressor or the antispambot() function.

Were you able to reproduce it?

raamdev commented 7 years ago

I was able to reproduce this issue, yes. It looks like the blank page issue is not consistent; it only occurs about 50% of the time. I'm guessing that antispambot() sometimes emits a character that the HTML Compressor chokes on (both the specific characters and the part of the email address that gets encoded change from one run to the next).

When I got the blank cached page, here's what the cache file contained:

a:7:{i:0;s:12:"HTTP/1.1 200";i:1;s:38:"Expires: Wed, 11 Jan 1984 05:00:00 GMT";i:2;s:51:"Cache-Control: no-cache, must-revalidate, max-age=0";i:3;s:16:"Pragma: no-cache";i:4;s:38:"Content-Type: text/html; charset=UTF-8";i:5;s:69:"Link: ; rel="https://api.w.org/"";i:6;s:57:"Link: ; rel=shortlink";} 

So this definitely looks like an HTML Compressor bug. Thanks so much @code-flow for reporting this! We'll work on a bug fix.

jaswrks commented 7 years ago

My investigation shows that antispambot() uses zeroise() and at times produces an invalid UTF-8 sequence; i.e., that function is buggy. Sometimes the output it generates is fine; other times it is not. When it is not, the invalid sequence is passed to the HTML Compressor and run through preg_replace(), which chokes on it and returns an empty string.
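
To see how a per-byte encoder can split a multibyte character, here is a minimal sketch. `encode_bytes_sketch()` is a hypothetical, deterministic stand-in and not WordPress's antispambot(): it assumes each byte is independently either left raw or replaced by a decimal entity, and it takes those choices as an argument instead of drawing them at random. Under that assumption, "ä" (bytes 0xC3 0xA4) becomes invalid whenever exactly one of its two bytes is left raw. That is 2 of the 4 possible combinations, which would be consistent with the roughly 50% failure rate reported earlier if the combinations are equally likely.

```php
<?php
// Hypothetical stand-in, NOT WordPress's antispambot(): encodes byte by byte,
// with the raw/entity choice for each byte passed in explicitly.
function encode_bytes_sketch(string $input, array $asEntity): string
{
    $out = '';
    for ($i = 0, $len = strlen($input); $i < $len; $i++) {
        $out .= !empty($asEntity[$i])
            ? '&#' . ord($input[$i]) . ';' // decimal entity for this single byte
            : $input[$i];                  // byte left as-is
    }
    return $out;
}

// preg_match() with the "u" modifier returns false on invalid UTF-8 subjects.
function is_valid_utf8_sketch(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$umlaut  = "\xC3\xA4"; // "ä" as two UTF-8 bytes
$invalid = 0;
foreach ([[false, false], [false, true], [true, false], [true, true]] as $choice) {
    $encoded = encode_bytes_sketch($umlaut, $choice);
    if (!is_valid_utf8_sketch($encoded)) {
        $invalid++;
    }
}
echo $invalid, " of 4 combinations are invalid UTF-8\n"; // 2 of 4
```

Note that the all-entity case is valid UTF-8 but still would not decode back to "ä", since each entity refers to a single byte value rather than the character.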

To avoid this pitfall, the HTML Compressor now checks for invalid UTF-8 before it begins and will refuse to compress an HTML document that contains an invalid UTF-8 sequence. Not compressing is better than failing with the empty document in a case such as this.
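
A guard of this kind could look roughly like the following. This is a sketch under stated assumptions, not the HTML Compressor's actual implementation: `compress_if_valid_sketch()` and the whitespace-collapsing callback are hypothetical, and only the notice comment text is taken from this thread.

```php
<?php
// Hypothetical guard; the real HTML Compressor's internals may differ.
function compress_if_valid_sketch(string $html, callable $compress): string
{
    // preg_match() with the "u" modifier returns false if $html is not valid UTF-8.
    if (preg_match('//u', $html) !== 1) {
        // Return the document uncompressed rather than risk an empty result.
        return $html . "\n<!-- Comet Cache HTML Compressor did not run; HTML contains invalid UTF-8. -->";
    }
    return $compress($html);
}

// Illustrative compressor: remove whitespace between adjacent tags.
$compress = function (string $html): string {
    return (string) preg_replace('/>\s+</u', '><', $html);
};

$ok  = "<p>h\xC3\xA4gar</p>  <p>ok</p>";    // valid UTF-8
$bad = "<p>h\xC3&#164;gar</p>  <p>ok</p>";  // lone 0xC3 byte, invalid UTF-8

echo compress_if_valid_sketch($ok, $compress), "\n";  // whitespace between tags removed
echo compress_if_valid_sketch($bad, $compress), "\n"; // original markup plus the notice comment
```

Validating once up front keeps every later "u"-modifier regex from failing silently on the same input.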

Example Output in an Invalid UTF-8 Scenario

<!-- Comet Cache HTML Compressor did not run; HTML contains invalid UTF-8. -->

<!-- *´¨)
     ¸.•´¸.•*´¨) ¸.•*¨)
     (¸.•´ (¸.•` ¤ Comet Cache is Fully Functional ¤ ´¨) -->

<!-- Cache File User Token:         1 -->
<!-- Cache File Version Salt:       n/a -->

<!-- Cache File URL:                http://dev.jaswrks.com/test-page/ -->
<!-- Cache File Path:               /cache/comet-cache/cache/http/dev-jaswrks-com/test-page.u/1.html -->

<!-- Cache File Generated Via:      HTTP request -->
<!-- Cache File Generated On:       Apr 20th, 2017 @ 6:52 am UTC -->
<!-- Cache File Generated In:       0.15189 seconds -->

<!-- Cache File Expires On:         Apr 27th, 2017 @ 6:52 am UTC -->
<!-- Cache File Auto-Rebuild On:    Apr 27th, 2017 @ 6:52 am UTC -->

<!-- Loaded via Cache On:    Apr 20th, 2017 @ 6:53 am UTC -->
<!-- Loaded via Cache In:    0.02980 seconds -->

code-flow commented 7 years ago

Has this fix been merged into the Pro version already? We updated to version 170220 today, but the problem still exists :-(

raamdev commented 7 years ago

@code-flow This issue has been resolved in the dev-branch and will go out with the next official release. We're hoping to get the next release out within a week or two. This GitHub issue will be updated once the release occurs.

Note: If you're interested in testing a beta release of Comet Cache before the next version comes out, please sign up to be a beta tester here or see Comet Cache → Plugin Updater → Beta Testers to automatically receive Release Candidate updates.