Cache additional (X)HTML doctypes

keycdn / cache-enabler

A lightweight caching plugin for WordPress that makes your website faster by generating static HTML files.

https://wordpress.org/plugins/cache-enabler/


Cache additional (X)HTML doctypes #324

Closed orlitzky closed 7 months ago

orlitzky commented 1 year ago

The is_cacheable() function ends with:

if ( $has_html_tag && $has_html5_doctype && ! $has_xsl_stylesheet ) {
  return true;
}

return false;

Is there a good reason for allowing only HTML5 and not (say) XHTML 1.0, XHTML 1.1, and HTML 4? Checking for other doctypes is an easy fix, but I figured I'd ask before creating a PR in case something goes wrong.

orlitzky commented 1 year ago

# Matches at least html4, html5, xhtml1.0, and xhtml1.1:
#
#   https://www.w3.org/QA/2002/04/valid-dtd-list.html
#
# Note that xhtml served as xml can have an <?xml ... ?> declaration
# before the doctype.
$html_doctype_regex = '/^\s*(<\?xml.+\?>)?\s*<!DOCTYPE\s+html\s*(PUBLIC\s+.+)?>/i';
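As a quick sanity check, the proposed regex can be run against a few sample documents. This is only a sketch: the helper name `has_html_doctype()` and the sample strings are made up for illustration and are not part of the plugin.

```php
<?php
// Hypothetical helper (not in the plugin): wraps the regex proposed above.
function has_html_doctype( string $contents ): bool {
    $html_doctype_regex = '/^\s*(<\?xml.+\?>)?\s*<!DOCTYPE\s+html\s*(PUBLIC\s+.+)?>/i';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match( $html_doctype_regex, $contents ) === 1;
}

// Illustrative inputs only.
$samples = array(
    'html5'    => '<!DOCTYPE html><html></html>',
    'html4'    => '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"><html></html>',
    'xhtml1.1' => '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">' . "\n"
        . '<html></html>',
    'svg'      => '<?xml version="1.0"?><svg xmlns="http://www.w3.org/2000/svg"></svg>',
);

foreach ( $samples as $label => $markup ) {
    printf( "%s: %s\n", $label, has_html_doctype( $markup ) ? 'match' : 'no match' );
}
```

One caveat: without the `s` modifier, `.` does not match a newline, so a doctype whose PUBLIC identifier wraps onto a second line would not match this pattern as written.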

orlitzky commented 1 year ago

And so I don't forget:

if ( $has_html_tag && $has_html_doctype && ! $has_xsl_stylesheet )

Running each of those checks independently and then testing all of the result variables can waste a lot of time. If the output is some very long string of junk with no html doctype, for example, we don't want to search through the whole thing looking for <html, <xsl:stylesheet, and <?xml-stylesheet.

Nesting the if statements should be all it takes. The doctype regex is likely the fastest check because it is anchored at the start of the string and fails immediately on a non-match, so it should probably go first.
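A rough sketch of that ordering, written with early returns (equivalent to nesting the ifs). The function name `is_cacheable_sketch()` and the use of `stripos()` for the remaining checks are assumptions for illustration; the plugin's actual checks may be implemented differently.

```php
<?php
// Hypothetical stand-in for is_cacheable(); not the plugin's real code.
function is_cacheable_sketch( string $contents ): bool {
    $html_doctype_regex = '/^\s*(<\?xml.+\?>)?\s*<!DOCTYPE\s+html\s*(PUBLIC\s+.+)?>/i';

    // Cheapest check first: the pattern is anchored at the start of the
    // string, so output without a doctype is rejected before any
    // full-body search runs.
    if ( preg_match( $html_doctype_regex, $contents ) !== 1 ) {
        return false;
    }

    // Only now search the body for the <html tag.
    if ( stripos( $contents, '<html' ) === false ) {
        return false;
    }

    // Finally, reject documents that reference an XSL stylesheet.
    if ( stripos( $contents, '<xsl:stylesheet' ) !== false
        || stripos( $contents, '<?xml-stylesheet' ) !== false ) {
        return false;
    }

    return true;
}
```

With this ordering, a long string of junk with no doctype returns false after the first check, and the `<html`, `<xsl:stylesheet`, and `<?xml-stylesheet` searches never run.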

orlitzky commented 1 year ago

It looks like the HTTP Content-Type header is already set during is_cacheable(), so another option might be to check for text/html or application/xhtml+xml in that relatively-short header rather than grepping through the body.
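For illustration, a header-based check might look something like the sketch below. The helper name `has_html_content_type()` is hypothetical, and passing it the result of PHP's `headers_list()` is an assumption about how the header would be read; the thread does not say how the plugin accesses it.

```php
<?php
// Hypothetical helper: takes header lines in the "Name: value" form that
// headers_list() returns, and reports whether the first Content-Type
// header found names an (X)HTML media type.
function has_html_content_type( array $headers ): bool {
    $prefix = 'Content-Type:';

    foreach ( $headers as $header ) {
        // Header names are case-insensitive.
        if ( stripos( $header, $prefix ) !== 0 ) {
            continue;
        }

        // Drop parameters such as "; charset=UTF-8" and normalize case.
        $value      = substr( $header, strlen( $prefix ) );
        $media_type = strtolower( trim( explode( ';', $value )[0] ) );

        return in_array( $media_type, array( 'text/html', 'application/xhtml+xml' ), true );
    }

    // No Content-Type header was present.
    return false;
}
```

Inside a web request this could be called as `has_html_content_type( headers_list() )`. Taking the header lines as an argument keeps the function testable outside a web request.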