Closed orlitzky closed 7 months ago
# Matches at least html4, html5, xhtml1.0, and xhtml1.1:
#
# https://www.w3.org/QA/2002/04/valid-dtd-list.html
#
# Note that xhtml served as xml can have an <xml ... > tag
# (with question marks next to the brackets) before the doctype.
$html_doctype_regex = '/^\s*(<\?xml.+\?>)?\s*<!DOCTYPE\s+html\s*(PUBLIC\s+.+)?>/i';
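For anyone who wants to sanity-check the pattern, here is a small sketch that runs it against a few doctypes. The sample strings and the `$samples`/`$results` names are made up for illustration; only the regex itself comes from the snippet above.

```php
<?php
// Regex copied verbatim from the snippet above.
$html_doctype_regex = '/^\s*(<\?xml.+\?>)?\s*<!DOCTYPE\s+html\s*(PUBLIC\s+.+)?>/i';

// Hypothetical sample bodies, for illustration only.
$samples = array(
    'html5'             => "<!DOCTYPE html>\n<html><head></head></html>",
    'html4'             => '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"><html>',
    'xhtml11'           => "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n<html>",
    'xhtml10_multiline' => "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n    \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n<html>",
    'json'              => '{"status": "ok"}',
);

// preg_match() returns 1 on a match, 0 on no match.
$results = array();
foreach ( $samples as $name => $body ) {
    $results[ $name ] = ( 1 === preg_match( $html_doctype_regex, $body ) );
}
```

One thing this turns up: the pattern has no `s` modifier, so `.` does not match a newline. A doctype that wraps its public and system identifiers onto two lines (the `xhtml10_multiline` sample) is not matched as written. Whether that matters in practice is a judgment call.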
And so I don't forget:
if ( $has_html_tag && $has_html_doctype && ! $has_xsl_stylesheet )
Running each of those checks independently and then testing all of the result variables can waste a lot of time. If the output is some very long string of junk with no html doctype, for example, we don't want to search through the whole thing looking for `<html`, `<xsl:stylesheet`, and `<?xml-stylesheet`.
Nesting the `if` statements should be all it takes. The doctype regex is likely the fastest because it short-circuits (it's anchored at the start of the string), so it should probably go first.
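A rough sketch of that ordering, written as early returns rather than literal nesting. The function name `looks_like_html()` and the use of `stripos()` for the tag checks are assumptions for illustration, not the plugin's actual code.

```php
<?php
// Hypothetical helper: the name and the stripos() checks are assumptions.
function looks_like_html( $buffer ) {
    $html_doctype_regex = '/^\s*(<\?xml.+\?>)?\s*<!DOCTYPE\s+html\s*(PUBLIC\s+.+)?>/i';

    // Cheapest check first: the pattern is anchored at the start of the
    // string, so a body without a doctype is rejected without a full scan.
    if ( 1 !== preg_match( $html_doctype_regex, $buffer ) ) {
        return false;
    }

    // Only bodies that passed the doctype check are searched for <html.
    if ( false === stripos( $buffer, '<html' ) ) {
        return false;
    }

    // Finally, reject anything that references an XSL stylesheet.
    if ( false !== stripos( $buffer, '<xsl:stylesheet' )
        || false !== stripos( $buffer, '<?xml-stylesheet' ) ) {
        return false;
    }

    return true;
}
```

With this shape, a long junk string fails at the first `preg_match()` and the three substring searches never run.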
It looks like the HTTP Content-Type header is already set during `is_cacheable()`, so another option might be to check for `text/html` or `application/xhtml+xml` in that relatively short header rather than grepping through the body.
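If the header route is worth trying, it could look something like the sketch below. The helper name `has_html_content_type()` is hypothetical, and it assumes the header lines arrive in the `Name: value` form that PHP's `headers_list()` returns. I haven't checked how the plugin itself reads headers.

```php
<?php
// Hypothetical helper: takes header lines such as those from headers_list().
function has_html_content_type( array $headers ) {
    foreach ( $headers as $header ) {
        // Header names are case-insensitive. Match the media type and
        // allow optional parameters such as "; charset=UTF-8" after it.
        if ( 1 === preg_match( '/^Content-Type:\s*(text\/html|application\/xhtml\+xml)\s*(;|$)/i', $header ) ) {
            return true;
        }
    }
    return false;
}

// Example call inside a web request:
// $is_html = has_html_content_type( headers_list() );
```

This only inspects a handful of short strings, so it stays cheap no matter how large the response body is.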
The `is_cacheable()` function returns,

Is there a good reason for only allowing html5 and not (say) xhtml-1.0, xhtml-1.1, and html4? It's an easy fix to check for other doctypes, but I figured I'd ask before creating a PR in case something goes wrong.