Hi, thanks for the good bug report. :) The problem is the greedy `*` vs. the lazy `*?` here; I already had similar problems with another library (https://github.com/voku/anti-xss/commit/f8a2eef324c879ef71f56964263f352ff71e2403), so I added the same fix here. What do you think about it? https://github.com/voku/HtmlMin/commit/6d26d4fbe7090c6c6bee12279f14e9c2e6b298c3#diff-379408ffd163375efe70c39a5decbe4b7abfbea017cabe052a8b404ed969bfd4
https://regex101.com/r/7EBtDM/1

`*?` matches the previous token between zero and unlimited times, as few times as possible, expanding as needed (lazy).

`*` matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy).
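A minimal PHP illustration of the difference (the pattern here is made up for the demo, it is not the one from the commit):

```php
<?php
// Greedy `.*` overshoots to the LAST quote in the subject; lazy `.*?`
// stops at the FIRST one. On a multi-megabyte base64 attribute value,
// the greedy version forces massive backtracking.
$html = '<img src="a.png" alt="x"><img src="b.png" alt="y">';

preg_match('/src=".*"/',  $html, $greedy);
preg_match('/src=".*?"/', $html, $lazy);

echo $greedy[0], "\n"; // src="a.png" alt="x"><img src="b.png" alt="y"
echo $lazy[0],   "\n"; // src="a.png"
```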
Thanks for your fast feedback! 🙂 Looks like a smart solution, works for me!
Will you release a tag for this fix? Just so I can adjust my constraint in my `composer.json` file.
Version 4.4.9, with upstream fixes from "voku/simple_html_dom", has been released.
Thanks a lot!
On a legacy project, I'm facing issues with a client who (I don't really know how) has inserted base64-encoded images.
Furthermore, those images aren't optimized at all (uncompressed PNGs for photos...), resulting in huge base64 strings (multiple images of up to 2 MB each).
What is this feature about (expected vs actual behaviour)?
When minifying such HTML, I end up with an empty string because this replacement fails: https://github.com/voku/HtmlMin/blob/4f700584abd70b308b7d06b8e4cfcc31711faaf9/src/voku/helper/HtmlMin.php#L1375-L1381
Because it exceeds the `pcre.backtrack_limit` and returns a `PREG_BACKTRACK_LIMIT_ERROR`. I'm fully aware this is all wrong: wrong image format for photos and, more importantly, large image sources should be inserted as URLs. However, technically speaking, all this crap is still valid HTML (the page displays when unminified), and I think such cases, even if they are edge cases, should not fail.
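For reference, this is how the failure surfaces at the PHP level. The snippet below uses the catastrophic pattern from the PHP manual's `preg_last_error()` example, not HtmlMin's actual pattern, but the failure mode is the same: `preg_replace()` returns `null` on error, and that `null` is what ends up producing the empty output.

```php
<?php
// Pattern borrowed from the PHP manual's preg_last_error() example;
// it backtracks exponentially, tripping the default pcre.backtrack_limit.
$result = preg_replace('/(?:\D+|<\d+>)*[!?]/', '', 'foobar foobar foobar');

var_dump($result);                                          // NULL
var_dump(preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR); // bool(true)
```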
Thus, I wonder if a sanity check should be inserted somewhere to prevent the script from failing and returning an empty string. Maybe catch errors on all `preg_` calls and return the unminified HTML if an error occurred? Or check the string length against the `pcre.backtrack_limit` value? Wrapping `preg_` calls the way Guzzle wraps `json_decode` could be a nice solution, because it would handle not only backtrack-limit errors but all kinds of `preg_` errors: https://github.com/guzzle/guzzle/blob/74ca2cb463a7a99a0b99f195ca809cc4ba6c3147/src/Utils.php#L281-L301

What do you think?
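A rough sketch of what such a wrapper could look like (the names `pregReplaceOrFail` and `PcreException` are made up for illustration; `preg_last_error_msg()` needs PHP 8.0):

```php
<?php
// Hypothetical Guzzle-style wrapper: fail loudly instead of letting
// preg_replace() silently return null.

final class PcreException extends \RuntimeException {}

function pregReplaceOrFail(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // preg_last_error_msg() is PHP 8.0+; on older versions, map the
        // preg_last_error() code to a message manually.
        throw new PcreException('preg_replace failed: ' . preg_last_error_msg(), preg_last_error());
    }

    return $result;
}

// Example usage with a trivial pattern:
try {
    echo pregReplaceOrFail('/\s+/', ' ', 'hello   world'), "\n"; // hello world
} catch (PcreException $e) {
    // On e.g. PREG_BACKTRACK_LIMIT_ERROR, the minifier could catch this
    // and return the unminified HTML instead of an empty string.
}
```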
How can I reproduce it?
Try to minify some HTML whose length exceeds the `pcre.backtrack_limit`.
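Something along these lines should trigger it (a sketch; it assumes a payload big enough to trip the default `pcre.backtrack_limit` of 1,000,000):

```php
<?php
// Hypothetical reproduction: a document with an inlined, multi-megabyte
// base64 image, similar to what my client managed to insert.
require __DIR__ . '/vendor/autoload.php';

$payload = base64_encode(random_bytes(3 * 1024 * 1024)); // ~4 MB once encoded
$html    = '<html><body><img src="data:image/png;base64,' . $payload . '"></body></html>';

$minified = (new \voku\helper\HtmlMin())->minify($html);
var_dump($minified === ''); // bool(true) on affected versions: empty output
```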
Does it take minutes, hours or days to fix?
I think a couple of hours should be enough. I can submit a PR, but I'd like your feedback on the different proposed solutions before working something out.