jalmenarez / minify

Automatically exported from code.google.com/p/minify
BSD 3-Clause "New" or "Revised" License

Another empty string returned (Regular Expression has PREG_BACKTRACK_LIMIT_ERROR) #46

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What version of Minify are you using?
2.0.1

PHP version?
5.2.5

What steps will reproduce the problem?
1. Leave pcre.backtrack_limit in php.ini at its default (100,000, I think).
2. Have a single line in the HTML that is too long (148,426 characters in my
case).
3. Minify the HTML.
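
The steps above can be approximated with a short script. This is only a sketch: the input is a synthetic stand-in for the reporter's page, the pcre.jit line is an assumption for newer PHP builds (the setting does not exist in PHP 5.2), and whether the limit is actually hit depends on the PHP and PCRE versions, so the script reports either outcome.

```php
<?php
// Assumption: on newer PHP builds the JIT has to be off for the classic
// backtrack limit to apply. On PHP 5.2 this call simply has no effect.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '100000');

// Synthetic stand-in input: one very long line with no whitespace before "<".
$html = '<p>' . str_repeat('x', 150000) . '</p>';

// The pattern quoted in the report.
$result = preg_replace('/>(\\S[\\s\\S]*?)?\\s+</', '>$1 <', $html);

if ($result === null) {
    // preg_replace signals failure with NULL; preg_last_error() says why.
    echo "preg_replace failed, preg_last_error() = " . preg_last_error() . "\n";
} else {
    echo "preg_replace succeeded, length = " . strlen($result) . "\n";
}
```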

What is the expected output?
Some HTML.

What do you see instead?
Empty string.

Please provide any additional information below.
I think this is very similar to Issue 41 (too many backtracks). The Regular
Expression in this case is:
$html = preg_replace('/>(\\S[\\s\\S]*?)?\\s+</', ">$1 <", $html);

Although my original html did not have any long lines, the long line was
created by minify.

This time though I think I have 2 solutions, the first is to fix the
expression. Here is my new expression:
$html = preg_replace('/>(\\S[\\s\\S]*?)?\\s+</U', ">$1 <", $html);
Note that the only change is the /U modifier. I tested this new expression
and it looks like it works to me with a 128.9 kB file.

My second solution is a fallback for failing regular expressions. When
preg_replace fails it returns NULL, which ends up as an empty string ("") in
the output, so this should be caught. Each regular expression should be
checked like the following:

$newhtml = preg_replace('/>(\\S[\\s\\S]*?)?\\s+</U', ">$1 <", $html);
if ($newhtml !== null) {
    // keep the previous HTML if the pattern failed
    $html = $newhtml;
}
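
A slightly more general form of this fallback could wrap the check in a helper and also consult preg_last_error(), which is available from PHP 5.2.0. The helper name safe_preg_replace is hypothetical and not part of Minify; this is a sketch of the idea rather than a proposed patch.

```php
<?php
// Hypothetical helper (not part of Minify): apply a pattern, but keep the
// original subject if PCRE reports an error.
function safe_preg_replace($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null || preg_last_error() !== PREG_NO_ERROR) {
        // Return the input untouched instead of letting NULL turn into "".
        return $subject;
    }
    return $result;
}

$html = "<p>one</p>   \n  <p>two</p>";
$html = safe_preg_replace('/>(\\S[\\s\\S]*?)?\\s+</', '>$1 <', $html);
echo $html, "\n";
```

The trade-off raised later in the thread still applies: silently falling back hides the failure, so logging the preg_last_error() value before returning may be worth considering.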

Original issue reported on code.google.com by ecr...@gmail.com on 12 Aug 2008 at 2:13

GoogleCodeExporter commented 9 years ago
I think in general the minifiers need to limit line lengths, even if it means
using newlines instead of spaces in some cases.

I'll try out that code. Something like this might be fast (incredibly simple 
expression + native trim()):

$html = preg_replace_callback('/>([^<]+)</', array('Minify_HTML', '_cbTrim'), $html);

function _cbTrim($m) {
  return '>' . trim($m[1]) . ' <';
}
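
For the array('Minify_HTML', '_cbTrim') callback to resolve, _cbTrim has to be a static method on that class. A minimal runnable sketch of the idea, using a hypothetical stand-in class Demo_HTML rather than the real Minify_HTML, might look like this:

```php
<?php
// Hypothetical stand-in class, not the real Minify_HTML.
class Demo_HTML
{
    public static function minify($html)
    {
        // [^<]+ cannot match past the next "<", so no lazy scan over the
        // whole document is needed.
        return preg_replace_callback(
            '/>([^<]+)</',
            array('Demo_HTML', '_cbTrim'),
            $html
        );
    }

    // Callback: trim the text between two tags and keep a single space.
    public static function _cbTrim($m)
    {
        return '>' . trim($m[1]) . ' <';
    }
}

echo Demo_HTML::minify("<p>  hello  </p>\n\n<p>world</p>"), "\n";
```

Note that this version always emits a space before the following "<", even where the source had none, so it is not a drop-in equivalent of the original pattern.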

Original comment by mrclay....@gmail.com on 13 Aug 2008 at 2:22

GoogleCodeExporter commented 9 years ago
Can you try Minify_HTML in R168? 

Rather than running 3 patterns on the whole string, there's now one trip across
the string with two simple patterns.
http://code.google.com/p/minify/source/diff?old=163&r=168&format=unidiff&path=%2Ftrunk%2Flib%2FMinify%2FHTML.php

I'll consider adding the fallback. But then would these bugs get noticed? ;)

Original comment by mrclay....@gmail.com on 13 Aug 2008 at 2:15

GoogleCodeExporter commented 9 years ago
I've got the same bug with long HTML.
During tests I found that the count of matches is 0 and the callback function
is never entered.

No errors, no notices, no info to help resolve this bug =(
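
PCRE failures of this kind raise no PHP error or notice; the only signals are the NULL return value and preg_last_error(). A small diagnostic sketch, assuming nothing about the actual Minify code (describe_preg_error and demo_trim_callback are hypothetical names), could make the failure visible:

```php
<?php
// Hypothetical helper: translate preg_last_error() into a readable name.
function describe_preg_error()
{
    $names = array(
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
    );
    $code = preg_last_error();
    return isset($names[$code]) ? $names[$code] : 'unknown PCRE error ' . $code;
}

// Hypothetical callback, modelled on the _cbTrim idea from this thread.
function demo_trim_callback($m)
{
    return '>' . trim($m[1]) . ' <';
}

$html = '<p>  some text  </p>';
$count = 0;
$out = preg_replace_callback('/>([^<]+)</', 'demo_trim_callback', $html, -1, $count);

if ($out === null) {
    // The callback was never entered and $count stays 0; log the reason.
    error_log('pattern failed: ' . describe_preg_error());
    $out = $html;
}
echo $out, ' (', $count, ' replacement(s), ', describe_preg_error(), ")\n";
```

Running the same check after each preg_replace_callback call would show which pattern fails first.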

Original comment by swaya...@gmail.com on 15 Jan 2010 at 8:32

GoogleCodeExporter commented 9 years ago
@swaylex: can you tell which is the first preg_replace_callback to fail? Does
your input have really long lines, or is it just really long overall? Ideally
attach a failing HTML file.

Original comment by mrclay....@gmail.com on 15 Jan 2010 at 9:56