kesar / HTMLawed

A highly customizable PHP script to sanitize (X)HTML and make it secure against XSS attacks, so users can edit HTML without risk of your site being compromised by evildoers.
GNU General Public License v2.0

Regex is slow #30

Open: agulabon11 opened this issue 3 weeks ago

agulabon11 commented 3 weeks ago

The preg_match() call on line 452 can be slow in PHP 7.4 (but not in 8.1, as far as I can tell).

Sometimes parsing takes anywhere from seconds to minutes.

if (!preg_match('`^(/?)([a-z][^ >]*)([^>]*)>(.*)`sm', $t[$i], $m)) {
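For reference, here is a sketch of the same pattern written with PHP's `x` (extended) modifier so that each group can carry a comment. The group descriptions are an editorial reading of the regex rather than htmLawed documentation, and the sample subjects are made up. The point it tries to make visible is that neither `[^ >]` nor `[^>]` excludes the newline character.

```php
<?php
// The pattern from the issue, written with the x (extended) modifier so each
// part can carry a comment. Under x, whitespace and #-comments in the pattern
// are ignored; the space in the first class is escaped to keep it literal.
$annotated = '`
    ^              # start of the subject or, because of m, of any line
    (/?)           # 1: optional slash of a closing tag
    ([a-z][^\ >]*) # 2: a lowercase letter, then anything except space or >
    ([^>]*)        # 3: anything up to the first >
    >              # the literal > that ends the tag
    (.*)           # 4: everything after it (s lets . match newlines)
`smx';

// A made-up tag segment: $m[1] is '', $m[2] is 'a',
// $m[3] is ' href="x"' and $m[4] is 'text'.
preg_match($annotated, 'a href="x">text', $m);
print_r($m);

// Neither [^ >] nor [^>] excludes "\n", so group 2 can span lines:
// here $m[2] is "ab\ncd" (with a real newline in the middle).
preg_match($annotated, "ab\ncd>x", $m);
var_dump($m[2]);
```

As a consequence, once a line starts with a lowercase letter, group 2 can extend over every following line until it reaches a space or a `>`.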

For some reason, it struggles with input made of many short lines (100 lines of 10 characters each) followed by a few very long lines (10 lines of 10,000 characters each). Such input takes more than a minute to parse.

agulabon11 commented 3 weeks ago

Test code:

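<?php
// 100 short lines of 10 characters each...
$t = str_repeat("1234567890\n", 100);

// ...followed by 10 very long lines of 10,000 'x' characters each.
for ($i = 0; $i < 10; $i++) {
    $t .= str_repeat("x", 10000) . "\n";
}

// Same pattern as the preg_match() quoted above (\/ is just an escaped /).
// Note that the subject contains no '>' at all.
$start = microtime(true);
if (preg_match('`^(\/?)([a-z][^ >]*)([^>]*)>(.*)`sm', $t, $m)) {
    print("match\n");
}
printf("preg_match() took %.2f s\n", microtime(true) - $start);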

It takes 15 seconds for me.
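One plausible explanation (not confirmed in this thread): when no `>` follows a line that starts with a lowercase letter, the greedy `[^ >]*` in group 2 first consumes everything up to the end of the subject, because its class also matches newlines. When the final `>` then fails, the engine retries every way of splitting that run between groups 2 and 3 before moving on to the next line start, so the work grows roughly quadratically with the length of the remaining text. A minimal sketch of one possible mitigation, assuming the goal is to keep the captures unchanged, is to make those two repeats possessive (`*+`) so that they never give characters back. `ORIGINAL_PATTERN`, `POSSESSIVE_PATTERN` and `build_slow_input()` are hypothetical names, and this is not a fix published by the htmLawed project.

```php
<?php
// Original pattern (as quoted in the issue) and a hypothetical variant in
// which the repeats in groups 2 and 3 are possessive (*+): once they have
// consumed their run of characters, the engine may not give any back.
const ORIGINAL_PATTERN   = '`^(/?)([a-z][^ >]*)([^>]*)>(.*)`sm';
const POSSESSIVE_PATTERN = '`^(/?)([a-z][^ >]*+)([^>]*+)>(.*)`sm';

// Rebuilds the reporter's input: 100 short lines, then 10 lines of 10,000 'x'.
function build_slow_input(): string
{
    return str_repeat("1234567890\n", 100)
        . str_repeat(str_repeat('x', 10000) . "\n", 10);
}

// Only the possessive variant is run on the slow input here.
$start = microtime(true);
$result = preg_match(POSSESSIVE_PATTERN, build_slow_input(), $m);
printf("possessive: %s in %.3f s\n", var_export($result, true), microtime(true) - $start);

// On ordinary tag-like segments, both patterns should return the same
// result and the same captures.
$samples = ['a href="x">text', '/p>rest', 'br />', "ab\ncd>x", 'no tag here'];
foreach ($samples as $segment) {
    $r1 = preg_match(ORIGINAL_PATTERN, $segment, $a);
    $r2 = preg_match(POSSESSIVE_PATTERN, $segment, $b);
    printf("%-20s %s\n", json_encode($segment, JSON_UNESCAPED_SLASHES),
        ($r1 === $r2 && $a === $b) ? 'same' : 'DIFFERENT');
}
```

The reasoning behind the variant: group 3 always stops at the first `>` after the tag name, wherever group 2 happens to end, so giving characters back can never turn a failed match into a successful one. Forbidding that backtracking should therefore leave matches and captures unchanged while letting a failing line start give up after a single pass. The comparison loop only checks a handful of samples, so wider testing would be needed before relying on it.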