rickywu-posh / php-sql-parser

Automatically exported from code.google.com/p/php-sql-parser
BSD 3-Clause "New" or "Revised" License

Very long query cannot be parsed #11

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Test code :

$test    = str_repeat('0', 18000);
$query  = "UPDATE club SET logo='$test' WHERE id=1";

$parser = new PHPSQLParser();
$result = $parser->parse($query);

Result: the query cannot be parsed, and the current PHP process crashes or 
times out.

Original issue reported on code.google.com by johnny.c...@gmail.com on 20 May 2011 at 5:33

GoogleCodeExporter commented 8 years ago
I have tracked it down to 
http://us3.php.net/manual/en/regexp.reference.subpatterns.php: "The maximum 
number of captured substrings is 99, and the maximum number of all subpatterns, 
both capturing and non-capturing, is 200."  So the following string is the 
smallest that will trigger a crash/timeout:

$test    = str_repeat('0',202);
$sql  = "'$test'";

And it's caused specifically by the tokenizer, as the following stand-alone 
code demonstrates (it will hang if you run it):

$test    = str_repeat('0',201);
$sql  = "'$test'";
$sql = str_replace(array('\\\'','\\"',"\r\n","\n","()"),array("''",'""'," "," "," "), $sql);
$regex=<<<EOREGEX
/(`(?:[^`]|``)`|[@A-Za-z0-9_.`-]+(?:\(\s*\)){0,1})
|(\+|-|\*|\/|!=|>=|<=|<>|>|<|&&|\|\||=|\^)
|(\(.*?\))   # Match FUNCTION(...) OR BAREWORDS
|('(?:[^']|'')*'+)
|("(?:[^"]|"")*"+)
|([^ ,]+)
/ix
EOREGEX
;
$tokens = preg_split($regex, $sql, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
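
For anyone reproducing this, it may help to make the failure visible instead of silent. The sketch below is not part of PHPSQLParser: split_or_fail() is a hypothetical helper name. It assumes PHP 7 or later (for the type declarations) and a build where PCRE reports an error rather than taking the whole process down. preg_split() returns false on failure, and preg_last_error() gives the PCRE error code.

```php
<?php
// Hypothetical helper, not part of PHPSQLParser: run preg_split() with the
// same flags as the tokenizer and turn a silent PCRE failure into an exception.
function split_or_fail(string $regex, string $subject): array
{
    $tokens = preg_split(
        $regex,
        $subject,
        -1,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
    );

    if ($tokens === false) {
        // preg_last_error() returns one of the PREG_*_ERROR constants,
        // e.g. PREG_BACKTRACK_LIMIT_ERROR or PREG_RECURSION_LIMIT_ERROR.
        throw new RuntimeException(
            'preg_split failed, PCRE error code ' . preg_last_error()
        );
    }

    return $tokens;
}
```

Calling split_or_fail($regex, $sql) in place of the bare preg_split() above would then raise an exception on a reported PCRE error. It cannot help if the process itself crashes.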

Original comment by kbacht...@gmail.com on 20 Oct 2011 at 3:35

GoogleCodeExporter commented 8 years ago
Whoops that should read:

$test    = str_repeat('0',202);

In the second stand-alone example.

Original comment by kbacht...@gmail.com on 20 Oct 2011 at 3:35

GoogleCodeExporter commented 8 years ago
Here's a simple fix.  In the $regex in the parser (~line 168) in split_sql, 
change the ' and " parts to have a + after the character classes:

|('(?:[^']+|'')*'+)
|("(?:[^"]+|"")*"+)
          ^ note the added '+' on both lines

With this change, each run of characters between the delimiters (and between 
any '' or "" escapes) is consumed as a single match rather than one match per 
character, which is what quickly reached the 200 limit.  Of course, this will 
still break if a string alternates between data and '' (or "") more than 200 
times, but hopefully that should happen very, very, very rarely :-)
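
A minimal sketch to check the patched single-quote branch on its own, assuming PHP 7.1 or later (for the nullable return type). The helper name match_quoted() is made up for illustration; only the pattern comes from the fix above. The original pattern is deliberately not run here, since it may crash older builds.

```php
<?php
// Hypothetical helper: return the first single-quoted SQL literal in $sql,
// using the patched pattern from this comment, or null if there is none.
function match_quoted(string $sql): ?string
{
    // [^']+ consumes a whole run of non-quote characters per group iteration;
    // '' is an escaped quote inside the literal.
    $patched = "/('(?:[^']+|'')*'+)/";

    return preg_match($patched, $sql, $m) === 1 ? $m[1] : null;
}

// Same payload size as the original report.
$literal = "'" . str_repeat('0', 18000) . "'";
var_dump(match_quoted("UPDATE club SET logo=$literal WHERE id=1") === $literal);

// An escaped quote stays inside the same token.
var_dump(match_quoted("'it''s'") === "'it''s'");
```

Both var_dump() calls should print bool(true) if the patched branch behaves as described.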

Hope this helps!

Original comment by kbacht...@gmail.com on 20 Oct 2011 at 3:41

GoogleCodeExporter commented 8 years ago
I can confirm that this issue exists and that comment #3 
(http://code.google.com/p/php-sql-parser/issues/detail?id=11#c3) does indeed 
fix it.

Original comment by ben.swin...@gmail.com on 13 Jan 2012 at 2:28

GoogleCodeExporter commented 8 years ago
The solution (comment #3) has been added to the current version at 
http://www.phosco.info/php-sql-parser_current.zip

Original comment by pho...@gmx.de on 2 Feb 2012 at 8:25

GoogleCodeExporter commented 8 years ago
@pho...@gmx.de

It has not been added to the current version. The current version has 

|('(?:[^']|'')*'+)
|("(?:[^"]|"")*"+)

whereas the solution has 

|('(?:[^']+|'')*'+)
|("(?:[^"]+|"")*"+)

Original comment by ben.swin...@gmail.com on 5 Mar 2012 at 10:29

GoogleCodeExporter commented 8 years ago
I have added a test with the code provided by johnny, and it works. I have 
changed the regular expression, so perhaps it works without your changes. Can 
you provide test code that doesn't work?

Try the repository at https://www.phosco.info/publicsvn/php-sql-parser
I have no commit rights on the original codebase, so I have to provide the 
changes on my own SVN.

Original comment by pho...@gmx.de on 5 Mar 2012 at 12:08

GoogleCodeExporter commented 8 years ago
Accepted fixed codebase.

Original comment by greenlion@gmail.com on 12 Mar 2012 at 9:54