greenlion / PHP-SQL-Parser

A pure PHP SQL (non validating) parser w/ focus on MySQL dialect of SQL
BSD 3-Clause "New" or "Revised" License
609 stars 159 forks source link

Position for CLAUSES #177

Open witchi opened 9 years ago

witchi commented 9 years ago

From abi...@owasp.org on September 25, 2014 23:05:32

What steps will reproduce the problem? 1. Parse any SQL query

  1. Clauses that appear as KEYs in the array are not treated as NODES. They neither have positions nor anything else. What is the expected output? What do you see instead? I'm trying to mark all tokens in another string that simplifies the query, but major CLAUSES like SELECT and WHERE don't support position or other information. They should be treated like other nodes. What version of the product are you using? On what operating system? OS X Yosemite php-sql-parser-20140108 Please provide any additional information below.

Original issue: http://code.google.com/p/php-sql-parser/issues/detail?id=151

witchi commented 9 years ago

From pho...@gmx.de on September 26, 2014 08:31:50

You can try to use the existing positions and look forward/backward from the known position to find the keyword. I'll set this as enhancement, it is a larger change. I have to modify the array a lot (additional levels), which can have side-effects on existing client code.

Status: Accepted
Owner: pho...@gmx.de
Labels: -Type-Defect Type-Enhancement

witchi commented 9 years ago

From pho...@gmx.de on September 26, 2014 10:49:12

There is an SQLChunkProcessor class, where you can find all the keywords, which are not part of the calculator. So the first thing you can do, is to enhance this class, and add a new level just below each of the keywords like

if (!empty($out['SELECT'])) { $processor = new SelectProcessor(); $out['SELECT'] = array('expr_type'=>ExpressionType::RESERVED, 'base_expr'=>'', 'sub_tree'=>$processor->process($out['SELECT'])); }

You always need an expr_type, which is used in the PositionCalculator to differ between type with/without backtracking. The base_expr contains the original string of the statement. So maybe it is better to enhance the processors directly (you can have 'select' or 'SeLeCt' or 'SELECT'). In the sub_tree section, you should add the rest of the parsed statement part.

But there are side-effects. In example the top-level SELECT key is used in other Processors, so you have change these things to access the right new sub-level ([SELECT][0][sub_tree] instead of [SELECT]).

I think, there is a lot to do, maybe you should think about the other solution:

  1. hold the current position
  2. if you get a keyword index, make a strpos(strtoupper($index), $curPos) to find the next matching position
  3. in all cases use the [position] and store it always in the $curPos
witchi commented 9 years ago

From pho...@gmx.de on September 26, 2014 10:58:00

  1. must be strpos(strtoupper($statement), $index, $curPos)

The $index should always be in uppercases.

witchi commented 9 years ago

From pho...@gmx.de on September 26, 2014 11:08:34

It can be necessary to add the strlen([base_expr]) to the $curPos, just before you search the index name. The [position] gives you always the start of the [base_expr], so if you store the top-level position, you have to add the length of the statement part associated with it. This allows you to start the search right after [base_expr] and there should be only some withespace before the index keyword starts.

witchi commented 9 years ago

From abi...@owasp.org on September 26, 2014 11:48:33

You got me totally confused with these comments. All I need is to have base_expr and position entries in the high-level keywords too, just like the rest of the nodes. -A

witchi commented 9 years ago

From pho...@gmx.de on September 26, 2014 13:51:50

But to be conform with the rest of the parser output, i think you should insert a new level below SELECT.

This has side effects to the position calculator and some other parts of the parser. So i think that it could be simpler to calculate the positions of the keywords like WHERE or UPDATE with the already existing positions.

In example: you get the position of the whole FROM clause. The next clause is the WHERE clause. Now you have two possibilities. You can use the position of the FROM clause, add the length of the FROM base_expr and start from this position the search for "Where". Or you can get the first position under the WHERE key and looks backward for "Where".

witchi commented 9 years ago

From abi...@owasp.org on September 26, 2014 13:57:24

I dont need it to conform to the rest of the parser as long as it satisfies my need :D Can you let me know where I should change to achieve that? -A

witchi commented 9 years ago

From abiusx on October 07, 2014 11:55:01

I tried a lot of traversing, it doesn't seem to work. For example, in the case of a UNION, the first subtree is before it and the second after it! It made my code real nasty and I bet it would stop working at a lot of points.

So now I am planning to modify PositionCalculator.php to add 'position' to all entries, not just those that already have it, but I can't seem to figure out what its planning. I don't see where it skips SELECT and UNION and similar clauses, so that I can add it there.

Thanks for the help, -A