sad-spirit / pg-builder

Query builder for Postgres backed by SQL parser
BSD 2-Clause "Simplified" License
55 stars 6 forks source link

Extract relation names from parsed SQL string #10

Closed mapcentia closed 3 years ago

mapcentia commented 3 years ago

I'm would like to extract all relation names, which are included in a query (both select, update, insert and delete). The statements are arbitrary. I can get the FromList, but I'm not sure from there to get the relation names. In this case I'm after bar and bup:

use sad_spirit\pg_builder\Lexer;
use sad_spirit\pg_builder\Parser;

$strings = "SELECT *, (select foo from bar limit 1) as bib FROM bup";

$lexerStandard = new Lexer(array(
    'standard_conforming_strings' => true
));

$parser = new Parser($lexerStandard);
$test = $parser->parseSelectStatement($strings);

var_dump($test->from);
sad-spirit commented 3 years ago

This is actually covered in the docs: https://github.com/sad-spirit/pg-builder/wiki/walkers

You need to subclass BlankWalker and override walkRelationReference() method. When you do

$test->dispatch($walker);

that method will be called for every relation name appearing in the query.

sad-spirit commented 3 years ago

Oops, I missed that part:

both select, update, insert and delete

You'll need to override walkInsertTarget() and walkUpdateOrDeleteTarget() methods as well.

mapcentia commented 3 years ago

I couldn't get walkInsertTarget and walkUpdateOrDeleteTarget to work so I did this:

class TableWalker extends BlankWalker
{
    /**
     * @var array<string>
     */
    private $relations;

    public function walkRelationReference(RelationReference $rangeItem): void
    {
        $this->relations[] = (string)$rangeItem->name;
    }

    public function walkInsertStatement(Insert $statement): void
    {
        $this->relations[] = (string)$statement->relation->relation;
    }

    public function walkUpdateStatement(Update $statement): void
    {
        $this->relations[] = (string)$statement->relation->relation;
    }

    public function walkDeleteStatement(Delete $statement): void
    {
        $this->relations[] = (string)$statement->relation->relation;
    }
    // DOESN'T WORK (method not called)
    public function walkUpdateOrDeleteTarget(UpdateOrDeleteTarget $statement): void
    {
        $this->relations[] = (string)$statement->relation->relation;
    }

    /**
     * @return array<string>
     */
    public function getRelations(): array
    {
        return $this->relations;
    }
}

But when I do something like this:

$factory = new StatementFactory();
$select = $factory->createFromString("WITH t AS (DELETE FROM foo) DELETE FROM bar");
$walker = new TableWalker();
$select->dispatch($walker);
print_r($walker->getRelations());

I only get bar: Array([0] => bar)

Also some thing like this: delete from foo where id in (select id from bar) Only yields foo and not bar

Is where a way to write the class so I always get all relations used?

sad-spirit commented 3 years ago

Sorry, you are right, there were indeed missing dispatch() calls in BlankWalker: cbf265bf604d328be9447a043de843a8fa48b125

The fix is included in the most recent release, so please upgrade if you can. I'll see whether I can backport it to 0.4 branch and make a new release from that.

As for your code, when you do

    public function walkDeleteStatement(Delete $statement): void
    {
        $this->relations[] = (string)$statement->relation->relation;
    }

you are not dispatching to any child nodes of Delete thus CTEs and subselects are not processed.

Compare the above with actual code from master branch:

    public function walkDeleteStatement(Delete $statement)
    {
        $statement->with->dispatch($this);
        $statement->relation->dispatch($this);
        $statement->using->dispatch($this);
        $statement->where->dispatch($this);
        $statement->returning->dispatch($this);
        return null;
    }
mapcentia commented 3 years ago

I'm using 1.0.0-beta. As I understand, the behaviour I'm getting is excepted. But how do I walk subselects in delete and update statements?

sad-spirit commented 3 years ago

If you are using 1.0.0-beta, then extending BlankWalker and implementing (only!) walkRelationReference(), walkInsertTarget(), walkUpdateOrDeleteTarget() should work properly, you don't need to change other methods as this may break walking parts of the query.

The problem in previous versions was that dispatch() calls leading to walkInsertTarget(), walkUpdateOrDeleteTarget() were omitted, e.g.

        $statement->relation->dispatch($this);

in the above code. That's exactly the same problem you have with your code now: if you override walkDeleteStatement() then for subselects and CTEs to work you need to make sure you dispatch to Nodes representing these:

    public function walkDeleteStatement(Delete $statement): void
    {
        // this will dispatch to child nodes
        parent::walkDeleteStatement($statement);
        // your additional actions (not really needed as walkUpdateOrDeleteTarget() will be called in the above)
        $this->relations[] = (string)$statement->relation->relation;
    }
mapcentia commented 3 years ago

Thanks a lot! By implementing only walkRelationReference(), walkInsertTarget() and walkUpdateOrDeleteTarget() it now works