klahnakoski / mo-sql-parsing

Let's make a SQL parser so we can provide a familiar interface to non-sql datastores!
Mozilla Public License 2.0
261 stars 58 forks source link

Feature Request: Functions to extract all tables, columns used by a query #226

Open niroshaimos opened 10 months ago

niroshaimos commented 10 months ago

A function that could get a query\json repr of a query and output all tables/columns used by a query would be useful for cases in which you want to check if the user of an attempted query has sufficient permissions before attempting execution of the query. Thanks!

a1ea321 commented 2 months ago

I'd also find this useful, for another purpose. But I don't think the functionality should be in this library. This parser has enough on its plate as it is. It gets confused too often when I throw queries at it from the real world. After all, it supports multiple DBs.

Anyway, I need it so much I started writing it myself. And I've seen it is not possible to really do it without a DB connection. Consider these two queries:

These are equivalent, as long as there is no x in t2 and no y in t1. Therefore, the function @niroshaimos wanted must have the query as one input parameter, and a database connection as the other parameter. Alternatively, the output could claim the table to be ? in such cases and the programmer might have to do some post-processing.

Not a problem for what I am writing. All the input queries will be written by me, so I can be explicit (prepend the table names to the column names).

But, aliases that are the same and/or nested are what made me gave up. In the PostgreSQL manual there was a good example illustrating what I mean. But I couldn't find it. I hope you get the idea with my following example:

-- Preparation:
create view t1 as (select 42 a); 
create view t2 as (select 43 b);
-- This is the query to illustrate my point:
select t1.b, t2.b
from (select b from t2) t1
join t2 on 1=1;

I'd consider this bad style, but it is valid SQL. Having to handle this sort of mess is enough to keep me away from trying to write a function that works for every valid query.

I have a piece of code that lists tables but because of what I mentioned above I gave up trying to extract the column names as well. You may use my progress if you find it useful. Here. It just outputs tables, not columns.

I gave up but what I need to do still needs to get done. For PostgreSQL queries I am going to use pglast and use visitors to write something myself - if the library does not have already. For MS SQL... Not sure.