Open niroshaimos opened 10 months ago
I'd also find this useful, for another purpose. But I don't think the functionality should be in this library. This parser has enough on its plate as it is. It gets confused too often when I throw queries at it from the real world. After all, it supports multiple DBs.
Anyway, I need it so much I started writing it myself. And I've seen it is not possible to really do it without a DB connection. Consider these two queries:
select t1.x, t2.y from t1 join t2 on ...
select x, y from t1 join t2 on ...
These are equivalent, as long as there is no x
in t2
and no y
in t1
. Therefore, the function @niroshaimos wanted must have the query as one input parameter, and a database connection as the other parameter. Alternatively, the output could claim the table to be ?
in such cases and the programmer might have to do some post-processing.
Not a problem for what I am writing. All the input queries will be written by me, so I can be explicit (prepend the table names to the column names).
But, aliases that are the same and/or nested are what made me gave up. In the PostgreSQL manual there was a good example illustrating what I mean. But I couldn't find it. I hope you get the idea with my following example:
-- Preparation:
create view t1 as (select 42 a);
create view t2 as (select 43 b);
-- This is the query to illustrate my point:
select t1.b, t2.b
from (select b from t2) t1
join t2 on 1=1;
I'd consider this bad style, but it is valid SQL. Having to handle this sort of mess is enough to keep me away from trying to write a function that works for every valid query.
I have a piece of code that lists tables but because of what I mentioned above I gave up trying to extract the column names as well. You may use my progress if you find it useful. Here. It just outputs tables, not columns.
I gave up but what I need to do still needs to get done. For PostgreSQL queries I am going to use pglast and use visitors to write something myself - if the library does not have already. For MS SQL... Not sure.
A function that could get a query\json repr of a query and output all tables/columns used by a query would be useful for cases in which you want to check if the user of an attempted query has sufficient permissions before attempting execution of the query. Thanks!