Closed gmantele closed 3 years ago
The grammar fix proposed here should allow the following ADQL queries:
Simple SELECT
:
SELECT hip FROM hipparcos WHERE hip <= 10
UNION
SELECT hip FROM hipparcos WHERE hip > 120406
ORDER BY 1 DESC
SELECT TOP ...
:
SELECT hip FROM hipparcos WHERE hip <= 10
UNION
SELECT TOP 10 hip FROM hipparcos WHERE hip > 10
ORDER BY 1 DESC
SELECT TOP ... ORDER BY ...
:
SELECT hip FROM hipparcos WHERE hip <= 10
UNION
(SELECT TOP 10 hip FROM hipparcos WHERE hip > 10 ORDER BY 1)
ORDER BY 1 DESC
INTERSECT
:
SELECT hip FROM hipparcos WHERE hip <= 10
INTERSECT
SELECT hip FROM hipparcos WHERE hip <= 10 AND mod(hip, 2) = 0
ORDER BY 1 DESC
EXCEPT
:
SELECT hip FROM hipparcos WHERE hip <= 10
EXCEPT
SELECT hip FROM hipparcos WHERE hip <= 10 AND mod(hip, 2) = 0
ORDER BY 1 DESC
Precedence UNION
vs INTERSECT
:
SELECT hip FROM hipparcos WHERE hip <= 10
UNION
SELECT hip FROM hipparcos WHERE hip > 120406
INTERSECT
SELECT hip FROM hipparcos WHERE (hip <= 10 OR hip > 120406) AND mod(hip, 2) = 0
ORDER BY 1 DESC
Precedence with parentheses:
(
SELECT hip FROM hipparcos WHERE hip <= 10
UNION
SELECT hip FROM hipparcos WHERE hip > 120406
)
INTERSECT
SELECT hip FROM hipparcos WHERE (hip <= 10 OR hip > 120406) AND mod(hip, 2) = 0
ORDER BY 1 DESC
With CTE:
WITH tt AS (SELECT hip FROM hipparcos WHERE hip > 120406)
SELECT hip FROM hipparcos WHERE hip <= 10
UNION
SELECT hip FROM tt
ORDER BY 1 DESC
Embedded CTE:
SELECT hip FROM hipparcos WHERE hip <= 10
UNION
(
WITH tt AS (SELECT hip FROM hipparcos WHERE hip > 120406)
SELECT hip FROM tt
)
ORDER BY 1 DESC
I tested their SQL translation with PostgreSQL, MySQL and MS-SQLServer. Although some syntactic changes were required depending on the DBMS, all of them pass.
=> Tell me if I forgot some important cases.
Some notes about the MySQL and MS-SQLServer:
INTERSECT
and EXCEPT
do not exist in MySQL, but it is possible to get the
same result with joins.
Embedded CTE with MS-SQLServer does not work. The only hack I found is to put all CTEs at the root level. Let me know if this hack will not be enough in some cases.
On Fri, Oct 23, 2020 at 10:34:23AM -0700, Grégory Mantelet wrote:
The grammar fix proposed here should allow the following ADQL queries:
I think I agree with all of the queries except this one:
- Embedded CTE:
SELECT hip FROM hipparcos WHERE hip <= 10 UNION ( WITH tt AS (SELECT hip FROM hipparcos WHERE hip > 120406) SELECT hip FROM tt ) ORDER BY 1 DESC
Is there an overriding reason to allow CTEs in operands of set operators? I cannot see how that would raise the expressiveness of the language (and I think your parallel comment about pulling CTEs to the top leven in SQLServer says about as much).
I see a slight advantage in that you can stick arbitrary queries into such arguments, which might be nice when composing queries out of pre-fabricated fragments, but then the cut-and-paste effort of taking the inner with_queries and sticking them to the top level doesn't seem to be prohibitive, and you get queries that are better understandable because there are fewer places you might have to check for table expressions.
The reason I'd rather not have these "embedded" CTEs is that table reolution (i.e., figuring out the table definition for a given table reference) is hard enough already. When CTEs are only allowed at the top level, it doesn't get much harder (I was actually positively surprised how little code was necessary for that). Allowing CTEs in extra positions will make that a good deal more complicated.
So: I'd say embedded CTEs are little gain for noticable pain, and hence I'd say we shouldn't have them (unless someone finds a stronger rationale than the cut-and-pastee case I'm making above).
In grammar terms, I think that would mean introducing a:
<select_expression> ::=
<query_set_expression>
[ <order_by_clause> ]
[ <offset_clause> ]
With that, we'd have:
<query_specification> ::=
WITH <with_query> {, <with_query>} ...
<select_expression>
<query_set_primary> ::=
<select_query>
| <left_paren> <select_expression> <right_parent>
<query_expression> ::=
<select_expression>
| <joined_table>
Incidentally, even if there's actually a case for CTEs in set operands, I'm pretty sure we wouldn't want the CTEs in a normal subquery (which is what query_expression is used for) -- or do we?
I agree, I do not see the point of having CTEs inside a set's operand. I added this query example because there is no reason why it should not be possible considering that CTEs are currently allowed in "normal" subqueries : this way, it is consistent. Besides, it works like that in most DBMS.
However, I admit I have not yet started to implement the set operators in my ADQL parser (but I will this week...and before merging this PullRequest for sure), so I am now very sensible to the point @msdemlei raises about the table resolution : table resolution will be indeed more complicated than it is now (and it is already a bit challenging).
As @msdemlei , I agree that CTEs should be allowed only at the top level query and not any more in subqueries or set's operands. This will help parser development but also it will make ADQL queries more readable (all CTEs will always be declared at only one place: right before the main query).
Me too, I can not find any example which will make "embedded" CTEs different than having them declared at the top level. Does anybody else have any example?
About the "embedded" CTEs, I do not have to change anything in my parser to make it work, even for the table resolution. So, either solution (allowing them, or not) is fine with me, on an implementation point of view.
Now, because it makes appararently no difference whether the CTEs are embedded or not, and because it would make ADQL queries more clear, I propose that in ADQL(-2.1) CTE are only accepted at the root level.
So, to sum up with examples:
SELECT ra, dec FROM myTable
UNION
(WITH cteTable AS (SELECT ra, dec FROM anotherTable)
SELECT * FROM cteTable)
won't be accepted in ADQL, but the following will be:
WITH cteTable AS (SELECT ra, dec FROM anotherTable)
SELECT ra, dec FROM myTable
UNION
SELECT * FROM cteTable
This way, if later we discover there are use-cases where embedded CTEs are the only solution, we can easily extend the ADQL grammar without breaking anything.
I change the grammar right now. If any opposition, speak now (or before this PullRequest is accepted/merged).
I moved the section about the WITH
clause (or CTEs), just before the set operators so that speaking of CTEs in the set operators section makes sense.
I also added the following sentence in the WITH section:
For clarity reason, common table expressions can be defined only in the main query. They are not allowed in sub-queries.
It's time for whoever is ready to review this Pull Request. According to me, this Pull Request and its corresponding issue can now be closed.
A proposal to fix the grammar of the set operators and their description.
Resolves #20