Closed derekmahar closed 4 years ago
Isn’t possible to replace to with, for example, WITH clause and JOIN clause?
I think for really complicated stuff, just import the CSVs into the SQL database of your choice and have your way with them.
I see CSVQ as a tool to do quick and dirty queries on smallish data sets. And that it does excellently.
el
— Sent from Dr Lisse’s iPad Mini 5 On 28. Dec 2019, 09:12 +0200, Yuki notifications@github.com, wrote:
Isn’t possible to replace to with, for example, WITH clause and JOIN clause? — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.
Yes, I made csvq for simple confirmations and calculations of data. Csv has no types, no restrictions, no relations, no indexes, so it is not suitable for large or complicated calculations. And for those reasons, I am not aiming for a complete SQL implementation.
I’ll fix as much as possible obvious bugs or features that are useful in most cases, but in most cases I think that it will work with existing features. I’m not sure if LATERAL is so needed and there are no other way.
Isn’t possible to replace to with, for example, WITH clause and JOIN clause?
Possibly. I'd have to do more research. The following solutions are likely inefficient:
More than CTE and Cross Join suggests that LATERAL
may combine and simplify the functionality of common table expressions, cross joins, and the WINDOW
keyword.
This answer to How to get previous and current row value using recursive CTE? compares solutions using Common Table Expressions, window functions, and OUTER APPLY
which is syntax in Microsoft SQL Server that is equivalent or similar to LATERAL
.
Yes, I made csvq for simple confirmations and calculations of data. Csv has no types, no restrictions, no relations, no indexes, so it is not suitable for large or complicated calculations. And for those reasons, I am not aiming for a complete SQL implementation.
I’ll fix as much as possible obvious bugs or features that are useful in most cases, but in most cases I think that it will work with existing features.
I highly appreciate that csvq
already supports much of the SQL standard including advanced syntax like Common Table Expressions and window functions. I raised this issue for the following reasons:
LATERAL
may simplify a query that I wish to execute using csvq
.LATERAL
is absent from csvq
and I think it's useful to know in what ways csvq
deviates from the SQL standard.LATERAL
is part of the SQL standard and I assumed that a goal of csvq
was to conform to the SQL standard as closely as possible and where it makes sense in the context of CSV files.I don't expect csvq
to support all of the SQL standard, but the closer the better. Over time, you may decide to fill in any gaps that you think you or your users may find useful.
I’m not sure if LATERAL is so needed and there are no other way.
I'm not yet sure, either, but LATERAL
does seem to simplify certain queries that use derived tables. I doubt that the SQL standard would have introduced LATERAL
if it weren't useful.
I think for really complicated stuff, just import the CSVs into the SQL database of your choice and have your way with them.
I may do this eventually, but for now I'm exploring what computations I may perform using csvq
alone. In effect, I'm testing the capabilities and limitations of csvq
, and will report any limitations or flaws that I encounter.
I see CSVQ as a tool to do quick and dirty queries on smallish data sets. And that it does excellently.
I agree.
Okey, maybe I will implement it, but now the priority is low because it can be done in other ways.
LATERAL JOIN is now included and released in version 1.13.0.
I tested a LATERAL JOIN
using a slightly modified version of the example in PostgreSQL’s LATERAL JOIN:
$ cat orders.csv | csvq "SELECT * FROM STDIN"
+----+---------+----------------------------+
| id | user_id | created_at |
+----+---------+----------------------------+
| 1 | 1 | 2017-06-20 04:35:03.582895 |
| 2 | 2 | 2017-06-20 04:35:07.564973 |
| 3 | 3 | 2017-06-20 04:35:10.986712 |
| 4 | 1 | 2017-06-20 04:58:10.137503 |
| 5 | 3 | 2017-06-20 04:58:17.905277 |
| 6 | 3 | 2017-06-20 04:58:25.289122 |
+----+---------+----------------------------+
-- lateral_query1.sql
SELECT user_id, first_order_time, next_order_time, id FROM
(SELECT user_id, min(created_at) AS first_order_time FROM orders GROUP BY user_id) o1
LEFT JOIN LATERAL
(SELECT id, created_at AS next_order_time
FROM orders o2
WHERE o2.user_id = o1.user_id AND created_at > o1.first_order_time
ORDER BY created_at ASC LIMIT 1)
o2 ON true;
-- lateral_query2.sql
SELECT user_id, first_order_time, next_order_time, id FROM
(SELECT user_id, min(created_at) AS first_order_time FROM orders GROUP BY user_id) o1
INNER JOIN LATERAL
(SELECT id, created_at AS next_order_time
FROM orders o2
WHERE o2.user_id = o1.user_id AND created_at > o1.first_order_time
ORDER BY created_at ASC LIMIT 1)
o2 ON true;
$ cat orders.csv | csvq --source lateral_query1.sql
+---------+----------------------------+-----------------------------+------+
| user_id | first_order_time | next_order_time | id |
+---------+----------------------------+-----------------------------+------+
| 1 | 2017-06-20 04:35:03.582895 | 2017-06-20 04:58:10.137503 | 4 |
| 2 | 2017-06-20 04:35:07.564973 | NULL | NULL |
| 3 | 2017-06-20 04:35:10.986712 | 2017-06-20 04:58:17.905277 | 5 |
+---------+----------------------------+-----------------------------+------+
$ cat orders.csv | csvq --source lateral_query2.sql
+---------+----------------------------+-----------------------------+----+
| user_id | first_order_time | next_order_time | id |
+---------+----------------------------+-----------------------------+----+
| 1 | 2017-06-20 04:35:03.582895 | 2017-06-20 04:58:10.137503 | 4 |
| 3 | 2017-06-20 04:35:10.986712 | 2017-06-20 04:58:17.905277 | 5 |
+---------+----------------------------+-----------------------------+----+
The resuls of each these queries matches those in PostgreSQL’s LATERAL JOIN.
Please implement LATERAL subqueries. According to Add LATERAL Joins or CROSS APPLY to Your SQL Tool Chain, SQL:1999 introduced "lateral derived tables". Many popular relational database systems support lateral subqueries:
APPLY
implements lateral subqueries.In PostgreSQL’s LATERAL JOIN, Krzysztof Kempiński explains how LATERAL subqueries work in PostgreSQL 9.3.