mithrandie / csvq

SQL-like query language for csv
https://mithrandie.github.io/csvq
MIT License
1.5k stars 65 forks source link

Support LATERAL subqueries. #24

Closed derekmahar closed 4 years ago

derekmahar commented 4 years ago

Please implement LATERAL subqueries. According to Add LATERAL Joins or CROSS APPLY to Your SQL Tool Chain, SQL:1999 introduced "lateral derived tables". Many popular relational database systems support lateral subqueries:

In PostgreSQL’s LATERAL JOIN, Krzysztof Kempiński explains how LATERAL subqueries work in PostgreSQL 9.3.

mithrandie commented 4 years ago

Isn’t possible to replace to with, for example, WITH clause and JOIN clause?

ondohotola commented 4 years ago

I think for really complicated stuff, just import the CSVs into the SQL database of your choice and have your way with them.

I see CSVQ as a tool to do quick and dirty queries on smallish data sets. And that it does excellently.

el

— Sent from Dr Lisse’s iPad Mini 5 On 28. Dec 2019, 09:12 +0200, Yuki notifications@github.com, wrote:

Isn’t possible to replace to with, for example, WITH clause and JOIN clause? — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

mithrandie commented 4 years ago

Yes, I made csvq for simple confirmations and calculations of data. Csv has no types, no restrictions, no relations, no indexes, so it is not suitable for large or complicated calculations. And for those reasons, I am not aiming for a complete SQL implementation.

I’ll fix as much as possible obvious bugs or features that are useful in most cases, but in most cases I think that it will work with existing features. I’m not sure if LATERAL is so needed and there are no other way.

derekmahar commented 4 years ago

Isn’t possible to replace to with, for example, WITH clause and JOIN clause?

Possibly. I'd have to do more research. The following solutions are likely inefficient:

More than CTE and Cross Join suggests that LATERAL may combine and simplify the functionality of common table expressions, cross joins, and the WINDOW keyword.

This answer to How to get previous and current row value using recursive CTE? compares solutions using Common Table Expressions, window functions, and OUTER APPLY which is syntax in Microsoft SQL Server that is equivalent or similar to LATERAL.

derekmahar commented 4 years ago

Yes, I made csvq for simple confirmations and calculations of data. Csv has no types, no restrictions, no relations, no indexes, so it is not suitable for large or complicated calculations. And for those reasons, I am not aiming for a complete SQL implementation.

I’ll fix as much as possible obvious bugs or features that are useful in most cases, but in most cases I think that it will work with existing features.

I highly appreciate that csvq already supports much of the SQL standard including advanced syntax like Common Table Expressions and window functions. I raised this issue for the following reasons:

I don't expect csvq to support all of the SQL standard, but the closer the better. Over time, you may decide to fill in any gaps that you think you or your users may find useful.

I’m not sure if LATERAL is so needed and there are no other way.

I'm not yet sure, either, but LATERAL does seem to simplify certain queries that use derived tables. I doubt that the SQL standard would have introduced LATERAL if it weren't useful.

derekmahar commented 4 years ago

I think for really complicated stuff, just import the CSVs into the SQL database of your choice and have your way with them.

I may do this eventually, but for now I'm exploring what computations I may perform using csvq alone. In effect, I'm testing the capabilities and limitations of csvq, and will report any limitations or flaws that I encounter.

I see CSVQ as a tool to do quick and dirty queries on smallish data sets. And that it does excellently.

I agree.

mithrandie commented 4 years ago

Okey, maybe I will implement it, but now the priority is low because it can be done in other ways.

mithrandie commented 4 years ago

LATERAL JOIN is now included and released in version 1.13.0.

derekmahar commented 4 years ago

I tested a LATERAL JOIN using a slightly modified version of the example in PostgreSQL’s LATERAL JOIN:

$ cat orders.csv | csvq "SELECT * FROM STDIN"
+----+---------+----------------------------+
| id | user_id |         created_at         |
+----+---------+----------------------------+
| 1  | 1       | 2017-06-20 04:35:03.582895 |
| 2  | 2       | 2017-06-20 04:35:07.564973 |
| 3  | 3       | 2017-06-20 04:35:10.986712 |
| 4  | 1       | 2017-06-20 04:58:10.137503 |
| 5  | 3       | 2017-06-20 04:58:17.905277 |
| 6  | 3       | 2017-06-20 04:58:25.289122 |
+----+---------+----------------------------+
-- lateral_query1.sql
SELECT user_id, first_order_time, next_order_time, id FROM
  (SELECT user_id, min(created_at) AS first_order_time FROM orders GROUP BY user_id) o1
  LEFT JOIN LATERAL
  (SELECT id, created_at AS next_order_time
   FROM orders o2
   WHERE o2.user_id = o1.user_id AND created_at > o1.first_order_time
   ORDER BY created_at ASC LIMIT 1)
   o2 ON true;
-- lateral_query2.sql
SELECT user_id, first_order_time, next_order_time, id FROM
  (SELECT user_id, min(created_at) AS first_order_time FROM orders GROUP BY user_id) o1
  INNER JOIN LATERAL
  (SELECT id, created_at AS next_order_time
   FROM orders o2
   WHERE o2.user_id = o1.user_id AND created_at > o1.first_order_time
   ORDER BY created_at ASC LIMIT 1)
   o2 ON true;
$ cat orders.csv | csvq --source lateral_query1.sql
+---------+----------------------------+-----------------------------+------+
| user_id |      first_order_time      |       next_order_time       |  id  |
+---------+----------------------------+-----------------------------+------+
| 1       | 2017-06-20 04:35:03.582895 | 2017-06-20 04:58:10.137503  | 4    |
| 2       | 2017-06-20 04:35:07.564973 |            NULL             | NULL |
| 3       | 2017-06-20 04:35:10.986712 | 2017-06-20 04:58:17.905277  | 5    |
+---------+----------------------------+-----------------------------+------+
$ cat orders.csv | csvq --source lateral_query2.sql
+---------+----------------------------+-----------------------------+----+
| user_id |      first_order_time      |       next_order_time       | id |
+---------+----------------------------+-----------------------------+----+
| 1       | 2017-06-20 04:35:03.582895 | 2017-06-20 04:58:10.137503  | 4  |
| 3       | 2017-06-20 04:35:10.986712 | 2017-06-20 04:58:17.905277  | 5  |
+---------+----------------------------+-----------------------------+----+

The resuls of each these queries matches those in PostgreSQL’s LATERAL JOIN.