mechatroner / RBQL

🦜RBQL - Rainbow Query Language: SQL-like query engine for (not only) CSV file processing. Supports SQL queries with Python and JavaScript expressions.
https://rbql.org
MIT License
275 stars 13 forks source link

EXCEPT doesn't work at all with JOIN #38

Open WhiteG00se opened 2 years ago

WhiteG00se commented 2 years ago

Hello there,

it's me again. I found another bug. Before describing it, I just quickly wanna thank you for RBQL and Rainbow CSV. About 2 months ago I just searched for a VS Code extension to improve readably of .csv files. Would have never imagined anything with such amazing functionality; really improved my interactions with .csv files by multiple orders of magnitude.

On to the bug: I've create 2 test .csv files file1.csv

id1|data1
1|qwe
2|asd
3|zxc
4|qwe
5|asd

file2.csv

id2|data2
1|rty
2|fgh
3|vbn
4|rty
5|fgh

These two queries result in errors, so at least no incorrect results. SELECT * EXCEPT b.id2 JOIN file2.csv ON a.id1==b.id2

error: 'Unknown field in EXCEPT expression: "b.id2"'

SELECT * EXCEPT b.data2 JOIN file2.csv ON a.id1==b.id2

error: 'Unknown field in EXCEPT expression: "b.data2"'

What is more problematic tho is that the following 2 queries return incorrect results. Ofc isolated like this, it's pretty obvious, that the results are off; but in use it could potentially go by unnoticed in a bigger project.

SELECT * EXCEPT a.id1 JOIN file2.csv ON a.id1==b.id2 result (2 columns are missing):

data1 qwe asd zxc qwe asd

SELECT * EXCEPT a.data1 JOIN file2.csv ON a.id1==b.id2 result (2 columns are missing):

id1 1 2 3 4 5

Best regards Tobias

mechatroner commented 2 years ago

Hello again, and thank you so much for the feedback, and for the bug report!

EXCEPT is a niche feature, the only major SQL engine which supports it (that I am aware of) is Big Query: https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except. At some point I even removed EXCEPT, but then changed my mind and decided to keep it, so looks like I made a good decision if you (and maybe some other people too) find it useful. Saying that I think the best course of action, for now, is to explicitly disallow EXCEPT with JOIN statements until I find time to properly implement it.

WhiteG00se commented 2 years ago

I totally agree, if you disallow them being used together, the simple workaround is to run 2 consecutive queries, the 1st with SELECT * [...] JOIN [...] and the 2nd one with SELECT * EXCEPT [...] That's just mildly inconvenient, but no functionality is lost that way.

The important part is, as you also pointed out, to not produce incorrect results.

mechatroner commented 2 years ago

I fixed this in the master branch and will try to deploy the fix soon.