Non-aggregate "group by" doesn't work as expected

quarnster commented 1 year ago

testdata:

hello,world
hello,class
hello,man
goodbye,class

select * group by a1 and select * group by a2 both return all entries

mechatroner commented 1 year ago

Thank you very much for reporting. Looking into this.

mechatroner commented 1 year ago

I fixed this (surprisingly it was just a single line change) and added RBQL unit tests so it won't break again. To me, non-aggregating GROUP BY seems to be quite an exotic mode (probably this is why I completely forgot to implement and test it). And actually different DB vendors have different implementations of it, the most common one, it seems, is to just disallow "bare" columns in SELECT altogether (https://stackoverflow.com/questions/20074562/group-by-without-aggregate-function), while SQLite for example allows them and just chooses an arbitrary value from each bare column/group (https://stackoverflow.com/questions/67536804/are-there-in-guarantees-about-the-non-aggregated-columns-in-a-group-by-query). RBQL will have a middle ground approach - it allows bare values in SELECT if they are identical for each grouping key, but if there are 2 or more different values in any group the query would fail with an error.

quarnster commented 1 year ago

FWIW MySQL also does what SQLite does, but TIL that it's not standard. I'm 100% ok with the suggested fix.

quarnster commented 1 year ago

Just had an idea that maybe a new or existing aggregate function could accomplish the mysql/sqlite functionality. Like select ANY(a1),a2 group by a2 in case anyone ends up in this issue and needs it.

Edit: Already supported via for example select ARRAY_AGG(a2, (v) => v[0]), a1 group by a1. Cool!

mechatroner commented 1 year ago

ANY() sounds like a good idea, I will probably do it at some point.

mechatroner commented 1 year ago

Update: ANY_VALUE is now available in version 3.8.0.

mechatroner / vscode_rainbow_csv

Non-aggregate "group by" doesn't work as expected #144