gvwilson / sql-tutorial

The Querynomicon: An Introduction to SQL for Wary Data Scientists
https://gvwilson.github.io/sql-tutorial/
Other
419 stars 35 forks source link

Example 22 arbitrary choice in aggregation is very specific to sqlite #32

Closed cpcloud closed 5 months ago

cpcloud commented 5 months ago

The behavior of the query

select
    sex,
    body_mass_g                   
from penguins
group by sex;

is very specific to SQLite but the description of what's happening refers to generic SQL which I read as saying that this is a generic SQL behavior:

If we don’t specify how to aggregate a column, SQL can choose any arbitrary value from the group

SQLite is the exception here, not the norm.

Most SQL databases will fail on this query with an error message about the lack of aggregates present in the query.

Here are some examples:

DuckDB:

D create or replace table t as select 'a' k, 1 v union select 'a', 2 v;
D select k, v from t group by k;
Error: Binder Error: column "v" must appear in the GROUP BY clause or must be part of an aggregate function.
Either add it to the GROUP BY list, or use "ANY_VALUE(v)" if the exact value of "v" is not important.
LINE 1: select k, v from t group by k;
                  ^

Postgres:

ibis_testing=# create table t as select 'a' k, 1 v union select 'a', 2 v;
SELECT 2
ibis_testing=# table t;
 k | v
---+---
 a | 1
 a | 2
(2 rows)

ibis_testing=# select k, v from t group by k;
ERROR:  column "t.v" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select k, v from t group by k;
                  ^

Trino:

trino:default> create table t as select 'a' k, 1 v union select 'a', 2 v;
CREATE TABLE: 2 rows

Query 20240207_141759_00053_gprvk, FINISHED, 1 node
Splits: 164 total, 164 done (100.00%)
0.48 [0 rows, 0B] [0 rows/s, 0B/s]

trino:default> select k, v from t group by k;
Query 20240207_141807_00060_gprvk failed: line 1:11: 'v' must be an aggregate expression or appear in GROUP BY clause
select k, v from t group by k

ClickHouse:

falcon :) create view t as select 'a' k, 1 v union all select 'a', 2 v;

CREATE VIEW t AS
SELECT
    'a' AS k,
    1 AS v
UNION ALL
SELECT
    'a',
    2 AS v

Query id: b844463b-b6c8-4af9-8b8c-e236c33b98d9

Ok.

0 rows in set. Elapsed: 0.001 sec.

falcon :) select k, v from t group by k;

SELECT
    k,
    v
FROM t
GROUP BY k

Query id: c2b0fece-6305-426a-a45e-b33e809c9396

Elapsed: 0.001 sec.

Received exception:
Code: 215. DB::Exception: Column `v` is not under aggregate function and not in GROUP BY. Have columns: ['k']: While processing k, v. (NOT_AN_AGGREGATE)
gvwilson commented 5 months ago

thank you, I'll add a note and show what people should do instead.