malloydata / malloy

Malloy is an experimental language for describing data relationships and transformations.
http://www.malloydata.dev
MIT License

We're always using COUNT(DISTINCT key) even when we could use COUNT(1) #1598

Closed: lloydtabb closed this issue 8 months ago

lloydtabb commented 8 months ago
source: a is duckdb.table('data/state_facts.parquet') extend {
  measure: c is count()
}

run: a -> {aggregate: c}

This generates a distinct key, and it shouldn't:

SELECT 
   COUNT(DISTINCT a."__distinct_key") as "c"
FROM (SELECT GEN_RANDOM_UUID() as __distinct_key, x.*  FROM 'data/state_facts.parquet' as x) as a
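
To illustrate why the distinct key is unnecessary for a query with no joins, here is a minimal TypeScript sketch. It is not Malloy code: the `Row` type, the sample rows, and the helpers `countAll` and `countDistinctKey` are all hypothetical, and `key` merely stands in for `__distinct_key`. With one row per key, the two counts agree; they only diverge once a join repeats base rows.

```typescript
// Hypothetical in-memory rows; `key` plays the role of __distinct_key.
interface Row {
  key: string;
  state: string;
}

const base: Row[] = [
  { key: "k1", state: "CA" },
  { key: "k2", state: "NY" },
  { key: "k3", state: "TX" },
];

// Equivalent of COUNT(1): every row is counted.
function countAll(rows: Row[]): number {
  return rows.length;
}

// Equivalent of COUNT(DISTINCT key): each key is counted once.
function countDistinctKey(rows: Row[]): number {
  return new Set(rows.map((r) => r.key)).size;
}

// Simulated join fan-out: each base row appears twice,
// as it would if it matched two rows in a joined table.
const fannedOut: Row[] = base.flatMap((r) => [r, r]);
```

On `base` the two functions return the same number, so the cheaper count is enough. On `fannedOut` only the distinct count still reports the number of base rows, which is the case the distinct key exists for.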

christopherswenson commented 8 months ago

I think this line

https://github.com/malloydata/malloy/blob/0374b4217802c0599496d71dc3a75c6b6ded57e4/packages/malloy/src/model/malloy_query.ts#L1841

should be join.parent !== undefined instead of join.parent !== null, and that fixes it.
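
The difference between the two checks can be sketched as follows. This is an assumption-laden illustration, not the code at the linked line: the `JoinNode` interface and both helper functions are hypothetical, and it assumes the root join simply leaves `parent` unset. Under strict inequality, `undefined !== null` is `true`, so a null check treats the root as if it had a parent.

```typescript
// Hypothetical shape; the real Malloy type is not shown in this thread.
interface JoinNode {
  name: string;
  parent?: JoinNode | null; // assumed to be left unset (undefined) on the root
}

const root: JoinNode = { name: "a" };
const child: JoinNode = { name: "b", parent: root };

// Check as described in the comment above: compares against null.
// For the root, parent is undefined, and undefined !== null is true.
const hasParentNullCheck = (j: JoinNode): boolean => j.parent !== null;

// Proposed check: compares against undefined, so the root has no parent.
const hasParentUndefinedCheck = (j: JoinNode): boolean =>
  j.parent !== undefined;
```

With these definitions, `hasParentNullCheck(root)` is `true` while `hasParentUndefinedCheck(root)` is `false`; both return `true` for `child`. That matches the reported symptom of a root-only query being handled as though it were joined.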