malloydata / malloy

Malloy is an experimental language for describing data relationships and transformations.
http://www.malloydata.dev
MIT License
1.95k stars 75 forks source link

Incorrect count of join_many joined rows (1s when should be 0s). #1762

Open bporterfield opened 1 month ago

bporterfield commented 1 month ago

What happens?

source: user is db.table('public.user') extend {
  primary_key: id
  measure: user_count is count()
}

source: chat is db.table('public.chat_message') extend {
    primary_key: id
    join_one: user on sender_id = user.id
    measure: chat_count is count()
}

query: user_chat_facts is user -> { 
    extend: {
        join_many: chat on chat.sender_id = id
    }
    group_by: id
    aggregate: chat.chat_count
}

This produces a chat_count of 1 for every user that does not have a chat (should be 0?).

The SQL constructed is:

WITH __stage0 AS (
  SELECT 
     base."id" as "id",
     COUNT(1) as "chat_count"
  FROM "public"."user" as base
   LEFT JOIN "public"."chat_message" AS chat_0
    ON chat_0."sender_id"=base."id"
  GROUP BY 1
  ORDER BY 2 desc NULLS LAST
)
SELECT row_to_json(finalStage) as row FROM __stage0 AS finalStage

Lloyd says "That should return 0. Sounds like a bug. It should be counting the primary care distant key of the join thing. I think I try to build an optimization that doesn't work in this case"

To Reproduce

See above

OS:

any

Malloy Client:

local VSCode

Malloy Client Version:

lastest

Database Connection:

Postgres

bporterfield commented 1 month ago

Might be dup of #1584