malloydata / malloy

Malloy is an experimental language for describing data relationships and transformations.
http://www.malloydata.dev
MIT License
1.98k stars 76 forks source link

`x != y` produces wrong value (`true`) when `x` and `y` are both `null`-valued #1968

Open christopherswenson opened 4 days ago

christopherswenson commented 4 days ago
run: duckdb.sql("""
  SELECT 'null' as name, null as value
  UNION ALL 
  SELECT 'not null' as name,  1 as value
""") -> {
  group_by: value
  group_by: `null != null` is null != null
  group_by: `value != null` is value != null
  group_by: `value != value` is value != value
  group_by: `null != value` is null != value
}

Produces

SELECT 
   base."value" as "value",
   false as "null != null",
   base."value" IS NOT NULL as "value != null",
   NOT(COALESCE(base."value"=base."value", FALSE)) as "value != value",
   base."value"IS NOT NULL as "null != value"
FROM (
  SELECT 'null' as name, null as value
  UNION ALL 
  SELECT 'not null' as name,  1 as value
) as base
GROUP BY 1,2,3,4,5
ORDER BY 1 asc NULLS LAST
value null != null value != null value != value null != value
1 false true false true
false false true false
christopherswenson commented 4 days ago

I think if we generated NOT(COALESCE(base."x"=base."y" OR (base."x" IS NULL AND base."y" IS NULL), FALSE)) we would get the right answer...

christopherswenson commented 4 days ago
source: values is duckdb.sql("""
  SELECT null as value
  UNION ALL 
  SELECT 1 as value
  UNION ALL 
  SELECT 2 as value
""") extend {
  dimension: x is value
  join_cross: others is duckdb.sql("""
    SELECT null as value
    UNION ALL 
    SELECT 1 as value
    UNION ALL 
    SELECT 2 as value
  """) on true
  dimension: y is others.value

  # dashboard 
  view: summary is {
    nest: no_values is {
      group_by: `null = null` is null = null
      group_by: `null != null` is null != null
      group_by: `1 = null` is 1 = null
      group_by: `1 != null` is 1 != null
      group_by: `null = 1` is null = 1
      group_by: `null != 1` is null != 1
      group_by: `1 = 1` is 1 = 1
      group_by: `1 != 1` is 1 != 1
      group_by: `1 = 2` is 1 = 2
      group_by: `1 != 2` is 1 != 2
      group_by: `not(true)` is not(true)
      group_by: `not(false)` is not(false)
      group_by: `not(null)` is not(null)
    }
    nest: one_value is {
      group_by: x
      group_by: `x = null` is x = null
      group_by: `x != null` is x != null
      group_by: `null = x` is null = x
      group_by: `null != x` is null != x
      group_by: `x = x` is x = x
      group_by: `x != x` is x != x
      group_by: `not(x = x)` is not(x = x)
      group_by: `x = 2` is x = 2
      group_by: `x != 2` is x != 2
      group_by: `not(x = 2)` is not(x = 2)
    }
    nest: two_values is {
      group_by: x, y
      group_by: `x = y` is x = y
      group_by: `x != y` is x != y
      group_by: `not (x = y)` is not (x = y)
    }
  }
}

Big screenshot of current behavior:

Screenshot 2024-10-16 at 1 44 44 PM
lloydtabb commented 2 days ago

NOT(COALESCE(base."x"=base."y")

Yes, I think this is how we have to write it. We did, at one point.