malloydata / malloy

Malloy is an experimental language for describing data relationships and transformations.
http://www.malloydata.dev
MIT License
1.99k stars 76 forks source link

Don't allow a nested select in a `group_by:` query. #1478

Open lloydtabb opened 1 year ago

lloydtabb commented 1 year ago

I'm not sure how to reliably generate code in this case. For now lets dis-allow it.

foo -> {
   group_by: something 
   nest: bar is {
      select: something_else   // error: nested `select:` can not be used in a outer group by.  Use `group_by: ` instead
   }
}
lloydtabb commented 1 year ago

@mtoy-googly-moogly for discussion.

mtoy-googly-moogly commented 1 year ago

so is the fix, all nest queries must be reduce ?

lloydtabb commented 1 year ago

Yes, for now. I think it is too confusing to know when it might and might not work. We'll end up creating duplicate records in many cases. I think forcing it to be a reduce (group_by:) query for now is the right thing and at some point when we have lots of brain cycles available see if we can figure out when to allow it.

carlineng commented 10 months ago

This came up again: https://malloy-community.slack.com/archives/C025JAK8G0N/p1703085766365359

In this case, the user is trying to replicate the following SQL query in Malloy that computes monthly revenue, with a comparison to the same month in the prior year:

SELECT
  year
  , month
  , rev
  , lag(rev) over(partition by year)
from data

The following Malloy compiles, but does not execute on DuckDB

run: duckdb.table('data.parquet') -> {
  group_by: mo
  nest: n is {
    select: yr, rev
    calculate: l is lag(rev)
    order_by: yr asc
  }
} -> {
  select: n.yr, mo, n.rev, n.l
}

Changing the inner select to group_by does work, but switching to a group_by may not always be what the user wants.

I think we need a way to replicate PARTITION BY functionality in SQL window functions.