malloydata / malloy

Malloy is an experimental language for describing data relationships and transformations.
http://www.malloydata.dev
MIT License

Explore the creation of a Substrait connector #946

Open jacques-n opened 1 year ago

jacques-n commented 1 year ago

It would be nice to add support for Substrait as an alternative output. If one wanted to explore doing this, would the project be open to this and what would be the best place to plug this in? (Substrait is algebraic so the dialect level doesn't feel like the optimal abstraction to implement this.)

lloydtabb commented 1 year ago

To do this well, I'm pretty sure you would want to expand the relational algebra.

Malloy has an internal algebra in that sources are basically relational symbol table graphs (StructDef) and queries are transformation operations on that graph.

The output of a query is also a symbol table graph (StructDef). Nested tables and joins are nested graphs.

https://github.com/malloydata/malloy/blob/main/packages/malloy/src/model/malloy_types.ts
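To make the "symbol table graph" idea concrete, here is a minimal sketch. It is not Malloy's actual StructDef: the names `SketchField`, `SketchStruct`, `lookup`, and `project` are hypothetical stand-ins invented for illustration, and the real definitions in malloy_types.ts are considerably richer. It only shows the shape of the idea: a source is a tree of named fields, a join or nested table is a nested tree, and a query is a function from one such graph to another.

```typescript
// Hypothetical, simplified stand-ins for illustration only.
// These are NOT the types defined in malloy_types.ts.
type SketchField =
  | { kind: "atomic"; name: string; type: "string" | "number" | "date" }
  | { kind: "struct"; name: string; relation: "join" | "nested"; fields: SketchField[] };

interface SketchStruct {
  name: string;
  fields: SketchField[];
}

// A source with two columns and one joined table, which is itself a nested graph.
const flights: SketchStruct = {
  name: "flights",
  fields: [
    { kind: "atomic", name: "distance", type: "number" },
    { kind: "atomic", name: "carrier", type: "string" },
    {
      kind: "struct",
      name: "carriers",
      relation: "join",
      fields: [{ kind: "atomic", name: "nickname", type: "string" }],
    },
  ],
};

// Resolve a dotted path such as "carriers.nickname" by walking the graph.
function lookup(struct: { fields: SketchField[] }, path: string): SketchField | undefined {
  const [head, ...rest] = path.split(".");
  const field = struct.fields.find((f) => f.name === head);
  if (field === undefined || rest.length === 0) return field;
  // Only struct fields can be descended into.
  return field.kind === "struct" ? lookup(field, rest.join(".")) : undefined;
}

// A "query" in this sketch is a function from one graph to another:
// keep the named top-level fields and return a new struct.
function project(source: SketchStruct, names: string[]): SketchStruct {
  return {
    name: `${source.name}_projected`,
    fields: source.fields.filter((f) => names.includes(f.name)),
  };
}
```

Because `project` returns the same kind of structure it consumes, its output can be used as the source of another query, which is the property described above.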

Theoretically, you could replace all the generateSQL functions with generateRelationalAlgebra functions, but I doubt your current algebra could represent all the stuff we do (at least represent it nicely). I'd start by looking at this file and some of the SQL we generate for nested queries, level-of-detail functions, and symmetric aggregation.

https://github.com/malloydata/malloy/blob/main/packages/malloy/src/model/malloy_query.ts
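As a rough illustration of what swapping SQL generation for algebra generation could mean, the sketch below runs two back ends over the same query description. Everything in it is an assumption made for illustration: `SketchQuery` and `Rel` are hypothetical types, the two functions only borrow their names from the comment above and do not match any signature in malloy_query.ts, and the `Rel` nodes are not Substrait's actual message types. It covers only a flat grouped aggregate; the harder cases mentioned above (nested queries, level-of-detail functions, symmetric aggregation) are exactly what it leaves out.

```typescript
// Hypothetical query description; not Malloy's real internal representation.
interface SketchQuery {
  table: string;
  groupBy: string[];
  aggregates: { name: string; fn: "sum" | "count"; column: string }[];
}

// Hypothetical algebra nodes: a table read and an aggregate over an input relation.
type Rel =
  | { op: "read"; table: string }
  | {
      op: "aggregate";
      input: Rel;
      keys: string[];
      measures: { name: string; fn: string; column: string }[];
    };

// Back end 1: render the query as a SQL string.
function generateSQL(q: SketchQuery): string {
  const measures = q.aggregates.map(
    (a) => `${a.fn.toUpperCase()}(${a.column}) AS ${a.name}`
  );
  const select = [...q.groupBy, ...measures].join(", ");
  const group = q.groupBy.length > 0 ? ` GROUP BY ${q.groupBy.join(", ")}` : "";
  return `SELECT ${select} FROM ${q.table}${group}`;
}

// Back end 2: render the same query as a tree of relational operators.
function generateRelationalAlgebra(q: SketchQuery): Rel {
  return {
    op: "aggregate",
    input: { op: "read", table: q.table },
    keys: q.groupBy,
    measures: q.aggregates,
  };
}

const byCarrier: SketchQuery = {
  table: "flights",
  groupBy: ["carrier"],
  aggregates: [{ name: "total_distance", fn: "sum", column: "distance" }],
};

console.log(generateSQL(byCarrier));
console.log(JSON.stringify(generateRelationalAlgebra(byCarrier), null, 2));
```

The design point is that both functions consume the same description, so the choice of output format stays out of the query model itself.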

We also probably don't keep enough type information around for you, but that is an easier fix.