hydromatic / morel

Standard ML interpreter, with relational extensions, implemented in Java
Apache License 2.0
291 stars 15 forks source link

Feedback from Strange Loop talk #74

Open julianhyde opened 2 years ago

julianhyde commented 2 years ago

I received the following feedback after my talk at Strange Loop. The feedback is quoted, my answers inline.

  1. Code that has nothing to do with querying can often be expressed as a query. For instance I've written groupby on python dictionaries many times. Are there any opportunities to further optimize code like this if it's declaratively expressed in morel? The from statement in morel communicates something more than an ordinary for loop. It tells the compiler that maybe order is not important, and this piece of information is hard to get with static analysis passes.

Yes, for years I have been doing things like using a 'for' loop to iterate over an array, then binding a parameter and executing a query. Or doing joins, intersections, minus, on collections.

Once these operations are expressed as relational algebra, we can ask ourselves what is the best implementation if the relations are actually collections such as lists, maps (dictionaries), sets.

I have skated over ordering in Morel's language design so far. Lists are ordered and relations are not. If the semantics of a from expression imply a particular ordering (and I haven't yet specified whether they do or don't) then it will not be valid to evaluate that expression using an unordered SQL query. Morel needs some means of specifying whether order is important.

It's very likely that the from expression will work over any monad (list, set, relation). If there is a single input, or if all the inputs are the same kind of monad, then the output would be the same kind.

  1. I also wonder what other optimizations (and ergonomic improvements) are possible in a language that sees both the query logic and the code around it. Some examples:
    • push some predicates into the query from the code that was meant to run client-side
    • pull computation from the query into client code to reduce load on the database (consider a query that asks for a cartesian product, which should instead be written as 2 separate queries and expanded client-side)
    • allow clean, modular expression of subqueries while actually merging them as an optimization if necessary

Ideally Morel would recognize these patterns and convert to relational algebra. But I also hope that programmers become accustomed to writing in the more concise “relational” manner, even if they intend to run the program locally. Do you have any patterns in mind? It would be useful to start a list.

  1. Is it feasible to register multiple data sources, join across them in morel, and let the compiler/planner figure out what kind of client-side join to perform?

Yes, Calcite is good at hybrid planning. It will consider local joins if the data sources are distinct.

DavidPratten commented 2 years ago

Hi Julian,

Thanks for the Strange Loop talk.

I've been looking for languages which embrace the relational model and build on SQL. So far in my journey I've discovered:

Even so, I haven't yet found anything better than "WITH RECURSIVE" CTEs and "CREATE VIEW AS" to query data with:

What is your intuition / approach on how Morel will/should handle tree-based and network-based query use cases?

David

julianhyde commented 2 years ago

I agree that these kinds of queries are important. (I tend to call them 'iterative queries' but others call them 'recursive queries', and 'incremental queries' are closely related.) A functional programming language that treats relations as data values would seem to be a natural approach to such problems.

In a development branch (commit f99d4a999fd51f71b7b71da2afd601543016635f) I have added the Relational.iterate function, which runs a computation (query) until it reaches a fixed point. Like this:

Relational.iterate
  (from e in emp where e.mgr = 0)
  fn (oldList, newList) =>
      (from d in newList,
          e in emp
      where e.mgr = d.empno
      yield e);

Such queries can run via Calcite, being translated to Calcite's RepeatUnion relational operator.

I am also experimenting with writing such queries as recursive functions (you can see some examples in that commit). The tricky part is for the language to know that when I call a function recursively and use its return in a union, that has a chance of reaching a fixed point, and therefore the recursion will not go on forever. That would need us to give union some special properties, and I'm not sure what those properties are, or whether they belong to union alone.

DavidPratten commented 2 years ago

Thanks, I am curious about where this will land.

I’m looking for a language that bring expressiveness and simplicity to handling relational data tables that express transitive relations that form trees and directed acyclic graphs.

Why aren’t transitive relations first class citizens in the relational model?

To your question about termination. I’m curious if there is a generalisation of relational algebra and well behaved operators that would be available if the language enforced acyclic or tree constraints over the input data. One simplistic embodiment of this idea would be a fkey-acyclic constraint, or fkey-tree constraint in self-referential transitive relations expressed as tables. Alternatively, the language could require calling out of the self-referential fkey/primary key and enforce the tree and/or acyclic structure constraint at run time.

Do implicit union types play a role here. An early slide in the presentation contains an example of tree algorithms in native ML. I was waiting for this to be folded into Morel. What would it look like if ML, provided the foundations for expressing recursive / iterative queries rather than Calcite?

David