hydromatic / morel

Standard ML interpreter, with relational extensions, implemented in Java
Apache License 2.0

`from` should not have a singleton record type unless it ends with a singleton record `yield` #159

Closed julianhyde closed 2 years ago

julianhyde commented 2 years ago

If a from expression has one variable named i of type int, should the type of the returned elements be int or {i: int}? (We call int a scalar, {i: int} a singleton record type, yield {i = i} a singleton record yield, and yield {i = j, j = k} a renaming yield.)

After this change, from will have a singleton record type only if it ends with a singleton record yield. If it does not end in a yield, the type depends on N, the number of pipeline variables: it will be a scalar if N = 1 and a record with N fields if N != 1.

Before this change, the type depended on whether there was a singleton record yield (such as yield {i} or yield {i = i} or yield {i = j} or yield {i = j + 2}) somewhere in the pipeline, as follows:

- from i in [1,2];                        # 1
val it = [1,2] : int list
- from i in [1,2] yield {i};              # 2
val it = [{i=1},{i=2}] : {i:int} list
- from i in [1,2] yield i;                # 3
val it = [1,2] : int list
- from i in [1,2] yield i + 3;            # 4
val it = [4,5] : int list
- from i in [1,2] where i > 1;            # 5
val it = [2] : int list
- from i in [1,2] yield {i} where i > 1;  # 6
val it = [{i=2}] : {i:int} list
- from i in [1,2] where i > 1 yield {i};  # 7
val it = [{i=2}] : {i:int} list
- from i in [1,2] yield {j = i} where j > 1; # 8
val it = [{j=2}] : {j:int} list
- from i in [1,2] order i desc;           # 9
val it = [2,1] : int list
- from i in [1,2] yield {j=i} order j desc; # 10
val it = [{j=2},{j=1}] : {j:int} list
- from i in [1,2] yield {j=i} order j desc yield {j}; # 11
val it = [{j=2},{j=1}] : {j:int} list
- from i in [1,2] yield {j=i} order j desc yield j; # 12
val it = [2,1] : int list

This behavior is unsatisfactory. It requires that a pipeline remember whether there is a singleton record yield somewhere in the pipeline.
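
To make that bookkeeping concrete, here is a minimal Python sketch of the old rule as it can be inferred from the twelve examples above. It is not Morel's implementation: the step encoding (`scan`, `where`, `order`, `yield_record`, `yield_scalar`) and the name `element_type_before` are hypothetical, and element types are reduced to either `scalar` or a list of field names.

```python
# Toy model of the OLD typing rule, inferred from the examples in this issue.
# The step encoding is hypothetical and is not Morel's internal representation.

def element_type_before(steps):
    """Return 'scalar' or a record description such as '{i}'."""
    names = []               # pipeline variables currently in scope
    after_singleton = False  # carried state: did the latest yield build a one-field record?
    for step in steps:
        kind = step[0]
        if kind == "scan":            # from <name> in ...
            names.append(step[1])
        elif kind == "yield_record":  # yield {a = ..., b = ...}
            names = list(step[1])
            after_singleton = len(names) == 1
        elif kind == "yield_scalar":  # yield <expression>
            names = ["_"]             # placeholder name for the single value
            after_singleton = False
        # 'where' and 'order' leave both the names and the flag unchanged
    if len(names) == 1 and not after_singleton:
        return "scalar"
    return "{" + ", ".join(names) + "}"


ex5 = [("scan", "i"), ("where",)]
ex6 = [("scan", "i"), ("yield_record", ["i"]), ("where",)]
ex12 = [("scan", "i"), ("yield_record", ["j"]), ("order",), ("yield_scalar",)]

print(element_type_before(ex5))   # scalar
print(element_type_before(ex6))   # {i}
print(element_type_before(ex12))  # scalar
```

The `after_singleton` flag is the part that has to survive across `where` and `order` steps, and it is the state the pipeline is forced to remember under the old rule.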

After this change, the only thing that counts is whether the last step is a yield. To return singleton records, the last step must be a singleton record yield: for example yield {i = i} (or the shorthand yield {i}), a renaming yield {j = i}, or an expression yield {k = i + j + 3}.

If the last step is not a yield, the result is scalar if there is one variable, a record otherwise. Here are the above expressions after the change:

- from i in [1,2];                        # 1
val it = [1,2] : int list
- from i in [1,2] yield {i};              # 2
val it = [{i=1},{i=2}] : {i:int} list
- from i in [1,2] yield i;                # 3
val it = [1,2] : int list
- from i in [1,2] yield i + 3;            # 4
val it = [4,5] : int list
- from i in [1,2] where i > 1;            # 5
val it = [2] : int list
- from i in [1,2] yield {i} where i > 1;  # 6 (changed)
val it = [2] : int list
- from i in [1,2] where i > 1 yield {i};  # 7
val it = [{i=2}] : {i:int} list
- from i in [1,2] yield {j = i} where j > 1; # 8 (changed)
val it = [2] : int list
- from i in [1,2] order i desc;           # 9
val it = [2,1] : int list
- from i in [1,2] yield {j=i} order j desc; # 10 (changed)
val it = [2,1] : int list
- from i in [1,2] yield {j=i} order j desc yield {j}; # 11
val it = [{j=2},{j=1}] : {j:int} list
- from i in [1,2] yield {j=i} order j desc yield j; # 12
val it = [2,1] : int list

The pipelines whose types have changed (6, 8, 10) are those that contain a yield but do not end in yield.
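
The new rule can be checked against all twelve pipelines with a small Python sketch. This is an illustration only, not Morel's type checker: the step encoding and the names `element_type_after`, `rec` and `EXAMPLES` are assumptions made for this example, and element types are reduced to `scalar` or a list of field names.

```python
# Toy model of the NEW typing rule: only the last step decides whether a
# one-variable pipeline returns records. The encoding is hypothetical.

def element_type_after(steps):
    """Return 'scalar' or a record description such as '{i}'."""
    names = []  # pipeline variables currently in scope
    for step in steps:
        kind = step[0]
        if kind == "scan":
            names.append(step[1])
        elif kind == "yield_record":
            names = list(step[1])
        elif kind == "yield_scalar":
            names = ["_"]  # placeholder name for the single value
    ends_in_record_yield = steps[-1][0] == "yield_record"
    if ends_in_record_yield or len(names) != 1:
        return "{" + ", ".join(names) + "}"
    return "scalar"


SCAN, WHERE, ORDER, SCALAR = ("scan", "i"), ("where",), ("order",), ("yield_scalar",)

def rec(*fields):
    return ("yield_record", list(fields))

# The twelve pipelines from this issue, keyed by their example number.
EXAMPLES = {
    1: [SCAN],
    2: [SCAN, rec("i")],
    3: [SCAN, SCALAR],
    4: [SCAN, SCALAR],
    5: [SCAN, WHERE],
    6: [SCAN, rec("i"), WHERE],
    7: [SCAN, WHERE, rec("i")],
    8: [SCAN, rec("j"), WHERE],
    9: [SCAN, ORDER],
    10: [SCAN, rec("j"), ORDER],
    11: [SCAN, rec("j"), ORDER, rec("j")],
    12: [SCAN, rec("j"), ORDER, SCALAR],
}

# Pipelines that contain a yield but do not end in one.
changed = [
    n for n, steps in EXAMPLES.items()
    if any(s[0].startswith("yield") for s in steps)
    and not steps[-1][0].startswith("yield")
]
print(changed)  # [6, 8, 10]
print([element_type_after(EXAMPLES[n]) for n in changed])  # ['scalar', 'scalar', 'scalar']
```

No flag is carried between steps here; the function only needs the current variable names and the kind of the final step.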

As example 8 shows, you can now use a singleton record yield (yield {j = i}) to rename the variable without forcing the result to be a record type. You would not use such a yield as the last step merely to rename, because there are no downstream steps to use the new variable name; as the last step, it makes the result a singleton record.

As part of this change, we introduce a new class FromBuilder to safely build Core.From pipelines. It performs micro-optimizations as it goes, such as removing where true steps. FromBuilder can also inline nested from expressions:

from i in (from j in [1, 2, 3]
    where j > 1)
where i < 3

becomes

from j in [1, 2, 3]
where j > 1
yield {i = j}
where i < 3

Note the use of yield {i = j} to handle the variable name change caused by inlining.
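
The two behaviors described above (dropping where true and inlining a nested from) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Java FromBuilder API: the class `ToyFromBuilder`, its methods, and the tuple encoding of steps are all hypothetical, and conditions are kept as plain strings.

```python
# Toy builder illustrating two ideas from this issue: skipping 'where true'
# and inlining a nested pipeline with a renaming yield.
# Hypothetical names throughout; this does not mirror Morel's Java classes.

class ToyFromBuilder:
    def __init__(self):
        self.steps = []  # ("scan", name, source), ("where", cond), ("yield_record", {new: old})
        self.names = []  # pipeline variables currently in scope

    def scan(self, name, source):
        # Inline only when this is the first step, the nested pipeline exposes
        # exactly one variable, and it does not end in a record yield (so its
        # elements are scalars under the new rule).
        if (isinstance(source, ToyFromBuilder)
                and not self.steps
                and len(source.names) == 1
                and source.steps[-1][0] != "yield_record"):
            self.steps = list(source.steps)
            inner = source.names[0]
            if inner != name:
                # Rename the inner variable so downstream steps see the outer name.
                self.steps.append(("yield_record", {name: inner}))
            self.names = [name]
        else:
            self.steps.append(("scan", name, source))
            self.names.append(name)
        return self

    def where(self, condition):
        if condition == "true":
            return self  # micro-optimization: a 'where true' step filters nothing
        self.steps.append(("where", condition))
        return self

    def build(self):
        return list(self.steps)


inner = ToyFromBuilder().scan("j", [1, 2, 3]).where("j > 1")
outer = ToyFromBuilder().scan("i", inner).where("i < 3")
for step in outer.build():
    print(step)
# ('scan', 'j', [1, 2, 3])
# ('where', 'j > 1')
# ('yield_record', {'i': 'j'})
# ('where', 'i < 3')
```

Because the inlined pipeline does not end in a yield, the renaming step in the middle does not turn the result into a record type, which is what makes this rewrite type-preserving under the new rule.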