Rules for language composition and extension

EliasC commented 3 years ago

In yesterday's meeting we discussed the issue of overriding cases in semantic functions. Consider the following program:

lang L1
  sem f =
  | 0 => "zero"
  | _ => "not zero"
end

lang L2 = L1
  sem f =
  | 0 => "naught"
end

In our current implementation, this fails because it's not possible to decide which of the 0 patterns that are most specific (i.e. we can't decide which of the branches to pick if L2.f is passed 0). Arguably, there are cases where it is useful to override patterns -- but it is also, to paraphrase @elegios, a potential foot canon. This issue is for discussing the semantics of language composition so that we can get closer to a design that is both sound and useful.

In @david-broman's "A Vision of Miking" paper, there is an abstract description of language composition which distinguishes between language union L1 U L2 and language extension L1 <| L2. Intuitively, the union between two languages combines the constructors of datatypes and cases of semantic functions so that the resulting language can use constructs from them both. Importantly, the two languages in a union are disjoint, meaning there is no overlap in constructors or cases. In a language extension L1 <| L2, L2 knows about the constructors and functions in L1 and can extend it, e.g. via product extension of constructors or by overriding semantic functions. A good place to start this discussion is the properties of these two operations:

U is associative, meaning that (L1 U L2) U L3 = L1 U (L2 U L3). Note that if any two of the languages are not disjoint the composition is malformed in both cases.
U is commutative, meaning that L1 U L2 = L2 U L1.
Because the semantics of <| is not as clear, it is harder to say if it is associative. It is definitely not commutative, as in L1 <| L2 L2 knows about L1 but not the other way around. I will leave the question of associativity open for now.
Intuitively, we should have a distributivity law stating that (L1 <| L3) U L2 = (L1 U L2) <| L3 whenever L3 does not mention any constructors or cases in L2 (on the lefthand side this is trivially true, but not on the righthand side).

In our implementation we have + which works like U above. The extension operator is implicit in that when we write

lang L3 = L1 + L2
  sem foo
  | Bar x => x
end

we can think of it as "inlining" the extension as lang L3 = (L1 + L2) <| { sem foo ... }. Going back to our initial example then, the definition of L2 should be fine as we are extending L1 with the new case for the function f. However, the following composition should still fail:

lang L1
  sem f =
  | 0 => "zero"
  | _ => "not zero"
end

lang L2
  sem f =
  | 0 => "naught"
end

lang L3 = L1 + L2 -- Error: Overlapping cases for semantic function f

The same reasoning lets us avoid the "diamond of death" from languages with multiple inheritance:

lang L1
  sem f =
  | _ => "a number"
end

lang L2 = L1
  sem f =
  | 0 => "zero"
end

lang L3 = L1
  sem f =
  | 0 => "naught"
end

lang L4 = L2 + L3 -- Error: Overlapping cases for semantic function f

The situation is not as clear if only one of L2 and L3 overrides a case:

lang L1
  sem f =
  | _ => "a number"
end

lang L2 = L1
  sem g =
  | x => x
end

lang L3 = L1
  sem f =
  | 0 => "naught"
end

lang L4 = L2 + L3 -- Error?

Here L2 "inherits" the implementation of f from L1 without changing it. The question is then if the union L2 + L3 should be disallowed because they have different semantics for f 0, or if it should be allowed and understood as the override from L2 taking precedence because both versions of f have the same provenance. My personal preference is for the former case, but I can see a point in the latter as well.

Finally, I want to mention that @elegios argued for having an explicit syntax for overriding cases to avoid accidentally overriding the case of an included language.

dlunde commented 3 years ago

Thanks Elias!

Comments

In a language extension L1 <| L2, L2 knows about the constructors and functions in L1 and can extend it, e.g. via product extension of constructors or by overriding semantic functions.

So, this issue is really about if we should restrict the above definition of language extension or not? Because, as it is written now, it sounds like exactly what I need. At the moment, I believe language extension is really (correct me if I'm wrong) the same as language union in the implementation. That is,

lang L3 = L1 + L2
  sem foo
  | Bar x => x
end

is really lang L3 = L1 + L2 + { sem foo ... }, which is why I'm not able to override things (see example under next heading).

The situation is not as clear if only one of L1 and L2 overrides a case:

I guess you meant "only one of L2 and L3"?

Motivating use case

The reason for why I brought up this topic is the following simple use case: I have implemented an ANF transformation over MExpr terms, which uses a semantic function isValue internally to decide which terms should be lifted out of the expression in which they originate. Such a transformation is also really useful in my MExpr -> C compiler, because I can then lift data constructor applications, record constructions, and sequence constructions (which are not allowed to appear anonymously within expressions in C). However, I do not want to lift out standard applications from expressions (nested function/operator application is of course allowed in C), as is the default in the ANF implementation. A super clean and simple way of doing this would be to override the case for TmApp _ in isValue in MExprANF

lang MExprANF

  sem isValue =
  | TmApp _ -> false

  ... other stuff ...

end

with

lang MExprCCompile = MExprANF

  sem isValue =
  | TmApp _ -> true

  ... other stuff ...

end

in MExprCCompile.

(The above is a bit simplified compared to the actual implementation which composes many language fragments to form MExprANF)

EliasC commented 3 years ago

@dlunde: I agree that this sounds like a fit for your use-case.

At the moment, I believe language extension is really (correct me if I'm wrong) the same as language union in the implementation.

Yes, that is exactly right!

I guess you meant "only one of L2 and L3"?

Oops! Edited...

miking-lang / miking

Rules for language composition and extension #218

Comments

Motivating use case