melt-umn / silver

An attribute grammar-based programming language for composable language extensions
http://melt.cs.umn.edu/silver/
GNU Lesser General Public License v3.0
57 stars 7 forks source link

First-class instance types and context attributes #712

Open krame505 opened 1 year ago

krame505 commented 1 year ago

Writing up my thoughts on what I was proposing a bit more concretely. I realize this issue has become the size of a blog post, but I hope it is easy to follow anyway.

Motivation

Consider a library of polymorphic concrete syntax utility nonterminals, as proposed in #71. To use these in e.g. ableC, one would like to use the ast and unparse attributes. One approach is to use occurs-on constraints on the production:

nonterminal ZeroOrMore<item a> with unparse, ast<[a]>;
concrete production consZeroOrMore
attribute unparse {} occurs on item,
attribute ast<[a]> {} occurs on item =>
top::ZeroOrMore<item a> ::= h::item t::ZeroOrMore<item a>
{
  top.unparse = h.unparse ++ t.unparse;
  top.ast = h.ast :: t.ast;
}
concrete production nilZeroOrMore
top::ZeroOrMore<item a> ::=
{
  top.unparse = "";
  top.ast = [];
}

nonterminal Stmt with unparse, ast<Stmt>;
concrete production compoundStmt_c
top::Stmt ::= '{' l::ZeroOrMore<Stmt_c Stmt> '}'
{
  top.unparse = s"{${l.unparse}}";
  top.ast = compoundStmt(foldStmt(l.ast));
}

However there are a few issues with this:

Another motivation: equality attributes on polymorphic nonterminals require an Eq constraint on the polymorphic child. This currently means the production needs to have an Eq constraint. But we might not need the Eq constraint for other attributes on the nonterminal, or sometimes want to add an equality attribute in an extension.

First-class instance types

The solution I'm proposing for this involves first-class instances as types. Essentially, everything that we currently write as a type constraint would become a type expression that can be written anywhere that we use types. The value of an instance type would be the instance dictionary or whatever value is currently being passed automatically when the instance is resolved. For example Eq Integer would be a type whose values are the instance object containing the eq implementation to use for integers, and attribute unparse {} occurs on Stmt would be a type, whose value is the index of the attribute occurrence for unparse on Stmt.

Note that there is an unfortunate parse ambiguity with regular type class constraints, if they are to become type expressions. Consider parsing Maybe<Eq String>: we would like to parse this as Maybe<(Eq String)>, not Maybe<(Eq) (String)>. I think the fix for this is to change the type class syntax to use angle brackets (e.g. Eq<String> instead of Eq String), which is another annoying refactor but shouldn't be nearly as bad as removing autocopy.

There would be a new expression for a hole of some instance type, whose value should be computed by instance resolution. I'm not too sure about the syntax for this (_ is taken by partial application, so maybe hole or just ?). At inference time ? would have type a, which would must be resolved to some instance type. E.g. let eqInt :: Eq Integer = ? in ... end;

We would also need syntax for explicitly using some instance value in an expression. This is analogous to explicitly passing an implicit argument in Idris, for example. However as Silver functions aren't curried and we have non-function expressions (e.g. attribute access) that perform instance resolution, this should probably be a new expression using <instance> in <body> end where <instance> is an expression that must have some instance type, and within <body> the context corresponding to the type of <instance> would be available. For example, this would let one write

function getAst
a ::= astInst::attribute ast<a> {} occurs on b  x::b
{
  return using astInst in x.ast end;
}

global s_c::Stmt_c = ...';
global s::Stmt = getAst(?, s_c);

Note to self: anywhere that something resolves using a context introduced in this way, we need to emit a flow dependency on the instance expression.

Fun aside: named instances

Another cool feature that would be easy to add (and helpful for testing) would be named instances. I think @remexre was proposing something like this at one point. This would provide a neat solution to cases where there is more than one valid instance for a type. We could write things like

instance Semiring<a> => addSemigroup::Semigroup<a> {
  append = add;
}
instance Semiring<a> => mulSemigroup::Semigroup<a> {
  append = mul;
}

instance Semiring<a> => addition::Monoid<a> using addSemigroup {
  mempty = zero;
}
instance Semiring<a> => multiplication::Monoid<a> using mulSemigroup {
  mempty = one;
}

global sumRes::Integer = using addition in concat([1, 2, 3]) end;
global productRes::Integer = using multiplication in concat([1, 2, 3]) end;

Here writing a named instance would define a value of that instance type, rather than adding a normal instance definition. The using declarations on the Monoid instances are needed to specify how to resolve the Semigroup superclass of Monoid. We could write similar instances for conjunction/disjunction, too.

In theory this might also point towards supporting multiple occurences of the same attribute on the same nonterminal, potentially with different flow types. But that would create some issues with the flow analysis, and probably isn't worthwhile.

Context attributes

The intuition behind context attributes is fairly simple: synthesized attributes are "sorta" functions, inherited attributes are "sorta" function parameters, type constraints are essentially just implicit extra function paramters passing instance dictionary objects, so context attributes are just extra inherited attributes passing instance objects. Any instance attributes occuring on a production would have their types added to the type class instance/occurs-on environment for the production body. For example we can now write:

context attribute unparseInst<item>::attribute unparse {} occurs on item;
context attribute astInst<item a>::attribute ast<a> {} occurs on item;

nonterminal ZeroOrMore<item a> with unparseInst<item>, unparse, astInst<item a>, ast<[a]>;
concrete production consZeroOrMore
top::ZeroOrMore<item a> ::= h::item t::ZeroOrMore<item a>
{
  top.unparse = h.unparse ++ t.unparse;
  top.ast = h.ast :: t.ast;
  t.unparseInst = ?;  -- Or t.unparseInst = top.unparseInst;
  t.astInst = ?;      -- Or t.astInst = top.astInst;
}
concrete production nilZeroOrMore
top::ZeroOrMore<item a> ::=
{
  top.unparse = "";
  top.ast = [];
}

nonterminal Stmt with unparse, ast<Stmt>;
concrete production compoundStmt_c
top::Stmt ::= '{' l::ZeroOrMore<Stmt_c Stmt> '}'  -- Or '{' l::Stmt_c* '}'
{
  top.unparse = s"{${l.unparse}}";
  top.ast = compoundStmt(foldStmt(l.ast));
  l.unparseInst = ?;
  l.astInst = ?;
}

The infered flow type of unparse on ZeroOrMore would now be {unparseInst}, and the flow type of ast on ZeroOrMore would be {astInst}. If one of these equations was omitted in the compoundStmt_c production, a flow error would result. If unparse didn't occur on Stmt, a type error would be raised on the ? in the unparseInst equation of compoundStmt_c. This isn't a perfect example since both attributes have empty flow types, but if we instead had context attribute astInst<item a>::attribute ast<a> {env} occurs on item;, then the inferred flow type for ast would be {astInst, env}.

I also considered that perhaps these inherited equations for context attributes should be added implicitly to every decoration site whenever the context can be resolved. This behavior would be more in line with type class constraints, which are always solved implicitly; doing so would slightly improve the ergonomics of the above, since writing the inherited equations on l is a bit of an annoyance. However, an unsolvable context (e.g. unparse not occuring on Stmt) would result in an unhelpful missing equation error at the use site (e.g. the access of unparse on l) instead of a nicer type error on the inherited equation. There would be a number of implementation issues as well - a context attribute occurence would need to generate implicit aspect equations for all decoration sites of the nonterminal, including ones in other non-imported grammars. There would be issues with orphaned equations similar to autocopy as well, but this wouldn't be possible to resolve at runtime.

We could make the propagate behavior of context attributes just generate hole equations on all children with the attribute, so we could just write propagate unparseInst, astInst; in every production that makes use of these combinator nonterminals. That isn't so bad, I guess.

remexre commented 9 months ago

:warning: for named instances: if we have them, we probably also want some notion of parameterizing a type on a value of instance type (!). This is spooky, but we should be able to restrict the terms we allow in types so we don't run into higher-order unification. (I'd say this section is "noting a possible solution," not "advocating for that solution.") I think we need to support more than plain first-order unification, so this might be yet another thing that ends up equivalent to type families...

One of the motivating examples for both ML modules and typeclasses is the "Set problem" -- let's say you have a Set type that represents its contents as a sorted list, or a sorted tree, or a hashtable. (Any of these representations have the same issue.) If you require the user to pass in the compare, equal, or hash functions to each function, it's easy for a programmer to accidentally pass the wrong one in one spot, resulting in incorrect behavior. One could solve this by requiring that compare/equal/hash are provided when the set is created. The problem then arises with the union operator -- you'll need a function comparison at runtime to ensure that the functions match; if they don't, you'll have a bug that didn't get caught by the compiler.

Typeclasses solve this by ensuring uniqueness of instances. If we add named instances, we'd be breaking that property.

ML modules solve this by making a set of strings using the usual operators be a different type from one using e.g. case-insensitive comparison, by virtue of abstract data types. If the programmer forgets to make the representation of the type abstract, these two set types will unify, which is typically a bug.

One way you might do this in Coq, which is also the way I'm noting here, is that you'd parameterize the type itself with an implicit argument whose type is an instance type. When you write something like Set String, the instance search would occur then, to infer something like Set String stringOrd. The user could also write something like @Set String caseInsensitiveOrd to explicitly ask for a set type using that function, and a set of each type would not be unionable against the other.

This does require the good design sense to use the function as a type argument rather than as a field, but designing a generic type is less common than using one, so from an ergonomics perspective, this seems strictly better than not tracking this at the type level.

The real problem is that we need some way of getting a superclass instance out of a child one. For example, if Set is parameterized by caseInsensitiveOrd, there should be a way of getting a caseInsensitiveEq instance out of it. Doing this at the term level is easy enough, but if we support this at the type level, we need to be able to derive ordToEq(caseInsensitiveOrd) ~ caseInsensitiveEq... This feels equivalent to type families in implementation complexity, at least in that our substitutions aren't just from variables any more, and the order of solving constraints matters.

(Ed Kmett notes that this issue crops up with ML modules as well, and that avoiding it is an advantage of typeclasses.)


I think this is enough that my position on named instances is "don't bother having them" at this point... First-class instance types are probably fine, as long as we maintain the property that there's exactly one inhabitant of each instance type.