FR: Saturated constructors

Description

This is a feature request.

Change how we handle constructors to require them to always be fully saturated, unlike functions. As part of this change, they will become checkable, rather than synthesisable, and will not carry the indices of their datatype -- we will have Maybe Bool ∋ Just True rather than Just @Bool True ∈ Maybe Bool. This will affect both the internals and the API.

Motivation

Currently there is not much symmetry between introductions and eliminations of ADTs. Patterns must be fully saturated, but constructors might not be: case (map Just [True, False]) of [] -> ... ; (x:xs) -> ...; has Just being undersaturated, but both [] and x:xs are fully saturated (here I am ignoring polymorphism/type applications -- just pretend these types are monomorphic). When we represent terms as trees in the API, it is weird to have application nodes in patterns, as we are not really matching on "application" -- indeed, one cannot write a pattern like case (not $ True) of f $ x -> x (which, were it expressible, one might imagine would evaluate to True). Similarly, when constructing a value of a polymorphic type, one must give the index, but when (deep) matching one cannot match on it, and it is not clear what the syntax/semantics should be here (other than eliding it from the match): case (C @a t) of C @_ True -> e1; C @_ x -> e2 is fine (when Bool ∋ t), but having @_ is odd. One can't give it a fresh name (without introducing a type equality); Primer could automatically write the correct, concrete, index there, but that is odd as one cannot manually specify a pattern there, in contrast to True and x. There are also no "pattern actions" applicable to these nodes (and there will not be, even if/when we have a rich notion of "pattern actions").

Note that this fully-saturated-ness means that some terms will become much more concise. For example, writing the list [1,2,3] out explicitly today is Cons @Int 1 (Cons @Int 2 (Cons @Int 3 (Nil @Int))), but would become Cons 1 (Cons 2 (Cons 3 Nil)). Similarly the nested pairs (('a',1),(True, Λa.λx.x)) would today be MkPair @(Pair Char Int) @(Pair Bool (∀a. a -> a)) (MkPair @Char @Int 'a' 1) (MkPair @Bool @(∀a. a -> a) True (Λa.λx.x)), but would become MkPair (MkPair 'a' 1) (MkPair True (Λa.λx.x)).

Dependencies

Spec

Constructors will now always be fully-saturated, and only checkable, not synthesisable (although, see "Spec questions"). This will affect the core language, typechecker and evaluator, as well as the API and actions. Instead of having (roughly)

data Expr
  = Con ValConName
  | App Expr Expr
  | APP Expr Type
  | ...

where Just True is represented as (Con "Just" `APP` TCon "Bool") `App` Con "True" we would have

data Expr
  = Con ValConName [Expr]
  | App Expr Expr
  | APP Expr Type
  | ...

where Just True is represented as Con "Just" [Con "True" []]. The typing rule would change from

data T a = C A[a] | ... is defined
----------------------------------
Con C ∈ ∀a. A[a] -> T a

to

data T a = C A[a] | ... is defined
A[S] ∋ t
----------------------------------
T S ∋ C [t]

(Note that there is only one rule per constructor, applying when that constructor is fully saturated.) We can also relax the stipulation that constructor names are globally unique, if desired. (However, see "Spec questions".)
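
As a rough illustration of the checking rule above, here is a minimal Haskell sketch. The names (Ty, conTable, subst, check) and the simplified first-order types are assumptions made for this example, not Primer's actual definitions. It only aims to show that the indices come from the expected type rather than from the constructor, and that a constructor with the wrong number of arguments is rejected.

```haskell
-- Hypothetical, simplified model; not Primer's real types.
data Ty
  = TCon String [Ty] -- a type constructor applied to its indices, e.g. Maybe Bool
  | TVar String      -- a datatype parameter, e.g. a
  deriving (Eq, Show)

-- Constructors always carry all of their term arguments.
data Expr = Con String [Expr]
  deriving (Eq, Show)

-- Constructor name -> (datatype name, parameter names, field types).
conTable :: [(String, (String, [String], [Ty]))]
conTable =
  [ ("True", ("Bool", [], []))
  , ("False", ("Bool", [], []))
  , ("Nothing", ("Maybe", ["a"], []))
  , ("Just", ("Maybe", ["a"], [TVar "a"]))
  ]

-- Instantiate the datatype's parameters with the indices of the expected type.
subst :: [(String, Ty)] -> Ty -> Ty
subst s (TVar v) = maybe (TVar v) id (lookup v s)
subst s (TCon c ts) = TCon c (map (subst s) ts)

-- `check ty e` plays the role of the checking judgement (type on the left).
check :: Ty -> Expr -> Bool
check (TCon t indices) (Con c args) =
  case lookup c conTable of
    Just (t', params, fields)
      | t == t'
      , length params == length indices
      , length fields == length args -> -- the saturation requirement
          let s = zip params indices
           in and (zipWith check (map (subst s) fields) args)
    _ -> False
check (TVar _) _ = False
```

Under these assumptions, check (TCon "Maybe" [TCon "Bool" []]) (Con "Just" [Con "True" []]) holds, while the undersaturated Con "Just" [] is rejected against the same type.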

Evaluation does not need to change, other than to adapt to the changed Expr definition (since all eliminations of ADTs are already fully-saturated, and the typechecker will rule out any Apps with a constructor in function position -- this won't be guaranteed by Haskell's type system, so to handle this "impossible" path we could just throw an error; I don't suggest appending to the list, since that wouldn't work for APP nodes and may delay detection of bugs.)

The API will change to reflect this core change. The "raw"/"rich" API will simply mirror the new core representation. The OpenAPI will change how it outputs trees -- the Tree type will not change, but we will now emit different trees:

Tree {flavor=App,body=NoBody,childTrees=[ Tree {flavor=APP
                                               ,body=NoBody
                                               ,childTrees=[
                                                    Tree {flavor=Con,body="Just",childTrees=[]}
                                                  , Tree {flavor=TCon,body="Bool",childTrees=[]}
                                                  ]}
                                        , Tree {flavor=Con,body="True",childTrees=[]}]}

will now become

Tree {flavor=Con,body="Just",childTrees=[Tree {flavor=Con,body="True",childTrees=[]}]}
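
To make the new output shape concrete, here is a small Haskell sketch of how a saturated constructor might be rendered. The Expr, Flavor, Body and Tree definitions below are cut-down stand-ins invented for this example (the real API types have more fields and flavors), and toTree is a hypothetical name.

```haskell
-- Hypothetical, cut-down versions of the core and API types.
data Expr
  = Con String [Expr]
  | App Expr Expr
  | Var String
  deriving (Eq, Show)

data Flavor = FlavorCon | FlavorApp | FlavorVar
  deriving (Eq, Show)

data Body = NoBody | TextBody String
  deriving (Eq, Show)

data Tree = Tree
  { flavor :: Flavor
  , body :: Body
  , childTrees :: [Tree]
  }
  deriving (Eq, Show)

-- A constructor node holds its arguments directly as children,
-- so no application nodes appear between it and its arguments.
toTree :: Expr -> Tree
toTree (Con c args) = Tree FlavorCon (TextBody c) (map toTree args)
toTree (App f x) = Tree FlavorApp NoBody [toTree f, toTree x]
toTree (Var x) = Tree FlavorVar (TextBody x) []
```

In this sketch, Just True becomes a single Con-flavored node with one Con-flavored child, matching the tree shown above.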

Actions will change since some actions no longer make sense. The "insert a constructor" and "insert a refined constructor" actions will be removed, but the "insert a saturated constructor" one will stay (perhaps being renamed). Possibly we could not offer the constructor action in a hole with function type, or maybe have a fancy "insert an eta-expanded constructor" (see Spec questions). We should possibly not offer the x ~> x $ ? action on a constructor (since it would only ever do C s t ~> {? (C s t) $ ? ?}).

Spec questions

This FR calls for constructors being checkable (only), and fully-saturated. There is an argument that this would be bad UX (note that some of the implications are discussed below), in which case we have three options:

1. do not implement this FR
2. implement this FR anyway, if this poor UX is outweighed by better UX and pedagogy due to the enhanced symmetry between constructions and eliminations/patterns
3. modify this FR to improve matters

To attempt (3), there are two things to try. (Though, to be clear, I am not convinced whether we are in world (3), or whether these suggestions improve matters.)

Potentially (it seems plausible, but I haven't thought hard about it) we could make fully-applied constructors synthesisable, assuming that they are uniquely named. However, we would not be able to (without type inference, and even then not in all cases) infer the indices/parameters, but we could synthesise a hole-y type: Just ? ∈ Maybe ? and Just True ∈ Maybe ? are easy (note that the second could be improved with inference, but the first could not be).

To lessen the burden of not having partially-applied constructors, we could generate/sugar (Coq-style) a function for each constructor which does the "eta-expansion": Cons x xs being the pedantic constructor which must be saturated and is only checkable, and cons : ∀a. a → List a → List a being a normal top-level definition, defined as cons = Λa. λx. λxs. Cons x xs which can be partially applied and is synthesisable. Notice that cons is polymorphic (and can/must be applied to a type argument). However, it does seem plausible that it would lead to more confusion.
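
For concreteness, generating such a wrapper could look roughly like the following Haskell sketch. The miniature Expr type and the etaWrapper function are hypothetical, the bound variable names x1, x2, ... are an arbitrary choice, and the wrapper's type signature is not generated here.

```haskell
-- Hypothetical miniature expression type, for illustration only.
data Expr
  = Con String [Expr] -- saturated constructor
  | Var String
  | Lam String Expr   -- term abstraction
  | LAM String Expr   -- type abstraction
  deriving (Eq, Show)

-- Build the eta-expanded wrapper for a constructor, given the type
-- parameters of its datatype and the number of term arguments it takes.
etaWrapper :: [String] -> String -> Int -> Expr
etaWrapper tyParams con arity =
  foldr LAM (foldr Lam (Con con (map Var xs)) xs) tyParams
  where
    xs = ["x" ++ show i | i <- [1 .. arity]]
```

For example, etaWrapper ["a"] "Cons" 2 builds the term Λa. λx1. λx2. Cons x1 x2, which is the body one would give to cons.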

Implementation details

Since this directly touches the core Expr type, it is difficult to see how to break the work into small pieces. I suspect that it is possible to mostly decouple the OpenAPI changes from the core changes. We could either modify the API first (have a special case that detects constructor-headed spines -- this would not ensure fully-saturated-ness, but this is only a temporary stopping point), or last (render Cons x xs as a tree headed by APP -- we should have all the type information available from the TC, and can always put holes in the types if needed.) Whilst this decoupling is fairly pointless on its own (the OpenAPI changes are small), it does enable us to split the core changes into finer pieces without much externally-visible API churn. In the core, we could split into steps:

1. Constructors can have (type and term) arguments, but need not be fully saturated, and are still synthesisable: Expr = ... | Con Name [Type] [Expr], where Con `C` A x is equivalent to the tree (C @ A) $ x. Perhaps this can be added under a new name, with a pattern synonym for the old form, and this may help split the testsuite changes (although this is speculative).
2. Enforce constructors being fully-saturated (but still synthesisable) -- this is mostly a TC change, but will have knock-on effects on documentation and the testsuite. It will also require the "insert a bare constructor" action to be removed (possibly the action change can happen up front, before the core is changed?).
3. Enforce constructors being checkable, but still have type arguments. This is mostly a TC and Eval change, with knock-on effects on the testsuite and documentation. It will also require the "insert a saturated constructor" action to be offered in fewer places (although this too could happen up front).
4. Drop the type arguments: Expr = ... | Con Name [Expr]. This should be a simple change, but with a lot of churn.
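
The "detect constructor-headed spines" idea, and the relationship between the current form and the intermediate Con Name [Type] [Expr] form, can be sketched in Haskell as follows. Everything here (Old, New, spine, splitArgs, convert) is a hypothetical simplification written for this example and is not taken from the Primer codebase. It assumes type arguments always precede term arguments in a constructor spine, and leaves any other shape as ordinary applications.

```haskell
-- Hypothetical, simplified types.
newtype Ty = TCon String
  deriving (Eq, Show)

-- Current representation: constructors are bare and applied via App/APP.
data Old
  = OCon String
  | OVar String
  | OApp Old Old
  | OAPP Old Ty

-- Intermediate representation: constructors carry their arguments
-- (not yet required to be saturated).
data New
  = Con String [Ty] [New]
  | Var String
  | App New New
  | APP New Ty
  deriving (Eq, Show)

-- Peel applications off the head, collecting arguments in source order.
spine :: Old -> [Either Ty Old] -> (Old, [Either Ty Old])
spine (OApp f x) acc = spine f (Right x : acc)
spine (OAPP f t) acc = spine f (Left t : acc)
spine h acc = (h, acc)

-- Succeeds only if all type arguments come before all term arguments.
splitArgs :: [Either Ty Old] -> Maybe ([Ty], [Old])
splitArgs args =
  let isTy = either (const True) (const False)
      (tys, rest) = span isTy args
   in if any isTy rest
        then Nothing
        else Just ([t | Left t <- tys], [x | Right x <- rest])

-- Collapse constructor-headed spines; translate everything else node by node.
convert :: Old -> New
convert e = case spine e [] of
  (OCon c, args)
    | Just (tys, tms) <- splitArgs args -> Con c tys (map convert tms)
  _ -> case e of
    OCon c -> Con c [] []
    OVar x -> Var x
    OApp f x -> App (convert f) (convert x)
    OAPP f t -> APP (convert f) t
```

In this sketch the old-style (Con "Just" `APP` TCon "Bool") `App` Con "True" converts to Con "Just" [TCon "Bool"] [Con "True" [] []], while an application headed by a variable is left as an App node.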

Not in spec

This spec does not cover any "fancy" features that are not a part of current Primer. At first glance, I don't foresee any problematic interactions with future extensions. However, I want to point out that one feature of the current spec is that constructors and (symmetrically) patterns will never contain any type( variable)s. This would not be the case if we had existentials/GADTs (i.e. constructors which contain type arguments which are not parameters of the constructor's datatype). For example, Haskell's

data WrappedAny where
   Wrap :: forall a . a -> WrappedAny

could be translated by

TYPE ∋ A 
A ∋ a
---------------------
WrappedAny ∋ Wrap A a

e ∈ WrappedAny
α:*, a:α ⊢ T ∋ t
-------------------
T ∋ match e with
      Wrap α a -> t

Discussion

This change will have implications for what programs are accepted.

The fact that constructors are only checkable means that one can put them in fewer positions:

not in "function position", but that is also ruled out by fully-applied-ness
not directly bound by a let
not directly inside a non-empty hole (although, see #931)
not directly scrutinised by a match_with_

If it occurs in any of these places, it will need a type annotation.

The fact that constructors are only fully-applied means that they act less like functions. In particular, one has to eta-expand more often. The clearest example is when using map: one has to write map (λx.Just x) [True,False].

Future work

Thanks! This is very helpful. Let's discuss in the next developer meeting. One thing I'd like to understand better is how the "worse UX" of this proposal manifests. (I think you have pointed out at least some examples of this in the Discussion section.)

This is great. The motivation section explains the upsides better than any of us have managed before now.

The "insert a constructor" and "insert a refined constructor" actions will be removed, but the "insert a saturated constructor" one will stay (perhaps being renamed).

Naming the remaining action simply "insert a constructor" seems the obvious choice.

Possibly we could not offer the constructor action in a hole with function type

Wouldn't this be inconsistent with how we treat other actions? For example, we currently allow inserting a variable with non-function type here. Maybe it would be okay in beginner mode.

We should possibly not offer the x ~> x $ ? action on a constructor (since it would only ever do C s t ~> {? (C s t) $ ? ?}).

Agreed. Though again, I worry about this being inconsistent with other non-function expressions.

To lessen the burden of not having partially-applied constructors, we could generate/sugar (Coq-style) a function for each constructor which does the "eta-expansion"

My preference here would be, if anything, to have an action for generating these functions on the fly (this may be what you meant elsewhere by "insert an eta-expanded constructor"). This is essentially equivalent to your proposal but with the functions (which would be small) always inlined, avoiding pollution of the namespace.

This FR calls for constructors being checkable (only), and fully-saturated. There is an argument that this would be bad UX (note that some of the implications are discussed below)

The fact that constructors are only checkable means that one can put them in fewer positions

This is all a bit concerning. Like @dhess, I'll wait until our dev meeting for some clarifications here, as I don't currently fully understand all the trade-offs, including exactly how inference differs from synthesis.

Spitballing a little, perhaps we should require that constructors are also "saturated" with type arguments, if this is enough to recover synthesisability? This might also help with your point about extending to GADTs, though I haven't thought this through. It would, however, negate the section in the motivation about type applications in patterns being odd, and wanting to get rid of them. EDIT: I've gone off this idea. We'd also lose the fact that expressions like Cons 1 (Cons 2 (Cons 3 Nil)) get much simpler (which should maybe be in the motivation section). Besides, after today's meeting, I realise that the current situation with the occasional annotation needing to be inserted, at worst, isn't as bad as I feared.

(These are mostly notes to myself)

The current view on this FR is that it is worth implementing so we can have an actual artifact to inspect and decide if the UX tradeoffs are worth it. We expect that they will be, so this implementation should be with a view to merging.

When implementing I should

ensure that foo : ? = Just 5 is accepted (and interactively constructs as I expect)
then ensure that when refining the hole to Maybe Bool we get foo : ? = Just {? 5 ?}
consider how to make it easy to hack up the "synthesisable Just ? ∈ Maybe ? or Just True ∈ Maybe Bool" idea, to test its UX (which we could do before understanding all the consequences / its metatheory)

Some extensions/metatheory to consider

It would be sensible for a beginner to write (with a view to understanding how language constructs work), or for an intermediate state in evaluation to be, match (Just 5 : ?) with ..., where it is not clear what the annotation should be. Having one at all seems somewhat silly!
In that vein, a light amount of inference may be a large improvement in UX, but the metatheory is not immediately clear. Some options are to have an action that "refines this type hole to the 'obvious' thing" (which possibly does not need to have a nice metatheory, since it is a user-driven action and can easily be undone or the result modified), or to actually do inference so the annotation does not need to appear in the program text (which may make for nicer programs and better UX in the happy case, but I worry about understandability when there is a type error and inference is involved)
The extension to remove indices from constructors but keep them synthesisable (courtesy of type holes) is interesting, but needs thinking about, especially in the extended form where we can have Just True ∈ Maybe Bool
One way to reduce the burden of fully-applied-ness generically is to have a "λ abstract this hole" action that could convert ... ? ... into λx. ... x .... This could be used to write map @A @B {?Just ? ?} t and then create map @A @B (λx.Just x) t. This action has an advantage over the (perhaps more obvious) "eta-expand this constructor": it is more general and widely useful; in particular it is easy to only expand some arguments, i.e. to write λxs. Cons 1 xs
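
A very rough Haskell sketch of the "λ abstract this hole" idea, over a made-up miniature Expr type. The names fillFirst and abstractHole are hypothetical. For simplicity the sketch always picks the leftmost hole (a real action would act on the selected hole), ignores type applications, and assumes the caller supplies a variable name that is fresh.

```haskell
-- Hypothetical miniature expression type, for illustration only.
data Expr
  = Hole
  | Var String
  | Con String [Expr]
  | Lam String Expr
  | App Expr Expr
  deriving (Eq, Show)

-- Replace the leftmost hole with the variable x, if there is a hole at all.
fillFirst :: String -> Expr -> Maybe Expr
fillFirst x Hole = Just (Var x)
fillFirst _ (Var _) = Nothing
fillFirst x (Lam y b) = Lam y <$> fillFirst x b
fillFirst x (App f a) = case fillFirst x f of
  Just f' -> Just (App f' a)
  Nothing -> App f <$> fillFirst x a
fillFirst x (Con c args) = Con c <$> go args
  where
    go [] = Nothing
    go (a : as) = case fillFirst x a of
      Just a' -> Just (a' : as)
      Nothing -> (a :) <$> go as

-- Turn  ... ? ...  into  \x. ... x ...  (x is assumed to be fresh).
abstractHole :: String -> Expr -> Maybe Expr
abstractHole x e = Lam x <$> fillFirst x e
```

Here abstractHole "x" (Con "Just" [Hole]) gives λx. Just x, and abstracting the remaining hole of Cons one ? (with one standing for some variable in scope) gives λxs. Cons one xs, which is the partial expansion mentioned above.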

Something that just occurred to me: could we ensure that the new API assigns IDs to pattern constructors, as we already do for pattern variables?

This would make things easier for the frontend, since then every node could be selectable (except for pattern boxes, but those don't really look like normal nodes anyway), and it would open the door for actions to be performed on them, such as re-ordering. This would have been less achievable if we kept pattern-application nodes.

I am 100% on board for that proposal. It's never been right that we can't select pattern constructors, even if we can't do anything with them.

I'd kind of like to find some sort of action that we could offer on them, just so that the action panel is never empty. But I can't think of anything useful.

This is getting a bit off-topic for this FR, but in lieu of a full-blown pattern language, which we've agreed should probably be a Primer 2.0 feature, a nice middle ground might be to support a single wildcard pattern. I can imagine it working something like this:

Select one or more constructors that you want to replace in a given match with expression (shift-click to add them one at a time, or maybe drag a selection box around them, if they're contiguous).
As soon as one or more are selected, offer a "Replace constructor with wildcard" action.

Something that just occurred to me: could we ensure that the new API assigns IDs to pattern constructors, as we already do for pattern variables?

I think this is orthogonal to this FR: I am not proposing to touch patterns at all. These changes mean actual constructions will visually look like patterns, but this is achieved by modifying constructors-in-terms, and not touching pattern-constructors.

That said, I would like a pattern language -- at the very least we need the ability to select a finite number of alternatives and match the rest with a wild card to be able to match on primitive numbers. I merely think it should be done in its own FR. (I would be happy to work on that after this one is implemented, and have some thoughts on how it may work, or could leave this for someone else.)

I think this is orthogonal to this FR

Fair enough, I'll take your word for it. I had assumed that this would be a good time to do it, since we're already needing to modify the trees we output for patterns (to remove application nodes, which currently we also have to generate mock IDs for on the frontend).

Ah, I see what you are getting at now. To clarify a few points:

I am not touching patterns in the core (which do not have IDs), but am modifying constructors
In the (Open)API, I am changing the rendering of both constructors and patterns
We actually generate mock IDs for pattern application nodes, boxes and bind nodes in the backend: https://github.com/hackworthltd/primer/blob/main/primer/src/Primer/API.hs#L782
To make things actionable, we would need IDs on them in the core, and not just ephemerally in the API's output

Got you. Yes, I was talking about adding IDs to patterns in the core, and now that I think about it I understand that patterns aren't actually changing here.

For posterity, in this comment, I'm adding our notes from our 2023-02-15 developer meeting, during which we discussed this FR. (Some of these notes may already be incorporated into the FR proposal and/or other comments in the thread, but I don't want to miss anything, at the risk of being redundant.)

Potential UX issues:

Can we still insert value constructors piece-wise?
- yes
- is there a scenario where type applications would be required?
  - no
- is there a scenario where type annotations would be inserted by the type checker?
  - yes:
    - in scrutinee, match Just 5 would currently add : ? annotation
      - with a bit of extra inference we could change this to : Maybe Int
      - perhaps we should just elide it in the canvas?
        
        it’s still there, we just don’t display it, as it might not be helpful to the student
    - also for let bindings
    - holes e.g., {? Just ? : ? ?}
  - We may be able to avoid some of these scenarios if constructor names are unique:
    - if you have Just ? then the type checker knows this is of type Maybe ? because Just is unique to Maybe
    - @brprice needs to think about the theory on whether this is possible
map & partial application
- In Haskell: map Just [1, 2, 3] ⇒ [Just 1, Just 2, Just 3]
- In Primer (with this change):
  - map @Int @(Maybe Int) (\x.Just x) (Cons 1 (Cons 2 (Cons 3 Nil)))
    - i.e., we need to eta-expand the Just
    - @dhess is OK with this (@georgefst too)
      - It’s even a good use case for wrapping things with lambdas
        
        in Scheme, you typically are first introduced to wrapping as a way to make functions lazy
        
        but of course, we already have laziness, so we need another good way to introduce wrapping. This could be it.
      - In more advanced modes, once the student understands the concept and how to do it, we could auto-generate helper functions like just : a → Maybe a
    - We could also add an “eta-expand” action to generalize
      - e.g. an action Just ? -> λx. Just x (this isn't really eta-expansion)
        
        can be generalised to be available on any hole
        
        very powerful when combined with a "lift lambda" action
      - We could also add an action to take a lambda and make a definition out of it

This FR is implemented in #958, #959, #960, #961 and #962

Possibly we could not offer the constructor action in a hole with function type, or maybe have a fancy "insert an eta-expanded constructor" (see Spec questions).

This is not implemented. I have opened #975 for this idea
