Make type variables scoped

xsebek commented 1 month ago

Is your feature request related to a problem? Please describe.

When writing a complex function, it is helpful to use local type annotations. But this breaks when I use local variables:

def id : forall x. x -> x = \x.
 let f : x = x in f  // From context, expected `x` to have type `s2`, but it actually has type `s1`
end

This is the same thing as Haskell without ScopedTypeVariables:

idx :: forall x. x -> x
idx x = let y :: x
            y = x
        in y  -- Couldn't match expected type ‘x1’ with actual type ‘x’ ...

Describe the solution you'd like

I would like to have scoped type variables by default.

Describe alternatives you've considered

It's possible that this would cause problems for later definitions if they were technically inside the scope. An alternative would be using a type annotation inside the body, like the Haskell example:

f :: [a] -> [a]
f (xs :: [aa]) = xs ++ ys
  where
    ys :: [aa]
    ys = reverse xs

Additional context

I discovered this issue while writing #2161. The only workaround is to not specify the local definition signature or pass parameters with the local types.

byorgey commented 1 month ago

This seems like a reasonable idea. Off the top of my head, to implement this I guess we should do something like:

Keep track of an additional context while typechecking, to remember which type variable names are in scope. (Edited to add: I think this also needs to map in-scope type variable names to the skolem variables that were generated for them.)
Stick type variables into this context when checking a let expression (which includes def as syntax sugar) with a type annotation.
- Note that in Haskell, ScopedTypeVariables is only triggered with an explicit use of forall; if we wanted that it would require reworking our AST + parser a bit, since at the moment parsing a -> a and forall a. a -> a both yield exactly the same AST.
When checking anything else with a polymorphic type signature, don't implicitly generalize over any type variables which are already in scope. (But explicit generalization, as in forall a. a -> a would shadow in-scope type variables.)

In Haskell, of course (1) ScopedTypeVariables is off by default, (2) even when turned on it is only triggered by use of explicit forall, and (3) these days I get the sense that it is generally looked down upon in favor of more newfangled approaches with explicit type application and so on. However, I don't know of any good reasons for (1) and (2), and (3) doesn't really apply in our case since we don't compile down to some kind of polymorphic lambda calculus with explicit type passing like Haskell does. In addition, in our case type signatures always have to go right next to definitions; in Haskell they can be far apart which complicates the issue.

See also #2102 which wouldn't solve the issue but is relevant here.

xsebek commented 1 month ago

The problem I run into is the following limitation of records:

In the record projection rn.x, can't infer whether the LHS has a record type. Try adding a type annotation.

So I have to provide a type annotation, but I also can't write it:

tydef RBTree k v = rec b. Unit + [c: Bool, k: k, v: v, l: b, r: b] end
tydef RBNode k v = rec n. [c: Bool, k: k, v: v, l: Unit + n, r: Unit + n] end

def balance : RBNode k v -> RBTree k v = \t.
  let baseCase /* : RBTree k v */ = inr t in
  let balanceRR /* : RBNode k v -> RBTree k v */ = \rn.
    if true {baseCase} {undefined rn.x}  // can't infer r.x
  in undefined
end

@byorgey the only workaround I found was passing the t original parameter around so that the types would match. This effectively prevents me from using any local variables that are records.

xsebek commented 1 month ago

@byorgey btw. ScopedTypeVariables is on by default in GHC2024 and GHC2021. 🙂

I think it is a very beginner-friendly extension that "does what I meant". 😄

byorgey commented 1 month ago

The problem I run into is the following limitation of records:

That is a very good motivating use case!

@byorgey btw. ScopedTypeVariables is on by default in GHC2024 and GHC2021. 🙂

I didn't know that! That makes me feel better, that there aren't lurking downsides we're not thinking of.

byorgey commented 1 month ago

So here's what I'm thinking in terms of a detailed design. Comments welcome!

Any polymorphic type, whether written with an explicit forall or not, brings all its variables into scope in the body of its definition.
Any type variables declared with an explicit forall are introduced as new variables, and in particular they will shadow any type variables already in scope with the same names.
Any type variables not bound by an explicit forall which are not already in scope will be implicitly quantified. Any such type variable names which are in scope will simply refer to the name in the enclosing scope.
Currently, in terms of explicit forall we require an "all or nothing" approach --- if you use an explicit forall, then you must list all the type variables in the type, or else you get an error. But given the above about shadowing etc., I think it makes more sense to get rid of this restriction. Variables listed in an explicit forall are bound fresh; variables not listed will either be implicitly quantified (if they are not already in scope) or will just refer to the enclosing scope (if they are).
- I doubt any of this will make much difference to most people programming in swarm-lang.

So, for example, this will work:

def balance : RBNode k v -> RBTree k v = \t.
  let baseCase : RBTree k v = inr t in
  let balanceRR : RBNode k v -> RBTree k v = ...
  in undefined
end

The type signature on balance will bring k and v into scope in its body (even though there is not an explicit forall written); then the k and v in the type signatures on baseCase and balanceRR will both refer to the k and v bound in the type of balance. In particular, this means that inr t is a valid definition for baseCase since their types match. On the other hand, if one were to write

def balance : RBNode k v -> RBTree k v = \t.
  let baseCase : forall k v. RBTree k v = inr t in
  undefined
end

then the k and v in the type of baseCase would be new type variables which are different than the k and v in the type of balance (which would now be shadowed within the definition of baseCase). This would now be a type error (which is what happens currently), since baseCase is required to be an RBTree k v for any types k and v, but inr t only has the specific type that was given as an input to balance.

Finally, if one wrote

def balance : RBNode k v -> RBTree k v = \t.
  let baseCase : forall k. RBTree k v = inr t in
  undefined
end

then only k would be a new type variable, whereas v would refer to the v from the type of balance. (This would still be a type error, of course.)

A plan for actually implementing this:

Currently we do the work of implicit quantification in the parser; we would need to change this to simply parse what was written, and do the implicit quantification during typechecking.
Add a new Ctx Var to the typechecker which maps type variable names to the skolem variable names chosen for them. This is used to track not only which variables are in scope, but also which skolem variable name corresponds to each.
When checking each polymorphic type, we implicitly quantify according to the above specification, and generate new skolem variables only for type variables which are explicitly or implicitly quantified; for in-scope variables we just look up the corresponding skolem variable in the context.

xsebek commented 1 month ago

@byorgey Thanks for the detailed breakdown. 👍

Adding type annotations for local functions is so rare that I would not even mind having to always write explicit forall. 🙂

byorgey commented 1 month ago

I would not even mind having to always write explicit forall.

Fair enough. Requiring explicit forall wouldn't really make the implementation any easier, though (actually it might be the opposite), so unless you know of good reasons that we should require explicit forall like Haskell does, let's not worry about it.

I have an implementation which is almost working, just a bug or two to track down yet...

swarm-game / swarm

Make type variables scoped #2168