Move away from using Moniker for variable binding

brendanzab commented 5 years ago

Our autobinding library, Moniker is currently not pulling its weight in terms of making it easy to understand the internals of Pikelet and will make supporting a high performance compiler heading into the future. In the words of @kleimkuhler:

I know the times I’ve explored enough into the codebase to get an idea of how you have implemented something, I usually end up at a point where a binding occurs and I lose track a little.

Also from conversations on Gitter, @boomshroom and others have struggled with grasping Moniker, so it's not an isolated issue! @jonsterling has also noted on Twitter that he is not really a fan of using ABTs (Abstract binding trees) in implementations:

What I found is that unfortunately, it is difficult to see the right way to code a specific thing unless I have hand-coded the syntax. Some kind of ABT thing that abstracted over binding-sensitive traversals would be nice, but I concluded that the main point of ABTs, which was to abstract over binding itself (providing some interface with names, and automatically freshening etc., providing substitution) is actually harmful for implementations.

It's not only this - moniker is also standing in the way of using visitors in the compiler, and will cause us performance problems down the line as Pikelet codebases get larger. It also could make salsa/adapton style incrementalism harder.

Requirements for name binding

The problems we will have to tackle when moving to a new variable binding scheme is:

sound semantics
- capture-avoiding substitution (λx.e [e/x] should not result in λx.x)
- alpha equivalence (λx.x should be equivalent to λy.y)
- reduce the chance of messing up variable binding when adding new features
support existing and upcoming features
- produce nice, stable, pretty names when pretty printing
- recursive bindings (#46)
- support reflection (like in Idris, Agda, F*, Lean)
performance optimizations
- allow for stable names across incremental compilations (see adapton and salsa) (#103)
- reduce unnecessary tree traversals and variable shifts
- allow for vistor-based fusion (#75) resulting in reduced allocations and as close-to single-pass compilation as we can get

Possible solutions

We have a number of options open to us:

Nominal binding

Use nominal binding, like in David Christiansen's NBE tutorial - could be compatible with visitor-based futsion which might amortize some of the performance penalties. @pythonesque has expressed some concern over going down this route though, especially when it comes to recursive bindings.

Graph libraries

Use petgraph, like in Program Synthesis is Possible in Rust. Not sure how well this would support alpha equivalence though. This feels similar to the Scope Graph stuff from A Theory of Name Resolution. I'm not sure if this has ever been applied to dependently typed languages though, and it is off-the beaten track in terms of the main-line of research.

Locally Nameless

Continue to use a locally nameless approach as with moniker, but bring it into Pikelet. This might result in lots of traversals though, Lean has workarounds though. Also not compatible with visitors.

Explicit substitutions with a fully representation

Apparently locally nameless has not been proven stable under substitution(?) for delayed substitutions, so we'd have to go with a fully nameless representation here, like in autosubst. This is well understood, but might be harder to get to work with visitors. The advantage would be that it would be quite close to what we would be doing in a theorem prover if we ever wanted to do a soundness proof of Pikelet.

Use "semantic type checking"

This is what @jonsterling has advocated to me on Twitter:

I base my stuff on an algorithm that I think comes from Thierry Coquand, called "semantic type checking". The main idea is as follows:

Have a syntax based on De Bruijn indices. You don't even need to implement any operations on the syntax. This syntax will serve as the "unchecked" inputs to your judgments. (Like the M in G !- M : A.)

Have a semantic domain based on De Bruijn levels. Interpret binders as closures; environments are sequences of values.

In your bidirectional type checker, you have judgments like G !- M <= A and G !- M => A. in both cases, G and A are coming from the semantic domain, whereas M is syntax. In the mode-shift rule, you will check either definitional equivalence or subtyping (depending on the language), and this will be done structurally in the semantic domain -- this part has the structure of quotation from NbE, but let me observe that you actually don't ever need to quote anything. The algorithm has the same structure though.

Now, let me point out the epic Power Move that we executed so far.

What was important was the yoga of having indices in the syntax and levels in the semantics. It means that the parts of your judgment that have wellformedness presuppositions, which we already agreed to draw from the semantic domain, can be implicitly weakened, so there is never any need anywhere in the algorithm to unleash a De Bruijn shift, or any kind of operation on syntax. The only thing you ever do to syntax is check the head constructor, and evaluate it into the domain afterward.

I will have to think about this more in order to get my head around it!

Here is an example type checker that uses the technique: https://github.com/jozefg/nbe-for-mltt/

Improve Moniker's performance and documentation

I feel like it still would be handy to have a nice way of doing binding, but perhaps this would require more experimentation using other techniques first. I'm not sure though.

brendanzab commented 5 years ago

Updated the description with a link to this example: https://github.com/jozefg/nbe-for-mltt/

glaebhoerl commented 5 years ago

I haven't tried to understand them very much, but some recent work in Haskell on nominal stuff, in case it might come in helpful:

https://github.com/ekmett/name

https://hackage.haskell.org/package/nominal

brendanzab commented 5 years ago

Thanks for linking those @glaebhoerl, much appreciated! 👍

brendanzab commented 5 years ago

I have a Rust port of Nbe using semantic type checking up at brendanzab/rust-nbe-for-mltt. I think this is the approach I am going to go with, but it will be a bit tricky to make!

brendanzab commented 5 years ago

@AndrasKovacs has put together a nice approach in smalltt. I don't really understand it yet though (the implementation is a little tricky to understand), so I might persist with the nbe-for-mltt approach for now, which seems more approachable. I think we'll be able to adapt smalltt easier by that stage though, if we wanted to.

ratmice commented 4 years ago

While i haven't tried it yet, another thing worth looking into

Similar to the graph approach is using hypergraphs, Ueda has a number of papers on name binding and lambda terms as hypergraphs Name Binding is Easy with Hypergraphs

Not aware of any rust libraries for dealing with hypergraphs being available yet though.

brendanzab commented 4 years ago

Oooh thanks! I've heard of hypergraphs before! Both from @sashaboyd and @robrix https://twitter.com/rob_rix/status/1224461727678509057. One of the issues I'm running into with the way I'm doing NbE in #197 is that it's hard to check the equivalence of dependent record types using this representation:

pub enum Value {
    ⋮
    /// Record type extension.
    RecordTypeExtend(String, Arc<Value>, Closure),
    /// Empty record types.
    RecordTypeEmpty,
    ⋮

pikelet-lang / pikelet