Open jonsterling opened 6 years ago
We'll need an elaborator, and I want it to have support from early on for both holes and some kind of implicit arguments to functions. The reason for this is that this is the sort of thing that distinguishes a toy language from a proof assistant, and I think if we don't carry it out, we never will.

This opens up a whole can of worms, and I'm not sure what the least gnarly way to implement it is.

One design choice I would kind of like to make, if possible, is to not expose any of this in the core language; that is, I would like to avoid core terms for `Pi` and `App` keeping track of whether something was implicit or not, etc. It seems like we should be able to isolate this in the source language and the elaboration process. But I'm not certain.

It may even be reasonable to restrict implicit arguments to top-level declarations, and distinguish between "type schemes" and actual types. I think this is how it was done in Epigram. Not sure what's best.
Atkey's work on algebraic typechecking and elaboration is probably the most elegant approach that I've seen in general. I'm not sure how easy it would be to adapt to this context but having an algebraic solution with an equational calculus for reasoning about holes and implicits would be very nice.
I remember Atkey's work being very elegant... I'm not yet certain of the details though.
One option might be to start with an Epigram/Idris style proof state monad and elaborator, and see to what degree such a thing could then be packaged up in the style of Atkey at the interface.
I was re-reading the Idris paper this morning and the nice thing about it is that Edwin explains how all this stuff works in a very down-to-earth way, which is a bit easier for me to figure out than what appeared in the Oleg and Epigram work.
@freebroccolo As an update, I've implemented the beginning of an OLEG-style elaborator, together with a proof script that constructs the identity function. It's kind of cool! I kind of doubt that my simple-minded style of proof state will actually work long-term, fwiw.
As an update, I'm fairly certain that the simple-minded style mentioned above probably won't work generally. Let me summarize things a little bit:
In Idris, the proof state is (roughly) a dev calculus term together with a queue of hole names. These hole binders `?x : T. e` actually have global names which are indexed in this queue. The way things work is that whatever hole name is at the front of the queue, this is the hole that you are focused on.
Now, as an example, to elaborate a placeholder expression `_` in the source language, you just move the current hole to the end of the queue and keep going.
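Roughly, in OCaml-ish terms, I read that representation as something like the following sketch (the names `proof_state` and `defer` are just my own placeholders, not what Idris actually uses):

```ocaml
type hole_name = string

type tm =
  | Hole of hole_name * tm * tm   (* ?x : T. e -- a hole binder *)
  | Guess of hole_name * tm * tm  (* ?x ~ e1. e2 -- a guess binder *)
  | Lam of tm
  | App of tm * tm
  | Var of int

type proof_state = {
  term  : tm;                 (* the whole development-calculus term *)
  holes : hole_name Queue.t;  (* front of the queue = the focused hole *)
}

(* Elaborating a placeholder [_] just defers the focused hole: pop it off
   the front of the queue and push it onto the back. *)
let defer (st : proof_state) : unit =
  match Queue.take_opt st.holes with
  | None -> ()
  | Some h -> Queue.add h st.holes
```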
In what I've implemented, I tried to do things without a specific hole queue, and instead just treat terms themselves as the proof state. To be more precise, my proof state is like a stack of partial terms (which could be collapsed into a single term, in an operation that I call `unload`). And rather than looking at the whole term together with an index deep into it, I just have the "current slice" of the term in focus, together with a continuation that lets me plug a modified version of that slice back into the entire proof state.
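Heavily simplified, the shape of what I have is something like this sketch (the real term language and plugging logic are more involved, and the names are just mine):

```ocaml
(* A simplified sketch of the "terms as proof state" representation. *)
type tm =
  | Hole                (* a bare hole, eliding its name and type *)
  | Lam of tm
  | App of tm * tm

(* The proof state: a stack of partial terms, innermost development first. *)
type state = tm list

let rec has_hole = function
  | Hole -> true
  | Lam b -> has_hole b
  | App (f, a) -> has_hole f || has_hole a

(* Plug [inner] into the leftmost hole of [outer]; assumes such a hole exists. *)
let rec plug (outer : tm) (inner : tm) : tm =
  match outer with
  | Hole -> inner
  | Lam b -> Lam (plug b inner)
  | App (f, a) ->
    if has_hole f then App (plug f inner, a) else App (f, plug a inner)

(* Collapse the whole stack into a single term. *)
let unload : state -> tm = function
  | [] -> Hole
  | inner :: outers -> List.fold_left (fun acc outer -> plug outer acc) inner outers

(* The focus: the current slice plus a continuation that rebuilds the
   entire proof state from a modified slice. *)
type focus = { slice : tm; replug : tm -> state }
```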
And as far as I can tell, there doesn't really seem to be a clear way for me to implement the elaboration of `_` under this design. It's not definitive, but it's a sign that this isn't the right representation.
What I don't like about Idris's representation is that you have the whole term in view at any point, but you may be "focused" on a hole that is deep inside it; the result is that to execute a tactic on the current hole, you have to repeatedly traverse deep into the global term to get to it. Somehow it would be nice if this could be local, but I'm not sure it's possible (for instance, in the presence of unification?).
Anyway, this note is probably fairly confusing, and it just reflects my own level of confusion. Hopefully things will become a bit clearer.
> And as far as I can tell, there doesn't really seem to be a clear way for me to implement the elaboration of `_` under this design. It's not definitive, but it's a sign that this isn't the right representation.
Up to this point, what you've described sounds reasonable to me. Can you say more about what goes wrong when you try to implement elaboration with this representation?
Shouldn't the stack representation and the flattened representation be equivalent (i.e., `unload` being one side of an isomorphism or something)?
> What I don't like about Idris's representation is that you have the whole term in view at any point, but you may be "focused" on a hole that is deep inside it; the result is that to execute a tactic on the current hole, you have to repeatedly traverse deep into the global term to get to it. Somehow it would be nice if this could be local, but I'm not sure it's possible (for instance, in the presence of unification?).
I haven't read the material on Idris but I don't like the sound of this either. The first thing that comes to mind is that a zipper-like representation might be a good alternative but if I'm understanding correctly, you are suggesting this doesn't seem to work out?
@freebroccolo Let me try to be a bit more lucid today than I was yesterday 😄
The problem is that in the current naive representation, we don't really keep track of what order we want to attack the holes in. So, I think it's not clear how to implement a lot of things that are essentially based on deferring one hole until another is solved, etc.
Yes, what presents itself as a possible solution is some kind of zipper-style representation, but even then you need to keep track of a queue of positions in the structure. This could be feasible, but a significant complication is that when you solve one hole, I think that it can invalidate the zipper-paths that you have stored, so you would need to somehow recalculate them as far as I can tell. This might end up being worse than the naive approach that is used in Idris.
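To make the worry concrete, here is roughly what I mean by stored positions (just an illustrative sketch, not code from the repo):

```ocaml
(* Illustrative only: a path into the term, as you'd store in the queue
   of positions mentioned above. *)
type tm = Hole | Lam of tm | App of tm * tm

type dir = LamBody | AppFun | AppArg
type path = dir list   (* directions from the root down to a hole *)

(* Replace the subterm at [p] with [sub]. If solving one hole ends up
   rewriting structure above some other hole (via unification, reduction,
   etc.), the stored paths to those other holes can stop pointing at holes
   and would have to be recomputed. *)
let rec graft (t : tm) (p : path) (sub : tm) : tm =
  match t, p with
  | _, [] -> sub
  | Lam b, LamBody :: p' -> Lam (graft b p' sub)
  | App (f, a), AppFun :: p' -> App (graft f p' sub, a)
  | App (f, a), AppArg :: p' -> App (f, graft a p' sub)
  | _ -> invalid_arg "graft: path does not match term"
```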
So, I'm really not sure how to do this properly... But I may be missing something.
I think I might have another simple idea for how to implement this efficiently... I'll try it when I get a chance.
@freebroccolo To give a bit more detail, here's what I have in mind. I'm not sure the best way to code it, but at least in an abstract way it might make some sense.
- Write `Σ` for the signature of the term language.
- Write `Σ+` for the extension of the above with a constructor `Ref of hole_name`.
- Write `F[Σ]` for the signature endofunctor of a signature `Σ`.
- `hole_env := Map hole_name F[Σ](μ(F[Σ+]))`; in other words, a "hole environment" is a finite map from hole names to "terms" which may have references to holes in their interior, but not at their head.
- The proof state is `hole_env * hole_name queue`.
- The materialization function is a partial function `hole_env * hole_name -> μ(F[Σ])`.

Note that `Σ` would still include Conor's development calculus extensions, including `?x : T. e` and `?x ~ e1. e2`; the point of this extension of the signature is to allow us to work on many parts of the term at once without needing to remember where they are located. We could also regard this as an extension to the real signature of the language, but I think that's a separate question.
The materialization function is partial for two reasons: it may encounter a reference that is not in the environment, and it may also encounter a reference cycle. A correctness criterion for elaboration rules would be that they do not induce cycles. The materialization function would only be called at the end of the elaboration process with the completed result.
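Here is a rough OCaml rendering of the above, with a placeholder term language standing in for `F[Σ]` (the constructors and names are just for illustration):

```ocaml
module StringMap = Map.Make (String)

type hole_name = string

(* Terms over the extended signature Σ+: the ordinary constructors plus
   [Ref], a reference to a hole in the environment. In the environment,
   a Ref may occur in the interior of an entry but not at its head. *)
type tm =
  | Var of int
  | Lam of tm
  | App of tm * tm
  | Ref of hole_name

type hole_env = tm StringMap.t
type proof_state = hole_env * hole_name Queue.t

(* Materialization: chase references until none remain, failing (None) on
   an unsolved hole or a reference cycle. [seen] is the chain of holes on
   the current path, used for cycle detection. *)
let rec materialize (env : hole_env) (seen : hole_name list) (t : tm) : tm option =
  match t with
  | Var _ -> Some t
  | Lam b -> Option.map (fun b -> Lam b) (materialize env seen b)
  | App (f, a) ->
    (match materialize env seen f, materialize env seen a with
     | Some f, Some a -> Some (App (f, a))
     | _ -> None)
  | Ref h ->
    if List.mem h seen then None                (* reference cycle *)
    else
      (match StringMap.find_opt h env with
       | None -> None                           (* hole not in the environment *)
       | Some body -> materialize env (h :: seen) body)

(* The interface described above, hole_env * hole_name -> μ(F[Σ]), partially. *)
let materialize_hole (env : hole_env) (h : hole_name) : tm option =
  materialize env [] (Ref h)
```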
Most likely, for this to work we would need to have `hole_env` keep track of the actual goal associated to each hole (a sequent), of course.
I was thinking a bit about implementation and it seems like it will be important to use a structure that has efficient lookup. This is because the term being developed will generally be sprawled across the environment, with each item in the environment maybe adding just one or two constructors before bottoming out at a reference to something else in the environment.
So, routine operations (such as evaluating or typechecking a term) would involve a ton of lookups. I thought about an imperative implementation with hash tables since they usually have constant-time lookup, but my concern there is that to implement certain things we would need to snapshot the state sometimes, and this is quite expensive. I wonder if it would be absolutely brutal performance-wise to just use `Map`.
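For comparison, the trade-off I'm worried about looks roughly like this (illustrative only; the values are just strings standing in for terms):

```ocaml
module StringMap = Map.Make (String)

let _sketch () =
  (* Persistent map: updates return a new map, so a "snapshot" is just the
     old value, and it shares structure with the new one. Lookup is O(log n). *)
  let env0 = StringMap.empty in
  let env1 = StringMap.add "X" "lam(@Y)" env0 in
  let snapshot = env0 in

  (* Hash table: lookup is expected O(1), but snapshotting means copying
     the whole table. *)
  let h : (string, string) Hashtbl.t = Hashtbl.create 16 in
  Hashtbl.replace h "X" "lam(@Y)";
  let h_snapshot = Hashtbl.copy h in

  ignore (env1, snapshot, h_snapshot)
```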
Have you thought about doing the hash-consing thing for the environment? It sounds to me like this is almost what you are suggesting. In other words, you do the hashing and maximize sharing each time something is added into the environment. With memoization, this would make lookups and other operations very efficient.
The only catch is that it might be difficult to do hash-consing only for the elaboration machinery versus doing it for the term representation as well.
As far as cloning the structure is concerned, I'm not sure what the options for that are off the top of my head but I'm fairly confident we can find a way to make it efficient enough without too much pain.
I haven't had much time to do anything interesting with this in the last couple weeks but hopefully that will change soon. Maybe it would be a good idea to just try and make a little demo that uses hash-consing for the term representation and see if we can scale it up.
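To sketch what I mean (the names are illustrative; this is just the bare-bones version of the technique):

```ocaml
(* Bare-bones hash-consing for the term representation. Every construction
   goes through [mk]; structurally equal terms come back as the same node,
   so equality can be decided with pointer equality (==). *)
type tm = { node : node; id : int }
and node =
  | Var of int
  | Lam of tm
  | App of tm * tm

(* Keys built from the children's ids, so hashing a node is cheap and does
   not traverse the whole term. *)
type key = KVar of int | KLam of int | KApp of int * int

let key_of_node = function
  | Var i -> KVar i
  | Lam b -> KLam b.id
  | App (f, a) -> KApp (f.id, a.id)

let table : (key, tm) Hashtbl.t = Hashtbl.create 1024
let next_id = ref 0

let mk (n : node) : tm =
  let k = key_of_node n in
  match Hashtbl.find_opt table k with
  | Some t -> t
  | None ->
    let t = { node = n; id = !next_id } in
    incr next_id;
    Hashtbl.add table k t;
    t

(* e.g. the identity function, built with maximal sharing: *)
let _id_fn = mk (Lam (mk (Var 0)))
```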
This all sounds like a very good idea... My suggestion is that we try to implement it, but don't worry about using any of the existing code. To validate the idea it would suffice to give a very simple language with only unit, pi and a universe (type-in-type for now).
Let's try going all-in on hash-consing...
@freebroccolo By the way, I want to clarify that if we are going to use this proof state idea, we would still need some kind of explicit substitution.
That is, suppose a proof state contains `X :: (G, x : A !- M : B)`; when the meta `X` is referred to elsewhere in another part of the proof state, we generally need to be able to apply some substitution, such as `X[N/x]`. This is kind of like in contextual modal type theory.
(Note that in the OLEG setup, you don't need these explicit substitutions because everything is staged so that you never need such a thing until `X` has materialized into some kind of actual term. But as soon as you try to find a reasonable way to implement this (such as a proof state of the kind I proposed), the OLEG style loses its appeal.)
With that said, it should be possible to limit the appearance of these substitutions just to the reference nodes, which point into the environment.
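Concretely, I imagine the reference nodes carrying the delayed substitution, something like the following (constructor and field names are just illustrative):

```ocaml
type hole_name = string

type tm =
  | Var of int
  | Lam of tm
  | App of tm * tm
  (* A reference to a metavariable X together with a delayed substitution,
     as in X[N/x]: one term for each variable of X's context. *)
  | Ref of hole_name * subst

and subst = tm list

(* The substitution just sits on the Ref node until the environment learns
   a solution for the hole, at which point it is actually applied. *)
```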
Anyway, I don't yet really understand how all this works together with the hash-consing stuff. It may be that if we consider hash-consing and memoization, maybe there is a better design for the proof state than what I have proposed. I'm not sure!
Makes sense. I don't think it's necessarily bad if we need explicit substitutions in order to make the elaboration algorithm work. The one thing we wouldn't really benefit from is the fact that explicit substitutions can improve efficiency through the delaying and accumulation of certain kinds of traversal operations.
@freebroccolo Let me give a little bit more explanation about some questions I'm having...
Suppose we start with a proof state like `X : (G !- ? <= A->A); Psi`; this should be read as letting the meta `X` stand for an unknown element of type `A->A` in context `G`, where the remainder of the proof state is `Psi`. Things in `Psi` can mention `X`.
Now, we can refine the goal `X` using the function introduction rule. Then, we would have something like the following: `Y : (G.A !- ? <= A); X : (G !- lam(@Y) <= A -> A); Psi`. Now, at this point what our elaboration algorithm does is kind of up for grabs. There are a couple of options:
One option would be to immediately get rid of `X` and just substitute `lam(@Y)` for `X` in the rest of the proof state. This is the approach taken in RedPRL/Dependent LCF; in fact, the representation of proof states there forces the substitution to take place right away. At least in RedPRL this is extremely expensive, because if you refine a goal early in a very wide proof state, this can result in dozens of substitutions needing to occur at every elaboration step. This basically ruins performance in RedPRL, but I want to note that we don't have any memoization in RedPRL, so it's hard for me to compare.
Another option would be to leave `X` as it is! When we need to learn something about `X` (for instance, if we try to evaluate the term `app(@X; N)`), we just look it up in the environment.
Maybe something in between, or completely different?
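For concreteness, here is how I picture the first two options on a toy proof-state representation (everything here — the `cell` type, `refine_eager`, `refine_lazy`, the string goals — is made up for illustration):

```ocaml
module StringMap = Map.Make (String)

type tm = Var of int | Lam of tm | App of tm * tm | Ref of string

type cell = { goal : string; solution : tm option }  (* goals shown informally *)
type proof_state = cell StringMap.t

let rec subst_ref x by t =
  match t with
  | Ref y -> if y = x then by else Ref y
  | Var i -> Var i
  | Lam b -> Lam (subst_ref x by b)
  | App (f, a) -> App (subst_ref x by f, subst_ref x by a)

(* Option 1: solve X with lam(@Y), then substitute it away immediately,
   rewriting every other cell in the proof state that mentions X. *)
let refine_eager (st : proof_state) : proof_state =
  let sol = Lam (Ref "Y") in
  st
  |> StringMap.remove "X"
  |> StringMap.add "Y" { goal = "G.A !- ? <= A"; solution = None }
  |> StringMap.map (fun c ->
       { c with solution = Option.map (subst_ref "X" sol) c.solution })

(* Option 2: just record the solution in X's own cell and leave everything
   else alone; whoever needs X later looks it up in the environment. *)
let refine_lazy (st : proof_state) : proof_state =
  st
  |> StringMap.add "Y" { goal = "G.A !- ? <= A"; solution = None }
  |> StringMap.add "X" { goal = "G !- ? <= A -> A"; solution = Some (Lam (Ref "Y")) }
```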
Anyway, this is one of the main design questions that's been bugging me. Do you have any insight into what you think would be the right way to think about this stuff?
> One option would be to immediately get rid of `X` and just substitute `lam(@Y)` for `X` in the rest of the proof state. This is the approach taken in RedPRL/Dependent LCF; in fact, the representation of proof states there forces the substitution to take place right away.
Is there something about how RedPRL works that requires the proof states to be represented in such a way that substitutions must be performed immediately like this? Does it make the algorithms simpler or is it just an artifact of early design where performance wasn't a big concern?
> Another option would be to leave `X` as it is! When we need to learn something about `X` (for instance, if we try to evaluate the term `app(@X; N)`), we just look it up in the environment.
It seems to me that this ought to be the better approach. Doing it this way essentially retains sharing in the proof state which gets lost if you do the substitutions immediately. The terms won't get expanded as much (which is probably better for the user) and the overall proof state will stay more compact.
I suppose the downside to this approach is that it might mean you'd need to repeatedly normalize certain things that otherwise would have already been normalized and substituted throughout the proof state, but I think that would be less expensive overall and it'd also be helped by memoization.
One thing to keep in mind too is that if we use hash-consing throughout the implementation, then when you immediately get rid of `X` and just substitute `lam(@Y)` for `X`, you're essentially trading a kind of formal sharing in the proof state for an implicit sharing that will result from the hash-consing doing its graph compression thing. From that point of view, it seems like retaining `X` is superior because it's strictly more informative.
Another interesting thought about this is that it might be possible to convert back and forth to some extent, i.e., to reverse the substitution (as long as it has not fully disappeared through reduction), due to the fact that we'd have this sharing encoded in the graph and a cheap way to check equality of sub-terms (pointer equality). Figuring out how to do that would probably be kind of gnarly though.
> Is there something about how RedPRL works that requires the proof states to be represented in such a way that substitutions must be performed immediately like this? Does it make the algorithms simpler or is it just an artifact of early design where performance wasn't a big concern?
Basically it was done this way because this was what I came up with first and I wasn't really thinking about performance... I think it was a design mistake, but it was also the simplest way to do things at the time.
> It seems to me that this ought to be the better approach. Doing it this way essentially retains sharing in the proof state which gets lost if you do the substitutions immediately. The terms won't get expanded as much (which is probably better for the user) and the overall proof state will stay more compact.
I think this makes sense! (And the rest of what you say too) Cool.
@freebroccolo Ah, one question. Can you explain why you feel this would make the proof state more compact? By not substituting and instead just following pointers through the proof state, we would end up with proof states that are very large (maybe with hundreds or thousands of cells)... But each cell would be quite small.
Ultimately I think that what you say makes sense though, because in some sense it is just a formal kind of sharing.
You're right; I should be more careful when saying the proof states would be more compact because it's not necessarily the case.
What I was thinking is that if you substitute immediately it seems you will often step to an enlarged proof state because normalization will be blocked on some inner metavariable (`Y` in this case). So although you might have a smaller proof state toward the end if you do these substitutions, you still might have worse complexity behavior (time/space) throughout most of the elaboration.
Another reason it would probably be better to avoid substituting early is that it gives more flexibility, particularly where interaction is concerned. It would be nice to allow the same kind of behavior you have in Coq, where normalization can be controlled explicitly through tactics, vs. the Agda approach where it's mostly all or nothing.
Does that make sense?
Yes, I think I agree with your analysis at an intuitive level... And it would be quite easy for us to do things like lazily memoize (e.g.) the normal form of each node in the proof state if that were desirable, or even store information about what other nodes are blocking the solution to a particular node, etc. This should make it possible to drive elaboration in a very fine-grained way, which is exciting.
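For instance, each node in the proof state could carry something like this (just a sketch; `normalize` is a stand-in for whatever normalizer we end up with):

```ocaml
type tm = Var of int | Lam of tm | App of tm * tm

type node = {
  term : tm;
  nf : tm Lazy.t;            (* normal form, computed at most once, on demand *)
  blocked_on : string list;  (* holes whose solutions this node is waiting on *)
}

let make_node ~(normalize : tm -> tm) (t : tm) : node =
  { term = t; nf = lazy (normalize t); blocked_on = [] }

(* Force the memoized normal form only when it is actually needed. *)
let normal_form (n : node) : tm = Lazy.force n.nf
```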
My expertise is not (yet) in the area of fine-tuning algorithms for performance, so the only thing I wondered about was whether we had anything to worry about from the potentially very wide proof states that will result from not substituting eagerly. But if it seems like there is no immediate concern with that, then having the term very finely dissected across the proof state does give us a lot of flexibility, and it would be a shame to throw that away prematurely!
Already I have some experience with how bleak it is when you substitute prematurely :wink: