Work Variables - Githubissues

tirix commented 2 years ago

In MMJ2, whenever applying a theorem leaves unresolved substitutions, like when applying ~addcan2ad in the example below, work variables such as &C1 are created.

$( <MM> <PROOF_ASST> THEOREM=meta_var  LOC_AFTER=?

!d1::              |- ( ph -> A e. CC )
!d2::              |- ( ph -> B e. CC )
!d3::              |- ( ph -> &C1 e. CC )
!d4::              |- ( ph -> ( A + &C1 ) = ( B + &C1 ) )
qed:d1,d2,d3,d4:addcan2ad |- ( ph -> A = B )

$)

Coq calls them existential variables, and it looks like Lean calls them metavariables.

It would be nice if metamath-knife could support this functionality.

tirix commented 2 years ago

The proof checker itself does not need those, and it shall not be impacted. I've built the Formula around Atoms, which are basically just repurposed u32, so my need would be around assigning new label Atoms which could be used in formulas, and possibly printed.

I see two possible approaches:

one could be to allow the library user to add and name new variables. This would possibly mean a new API in Nameset to do so, which is responsible for allocating Atoms. The user would also provide a Token (Box<[u8]>) for the name, and would have to keep track of the work variables it has created, since there is no obvious way of distinguishing them from regular variables.
another way could be to reserve a range of u32 for work variables, or use an enum. The advantage would be that it is then immediate to check if a variable is a work variable. The drawback is that this would add a performance cost to all methods involving Atoms. A simple lookup would turn into a test + lookup.

Note that for each work variable, we would possibly need two new Atoms:

one for the new symbol itself (a Symbol Atom) for resolving a textual representation
one for the corresponding virtual "float" label declaration (a Label Atom) for use in formulas

In both cases, it would not be possible to associate a StatementAddress or we would need to associate the default StatementAddress in order to be able to use lookup_float, lookup_symbol or lookup_label.

digama0 commented 2 years ago

I think we should not use Formula for formulas with work variables or other "proof assistant" features at all. This is overhead for everything that handles the completed formulas, and it would definitely show in the verification time of set.mm. Instead, use a special formula type for expressions with metavariables, and loading a theorem into the proof assistant entails converting from the Formula representation to this expanded type. I'm not even sure whether it needs to be part of mm-knife; at least, it can be a separate module that doesn't interact with other things, but it could also be a separate dependent crate.

tirix commented 2 years ago

Thanks for your quick feedback! Actually, Formula is not used at all in the proof verification, metamath-knife's proof verifier works directly on textual tokens. As a matter of fact I've only introduced grammar parsing and formulas for the sake of unification in a proof assistant.

Though, I believe that once one has a parse tree for all formulas in the database, proof verification might possibly be sped up significantly, since a substitution can be performed in constant time, and best-case comparison of different formula is also constant time if the parse tree root differs.

That was part of my project in the somewhat orphaned #63 : building an abstract proof verification, which could be used both on textual representation of expressions (like currently), or on formulas, or any other support.

I'll give it some thought, as it sounds like the best way to implement this is inheritance and overriding, but rust usually forces to think differently: here, Formula would typically become a trait. What about making the "Label" a generic type in formulas?

digama0 commented 2 years ago

If you do have a type for this, you should not mix constants, variables and metavariables. From a proof assistant's perspective, variables act like constants and metavariables act like variables, because only the latter are subject to substitution. The distinction between constants and variables still matters however, because disjoint variable requirements are expressed in terms of the variables contained in the expression (and you need to have some higher order thing in order to handle disjoint variable requirements for unresolved metavariables properly; mmj2 just defers the check until the metavariable is completely assigned).

Regarding "inheritance", I would just have a From<Formula> for MetaFormula implementation like

match x {
  Formula::Const(x) => MetaFormula::Const(x),
  Formula::Var(x) => MetaFormula::Var(x),
}

where Formula is an enum with two variants and MetaFormula is an enum with three variants. There is also the matter of whether the proof assistant is using trees or string representation for formulas; you can implement metaformulas either way but it affects the way you write things quite a bit.

tirix commented 2 years ago

From a proof assistant's perspective, variables act like constants and metavariables act like variables

Indeed, interesting.

I would just have a From<Formula> for MetaFormula implementation like...

With Formula as a tree, and variables hidden inside, I don't see it work this way. Right now I'm rather thinking about a

struct Metaformula {
   formula: Formula,
   metavariables: ...
}

Where metavariables would allow to identify where the metavariables are (or "work variables" in MMJ2 parlance). But this does not solve the fact that I still need fresh metavariables inside my Formula structure.

There is also the matter of whether the proof assistant is using trees or string representation for formulas; you can implement metaformulas either way but it affects the way you write things quite a bit.

I've been using the metamath-knife Formula type, which is a tree.

digama0 commented 2 years ago

MMj2 represents formulas as trees of statements (the parse tree), and work variables are WorkVarHyp extends VarHyp. So it is as if you had some other kind of my-work-var $e statement you can use to build up your parse tree.

As another point of comparison, in MM0 expressions are represented completely differently in the proof assistant and in the packed version in the environment. The latter is an Expr type which is a tree of term applications to variables, while the former is a LispVal which is just a lisp s-expression, passed around by the user code, where things like (foo a b c) are interpreted as an application of the expression constructor foo. Metavariables are special lisp values #<mvar i> that can be constructed through builtins (but cannot be confused either with atoms like x which represent variables, or lists like (foo) which represent constants).

digama0 commented 2 years ago

Regarding your last edit: metavariables are allocated and their assignments tracked by the proof context (global to the whole proof, not just the one expression). So there is no need for a metavariables field in Metaformula like you wrote. I wouldn't do a detailed tracking of the metavariables in the formula other than perhaps a boolean flag to indicate if any metavariables are in the subtree (this can speed up metavariable instantiation).

tirix commented 2 years ago

Regarding your last edit

Sorry this was misleading, of course metavariables shall be tracked at the level of the proof context. What I meant there was a kind of boolean flag to indicate where in the subtrees there are metavariables, as a kind of local cache/speedup, quite like you explain in your last sentence.

tirix commented 2 years ago

If I remember correctly, MM0 has a very rich Lisp syntax which includes not just expressions and metavariables, but also instructions/commands. I assume that this is what LispVal corresponds to. It looks like it boils down to a kind of enum where one option is to itself be a list, so this is also a kind of tree structure, right?

digama0 commented 2 years ago

Yes, LispVal is the set of lisp values (i.e. anything you can get by evaluating a lisp expression), so it also includes things like closures, strings and other stuff. It's similar to what you would find in the Value type of dynamic languages like Javascript or Python (or Lisp). This is used for all metaprogramming activities in the language. But when you pass one of these objects to a theorem proof as in theorem foo: $ T. $ = <lisp code here>;, the value is interpreted as a MetaFormula which means that most of those other kinds of values would cause a type error.

The grammar of allowed expressions looks something like this, which you should think of as a subset of LispVal (this enum doesn't actually exist, it is for illustration purposes):

enum MetaFormula {
  Atom(AtomId), // variable/hypothesis reference
  List(AtomId, Vec<MetaFormula>), // term/theorem application
  MVar(MVarId), // term metavariable 
  Goal(GoalId), // proof metavariable 
}

So yes, it is a tree structure, with explicit variants for metavariables. (There is also a bit more than this, for (:conv e c p) conversion proofs, but those don't come up in metamath.)

Here is the actual code that does the parsing of LispVal into ProofHash objects, which eventually become ProofNode which is the storage format for completed proofs.

digama0 commented 2 years ago

Note that in this kind of setup, metavariables can end up hanging around in the term even after they have been assigned, because there is no explicit list of all expressions everywhere which we can iterate over to eliminate all occurrences of the metavariable. (And even if there was, this would be quite expensive when you are creating and assigning lots of short-lived metavariables.) Instead, you just remember that the metavariable is assigned, and make sure that anything you do with expressions treats assigned metavariables as equivalent to the expression that they were assigned to.

tirix commented 2 years ago

MMj2 represents formulas as trees of statements (the parse tree), and work variables are WorkVarHyp extends VarHyp. So it is as if you had some other kind of my-work-var $e statement you can use to build up your parse tree.

This is the method I'd like to use. There will still be another layer of meta variables on top of that (like goal, at tactics language level), but metamath-knife will not know about them at all because they will be evaluated before any call to metamath-knife API.

Nevertheless, the work variables / metavariables are not evaluated, but are to be resolved through unification - just the one that metamath-knife shall provide.

It looks to me like the best way to do that would be to use a generic data type L to use with Formula<L>, where L will usually be Label when created from the grammar, but where it might also be a user specific type optionally including meta-variables. I believe I would still keep FormulaRef as using Formula<Label> internally. There shall be no runtime cost. This shall allow to have this work variable / meta variable mechanism without major impact in metamath-knife.

david-a-wheeler commented 2 years ago

This would be great! I find work variables quite helpful.

I'm currently slammed on other things, but I'm hoping to become more present for Metamath stuff in the nearer future. Until then, if I'm a blocker for something let me know, I do not want to inhibit the awesome stuff everyone else is doing.

tirix commented 2 years ago

So PR #89 is a proposal to introduce support for work variables, whereas they are supplied externally as suggested by Mario.

metamath / metamath-knife

Work Variables #87