metamath / metamath-knife

Metamath-knife can rapidly verify Metamath proofs, providing strong confidence that the proofs are correct.
Apache License 2.0
25 stars 9 forks source link

[DRAFT] Begin adding the ability to verify definitions #103

Closed david-a-wheeler closed 1 year ago

david-a-wheeler commented 1 year ago

This adds a --verify-definitions 'globlist' option to the set of options and starting code to implement it.

This version doesn't actually perform the check, it instead simply prints as the names of the statements to check. More importantly, it begins setting up the code infrastructure so we can implement the definitions check "for real".

david-a-wheeler commented 1 year ago

@digama0 @tirix

This is just an early beginning, but I really want metamath-knife to be able to verify definitions. This sets up the infrastructure so it can take the list of globbing patterns and correctly report which statements to check.

Comments/suggestions welcome!

david-a-wheeler commented 1 year ago

I'll need to fix the clippy complaints.

digama0 commented 1 year ago

I think we should move away from the globbing here. The exceptions to the definition check should be specified in the mm file itself using $j comments, not on the command line. This information is relevant for tasks other than just definition checking, for example the MM -> MM0 translator wants to know where the definitions are so it can translate them properly. Most of the infrastructure for this already exists, it was just not being used in metamath-exe / mmj2 because they don't parse $j comments.

david-a-wheeler commented 1 year ago

I suppose you will keep this PR as a draft and only merge once you have a working solution?

I intend to propose merging it once it meets two conditions:

  1. It's cleaned up a little. I thought it'd be better to start down the path & get early feedback.
  2. It has a useful function. That is, it has at least one useful check on the definition instead of just a debugging print statement. Once we have one part of a definition check in place it'll be easier to add more.
tirix commented 1 year ago

The exceptions to the definition check should be specified in the mm file itself using $j comments

I think these commands don't exist yet? What do you suggest to use? What about something like below?

$j axiom_prefix 'ax-'; 
   definition_prefix 'df-'; 
   definition_exception 'df-bi';
   definition_exception 'df-clab;
   definition_exception 'df-cleq';
   definition_exception 'df-clel';

Also set.mm already has commands like

david-a-wheeler commented 1 year ago

I agree with @digama0 that in the long term this should be in the database itself. Let's get those $j values agreed on! But we can do this in parallel; let's get it working with globs while we work on adding that information to the databases, then start using those databases.

digama0 commented 1 year ago

The exceptions to the definition check should be specified in the mm file itself using $j comments

I think these commands don't exist yet? What do you suggest to use?

I think the information is there already, although it might not be obvious. A while back we reorganized things such that a definition axiom comes immediately after the syntax axiom, and moreover there is a primitive $j command which marks syntax axioms for which a definition is not expected. So we have the following decision process:

So while we could have a definition_exception command, it would actually be about suppressing the lint for "axiom not starting with ax-", not changing the set of checked definitions.

Also set.mm already has commands like

* `$j equality 'wceq' from 'eqid' 'eqcomi' 'eqtri';`

* `$j equality 'wb' from 'biid' 'bicomi' 'bitri';`
  These could be used in the definition check (start with either one of those).

Yes, that's why they were added. Another relevant $j command is justification:

because this is how we check df-bi (it doesn't need to be an exception).

@david-a-wheeler:

I agree with @digama0 that in the long term this should be in the database itself. Let's get those $j values agreed on! But we can do this in parallel; let's get it working with globs while we work on adding that information to the databases, then start using those databases.

The reason I mention it now, in this PR, is that I'm not sure we want to add the globbing support at all. At least, there is no need for that to be in the library as it can just take a Statement -> bool function, and I want us to set things up so that the appropriate glob is * in the first place so we can probably delay handling globs in the CLI as well. It should be a niche use to only check some of the definitions: it's not like this is a computationally expensive task.

david-a-wheeler commented 1 year ago

It should be a niche use to only check some of the definitions: it's not like this is a computationally expensive task.

I don't think it'll ever be needed. The check should be so fast that it's not worth worrying about.

The exceptions to the definition check should be specified in the mm file itself using $j comments

Sure, that seems best long-term.

@digama0 :

I think the information is there already, although it might not be obvious.

Yes, it's not obvious :-). I started by using exactly the same approach mmj2 uses, since that's the one implementation of definition checking. But hey, that's why it's best to write a little bit & get feedback, instead of trying to do too much in isolation.

I want to try to implement this for speed & as a single pass. I don't think the current set.mm makes this possible.

Foe example, in develop (commit 37c880e1a9729f788e85ff91321a8d4e2dbbe6eb), on line 12672 we have:

cv $a class x $.

We have to wait for line 24283 for the declaration that 'cv' is primitive:

$( $j primitive 'cv' 'wceq' 'wcel' 'cab'; $)

So I believe that if we want it all in the database, we need to move various $j statements earlier so the verifier can see (and act on) that information when processing beginning-to-end. Otherwise we have to queue up checks, for no good reason.

... Otherwise, it should be followed by a definition

We aren't that strict on "followed by a definition", that it, we certainly don't apply "immediately followed by a definition" and I don't think we should be that strict. E.g., we have:

  $( Extend the definition of a class to include the Cartesian product. $)
  cxp $a class ( A X. B ) $.

  $( Extend the definition of a class to include the converse of a class. $)
  ccnv $a class `' A $.

... followed by many other syntax declarations, then eventually followed by a list of definitions.

Also, we often want justification proofs (e.g., cio, iotajust, df-iota), which are often nestled between the syntax assertion and the definition.

So: What should the rule really be?

tirix commented 1 year ago

So: What should the rule really be?

I think the syntax declaration axiom does not need to be followed immediately by the df- definition axiom, but the first use of the symbol after its declaration should be in the definition axiom.

In your single pass, you could have a list of constant symbols which have been declared but not yet defined, remove them once you see their definition, but emit a diagnostics if you see them used.

digama0 commented 1 year ago

I want to try to implement this for speed & as a single pass. I don't think the current set.mm makes this possible.

Foe example, in develop (commit 37c880e1a9729f788e85ff91321a8d4e2dbbe6eb), on line 12672 we have:

cv $a class x $.

We have to wait for line 24283 for the declaration that 'cv' is primitive:

()

So I believe that if we want it all in the database, we need to move various $j statements earlier so the verifier can see (and act on) that information when processing beginning-to-end. Otherwise we have to queue up checks, for no good reason.

Yes, it is not a single pass as you have noted. $j comments don't lend themselves well to that, you would need some kind of annotation attached directly to the statement for that. ($j comments generally come after the statements they annotate, so that we can say that the referred statements are already in the environment.)

My recommendation is to do what @tirix said and move identification of primitive and some other grammar things into the name pass. Then the definition check is a separate pass that depends on it.

... Otherwise, it should be followed by a definition

We aren't that strict on "followed by a definition", that it, we certainly don't apply "immediately followed by a definition" and I don't think we should be that strict. E.g., we have:

  $( Extend the definition of a class to include the Cartesian product. $)
  cxp $a class ( A X. B ) $.

  $( Extend the definition of a class to include the converse of a class. $)
  ccnv $a class `' A $.

... followed by many other syntax declarations, then eventually followed by a list of definitions.

We can weaken it to a sequence of syntax axioms followed by a sequence of definitions, but it might also be simpler to just enforce a stricter ordering. I think given the choice between these two it is preferable to increase the strictness of checks on set.mm because all verifiers benefit from that.

Also, we often want justification proofs (e.g., cio, iotajust, df-iota), which are often nestled between the syntax assertion and the definition.

It should never be necessary to do this, since a justification theorem should not make use of the newly defined constant, it is purely about the independence of the definiens on any bound variables in the un-extended language.

tirix commented 1 year ago

Maybe a few pointers:

david-a-wheeler commented 1 year ago

@david-a-wheeler :

Also, we often want justification proofs (e.g., cio, iotajust, df-iota), which are often nestled between the syntax assertion and the definition.

@digama0 :

It should never be necessary to do this, since a justification theorem should not make use of the newly defined constant, it is purely about the independence of the definiens on any bound variables in the un-extended language.

Necessary, no, but in the past we've allowed this. I think we cannot allow a justification theorem to occur after the definition. The justification theorem must occur before the definition it justifies, else it interferes with doing things in one pass, where I want to trigger the check of the definition df-FOO given only the statements up to this point, so we can read the justification theorem & check against it. Also, if we allow the justification to appear afterwards, it might be possible for the definition (or something else after it) to insert a problem, such as a circularity. So this will imply a stricter ordering than we've done so far:

  1. Justifications ($p), some time before (but likely close) to...
  2. Syntax introduction ($a), possibly a set of them. These will be remembered as a "new syntax collection".
  3. Actual definition ($a), possibly a set of them, each of them using only what's in the new syntax collection. At the next non-definition assertion or end-of-file, all new syntax introductions must be used & the "new syntax collection" is then empties. I think that'll allow 2-way and 3-way and, for example.
david-a-wheeler commented 1 year ago

Yes, it is not a single pass as you have noted. $j comments don't lend themselves well to that, you would need some kind of annotation attached directly to the statement for that. ($j comments generally come after the statements they annotate, so that we can say that the referred statements are already in the environment.) My recommendation is to do what @tirix said and move identification of primitive and some other grammar things into the name pass. Then the definition check is a separate pass that depends on it.

Ugh. I hate that. I was really hoping to make this a single pass. Well, i guess that can work. So as a result, we have to do the name pass, then come around and do a definitions check pass. The definitions check will itself be one pass (so it's still speedy), but it'd depend on a separate naming pass.

digama0 commented 1 year ago

@david-a-wheeler :

Also, we often want justification proofs (e.g., cio, iotajust, df-iota), which are often nestled between the syntax assertion and the definition.

@digama0 :

It should never be necessary to do this, since a justification theorem should not make use of the newly defined constant, it is purely about the independence of the definiens on any bound variables in the un-extended language.

Necessary, no, but in the past we've allowed this.

Ideally a definition would be one statement. The fact that it is actually three different metamath statements is unfortunate and complicates many things. I don't see extra flexibility here in separating the parts as a good thing.

I think we cannot allow a justification theorem to occur after the definition. The justification theorem must occur before the definition it justifies, else it interferes with doing things in one pass,

I was suggesting to put the justification theorem before the definition. But in any case I don't think it is true that we have to put it after the definition, as long as the justification theorem does not depend on the definition itself. We can afford to be strict here, and that would mean putting things in an order like:

  1. justification theorem (if any)
  2. syntax axiom
  3. definition
  4. $j comment
  1. Justifications ($p), some time before (but likely close) to...

  2. Syntax introduction ($a), possibly a set of them. These will be remembered as a "new syntax collection".

  3. Actual definition ($a), possibly a set of them, each of them using only what's in the new syntax collection. At the next non-definition assertion or end-of-file, all new syntax introductions must be used & the "new syntax collection" is then empties. I think that'll allow 2-way and 3-way and, for example.

I don't see a point for "syntax collections" other than that's what is needed to check set.mm without modification, which IMO should not be a goal. In particular that allows for the possibility of mutually defined syntax, which we don't want in the first place.

digama0 commented 1 year ago

Yes, it is not a single pass as you have noted. $j comments don't lend themselves well to that, you would need some kind of annotation attached directly to the statement for that. ($j comments generally come after the statements they annotate, so that we can say that the referred statements are already in the environment.) My recommendation is to do what @tirix said and move identification of primitive and some other grammar things into the name pass. Then the definition check is a separate pass that depends on it.

Ugh. I hate that. I was really hoping to make this a single pass. Well, i guess that can work. So as a result, we have to do the name pass, then come around and do a definitions check pass. The definitions check will itself be one pass (so it's still speedy), but it'd depend on a separate naming pass.

That was already going to be the case. Almost everything depends on the name pass. To do definition checking we will need at least the name pass and the scope pass, also probably the grammar pass.

david-a-wheeler commented 1 year ago

I seem to keep "running out of time" to work on this.

@tirix has picked up this goal in PR #116 and made more progress, so I'm closing this PR with the plan that #116 will complete this work instead.

tirix commented 1 year ago

Thank for this initiative @david-a-wheeler !