metamath / metamath-knife

Metamath-knife can rapidly verify Metamath proofs, providing strong confidence that the proofs are correct.
Apache License 2.0
25 stars 9 forks source link

add verify markup pass #70

Closed digama0 closed 2 years ago

digama0 commented 2 years ago

This replicates the functionality of metamath.exe's VERIFY MARKUP command. New checks:

  1. Labels should not contain _
  2. Labels should not start with mm
  3. Labels should not match reserved windows filenames CON, PRN, AUX etc
  4. If there is an axiom ax-foo with a corresponding axfoo axiom/theorem then they should match
  5. Axioms that prove |- ... should have a name ax-* or df-*
  6. Tokens should not use the @ and ? characters
  7. Every token should have a htmldef command
  8. Use of the author ?who? is forbidden
  9. Dates should parse
  10. Parentheticals should come in chronological order
  11. Axioms and theorems (except for syntax axioms) should have a Contributed by comment
  12. There should not be two Contributed by comments
  13. The Contributed by comment should be the first one in the list
  14. Proof modification is discouraged` is not allowed on axioms
  15. *OLD and *ALT theorems should be marked as (Proof modification is discouraged.) and (New usage is discouraged.)
  16. Lines should have at most 79 characters (TODO: make an exception for lines with URLs)
  17. There should be no trailing whitespace
  18. Tabs should not appear anywhere
  19. There should not be two htmldef (and althtmldef, latexdef) commands for the same token
  20. Tokens appearing in htmldef should be declared
  21. The exthtmllabel command should reference an existing label
  22. Header comments (defined as starting with $( <whitespace> #*#*...) should parse as header comments (decorator, newline, one line of text, newline, decorator, markup text)
  23. The ~ and ` characters, when used as delimiters in markup, should have whitespace around them
  24. Characters [, ~ and ` must be doubled if they could not be interpreted as special in markup
  25. <HTML> and </HTML> must appear in all caps, not nested, and in balanced pairs
  26. Tokens inside ` math strings in markup must exist in a $c/$v statement
  27. Math strings must be closed before the end of the comment
  28. <HTML> must be closed before the end of the comment
  29. Label references via ~ must not be empty (because the ~ was at the end of the comment or was aborted by a higher priority markup)
  30. Bibliography tags must not contain ~ or `
  31. Bibliography tags must reference an <a name="..."> tag in the bibliography file, if provided
  32. Mathbox headers (section headers after the mathbox label) must have the form Mathbox for X
  33. Mathboxes should not refer to theorems from other mathboxes

Interestingly, (10) is not implemented in exactly the same way in metamath.exe, and running this on set.mm has unearthed two issues:

david-a-wheeler commented 2 years ago

Spectacular! Quick question: This is something you can enable or not per database, instead of ALWAYS requiring it, yes?

Eventually I'd like to also allow ISO dates in YYYY-MM-DD form, but there's no need to do that now.

digama0 commented 2 years ago

I think the dates are even specified as being YYYY-MMM-DD in the metamath book, but metamath.exe and set.mm obviously don't match that.

The verify markup pass is completely separate from other activities of metamath-knife, like metamath.exe's VERIFY MARKUP. You have to explicitly enable it independently of proof verification, which you can do by setting the -m flag in the command line. You can specify -v -m or -v or -m depending on what you want, and call it on the databases you like.

digama0 commented 2 years ago

I added comment markup checking, and rules 22-28 listed in the PR comment. This has found quite a few more issues, 535 reported issues in set.mm, to the extent that we may want to relax some of these checks. The majority of reported errors are violations of rules 23 (mandatory whitespace around `) and 24 ([ should be doubled in regular text). Besides those, three more errors have been found:

Here are some examples of rule 23, 24 violations:

warning: Markup character requires surrounding whitespace
     --> ../mm/set.mm:24152:65
      |
24152 |   correspond respectively to the logical exclusive disjunction (~ df-xor ) and
      |                                                                 - Put spaces around this character
      |
warning: Markup character requires surrounding whitespace
     --> ../mm/set.mm:30270:63
      |
30270 |        Lammen, 30-Dec-2017.)  Reduce symbol count in ~ nfex , ~hbex .  (Revised
      |                                                               - Put spaces around this character
      |
warning: Invalid escape character
     --> ../mm/set.mm:81549:33
      |
81549 |        more general sets than R [the real numbers] and to other objects.".  The
      |                                 - This escape character should be doubled
      |
      = note: This character has special meaning in this position, but it was not interpretable here.
      = note: Use ~~ or [[ or `` if you mean to include the character literally
warning: Markup character requires surrounding whitespace
      --> ../mm/set.mm:182964:29
       |
182964 | ` -. 2 || N  ` to say that "` N ` is _odd_" (under the assumption that
       |                             - Put spaces around this character
       |
warning: Markup character requires surrounding whitespace
      --> ../mm/set.mm:182965:11
       |
182965 | ` N e. ZZ `).  The previously proven theorems about even and odd numbers, like
       |           - Put spaces around this character
       |
warning: Invalid escape character
      --> ../mm/set.mm:279456:55
       |
279456 | "[Cramer's rule] ... expresses the [[unique] solution [of a system of linear
       |                                                       - This escape character should be doubled
       |
       = note: This character has special meaning in this position, but it was not interpretable here.
       = note: Use ~~ or [[ or `` if you mean to include the character literally
warning: Markup character requires surrounding whitespace
      --> ../mm/set.mm:279463:2
       |
279463 | (~ https://en.wikipedia.org/wiki/Cramer's_rule#A_short_proof), is as follows:
       |  - Put spaces around this character
       |

Looking at these examples, it seems like spaces are usually used on the inside of a math string, rather than on both sides, and similarly there are many occurrences of (~ label) with space after but not before the ~. For the doubled character error, [ is the usual culprit, and examples like L.279456 above show that users are depending on the behavior that [foo bar] is not treated as a bibliographic reference and puts a literal [ character, but [foobar] is a bibref (which is why the [[ in [[unique] is doubled).

What do people think? Should we update set.mm with these more stringent requirements or drop / weaken the requirement to decrease the number of false positives?

david-a-wheeler commented 2 years ago

The verify markup pass is completely separate from other activities of metamath-knife, like metamath.exe's VERIFY MARKUP. You have to explicitly enable it independently of proof verification, which you can do by setting the -m flag in the command line. You can specify -v -m or -v or -m depending on what you want, and call it on the databases you like.

Perfect!

I added comment markup checking, and rules 22-28 listed in the PR comment. This has found quite a few more issues, 535 reported issues in set.mm, to the extent that we may want to relax some of these checks. The majority of reported errors are violations of rules 23 (mandatory whitespace around `) and 24 ([ should be doubled in regular text). Besides those, three more errors have been found:

Wow. Well, you've proven once again that automated checking can detect problems :-).

I think this should be raised to the mailing list for discussion. We should try to steer all tools to eventually agree on what the checks are :-). I'm torn on whether or not we should actually make them stringent. I'm okay with eventually making the rules more stringent, especially if they'll eliminate potential ambiguities, but there should be some reason to enforce a rule. If we enforce rules not previously enforced, we ought to do it in stages - update the .mm files, give people time to adjust their copies, and only later start enforcing them.

digama0 commented 2 years ago

Okay, this PR is now feature complete, with some omissions:

Additionally, bibliography checking is allowed by the API but there is no way to trigger it from the command line currently. The issue is what to do about locating files (specifically: the bibliography files in the htmlbibliography command and the GIF files in the checks above), as I would rather not just naively assume everything is in the current directory like metamath.exe. Suggestions on a good command line input here are welcome.

The next step is to get set.mm passing the checks. Some of the checks have been weakened per the metamath group discussion, so we are down to 112 issues, most of which are easy to fix.

digama0 commented 2 years ago

It is now possible to give bibliography files on the command line, via --biblio FILE. This command can be used twice; the second will supply the exthtmlbibliography for set.mm.

Markup is also fixed in metamath/set.mm#2472.

tirix commented 2 years ago

Thanks for all the answers and explanations, LGTM!