metamath / metamath-exe

Metamath program - source code for the Metamath executable
GNU General Public License v2.0
77 stars 25 forks source link

Treatment of the inclusion command #85

Open benjub opened 2 years ago

benjub commented 2 years ago

Curently, metamath.c (and mmverify.py) treat inclusion commands ($[ FILENAME $]) as exact copy-pasting, that is, the content of the file FILENAME is pasted in place of $[ FILENAME $] in the input stream before lexing. This permits jokes like the following:

benoit@ordi$ cat joke.mm
$[ joke1.mm $] $.
benoit@ordi$ cat joke1.mm
$c a
benoit@ordi$ ./metamath "read joke.mm" "quit"
Metamath - Version 0.198 7-Aug-2021           Type HELP for help, EXIT to exit.
MM> read joke.mm
Reading source file "joke.mm"... 18 bytes
Reading included file "joke1.mm"... 5 bytes
60 bytes were read into the source buffer.
The source has 1 statements; 0 are $a and 0 are $p.
No errors were found.  However, proofs were not checked.  Type VERIFY PROOF *
if you want to check them.
MM> quit

On the other hand, the EBNF in the book does not permit it. Here, I think that the EBNF way is preferable, and metamath.c is a bit too laxist. Thoughts ? @digama0 ?

digama0 commented 2 years ago

Yes, this is a limitation of the current implementation in metamath.c. The way it is written doesn't make it easy to fix...

benjub commented 2 years ago

@digama0 : Is this, roughly writing, because in metamath.c (and mmverify.py), FILENAME is treated by the lexer (I mean: the lexer, upon encountering '$[', immediately lexes the content of FILENAME), whereas this should be delayed (the lexer simply returns the sequence of tokens '$['; IDENT(FILENAME); '$]') and treated only by the parser ? I am writing a new verifier and I took the second approach. Is this what you would recommend ? Is this what metamath-knife does ? (the rust code is a bit hard to read for me) (I guess @tirix has ideas on this matter)

tirix commented 2 years ago

metamath-knife does a first pass to prepare for parallel parsing, which detects file includes and top-level chapters, as these are good places to split the database. Because of that, in metamath-knife, file includes are only permitted at the top level, outside of any statement or frame.

There is a note about it in parser.rs:

We require file inclusions to be between statements at the top nesting level (this has been approved by Norman Megill).

So it is actually even more restrictive that the EBNF definition.

benjub commented 2 years ago

Thanks @tirix. After a rapid look at https://github.com/david-a-wheeler/metamath-knife/blob/071292ef3a10350d0dae9e409b4921f91a08ce95/src/parser.rs it looks that lexing and parsing are kind of merged, with no clear distinction ?

Concerning my first question above, if inclusion commands are simply tokenized as '$['; IDENT(FILENAME); '$]' and the work is deferred to the parser, this makes my lexer simpler since it is then not "multifile" so does not need to keep track of the file it is reading... In progress on my side...

Actually, I think that having no reference at all to the "top nesting level" or "outermost scope" in the EBNF would make things clearer, and actually even easier to code. But I haven't thought about multithreading.

digama0 commented 2 years ago

I don't think you need to handle "outermost scope" at the parser level, but it is totally possible to do so - this is a context-free property. I think the majority of metamath verifiers treat $c scoping at verification time, not parse time: at verification time you have a stack of frames so it's easy to tell if you are at the top level.

tirix commented 2 years ago

it looks that lexing and parsing are kind of merged

You could say so, in the sense that the parser.rs module handles both lexing and parsing. The lexing part is inside Scanner's get_raw function, which isolates a single token and creates a span, which is just noting its beginning and end, without copying.

Handling file inclusion at the parsing level seems sensible to me, and I would also agree to to forbid in our standard the "jokes" you're referring to. I've now had a look at the Book, and indeed it states in chapter 4.1.2:

A file inclusion command consists of $[ followed by a file name followed by $]. It is only allowed in the outermost scope (i.e., not between ${ and $}) and must not be inside a statement (e.g., it may not occur between the label of a $a statement and its $.).

The specification would have to be relaxed here, for the "outermost scope" part.

tirix commented 2 years ago

...so to come back at the initial title of this issue, metamath.exe does not seem to be compliant with the Book here. Appendices are usually non-normative, and the actual specification shall be in the main chapters.