Open benjub opened 2 years ago
Yes, this is a limitation of the current implementation in metamath.c. The way it is written doesn't make it easy to fix...
@digama0 : Is this, roughly writing, because in metamath.c (and mmverify.py), FILENAME is treated by the lexer (I mean: the lexer, upon encountering '$['
, immediately lexes the content of FILENAME), whereas this should be delayed (the lexer simply returns the sequence of tokens '$['; IDENT(FILENAME); '$]'
) and treated only by the parser ?
I am writing a new verifier and I took the second approach. Is this what you would recommend ? Is this what metamath-knife does ? (the rust code is a bit hard to read for me) (I guess @tirix has ideas on this matter)
metamath-knife
does a first pass to prepare for parallel parsing, which detects file includes and top-level chapters, as these are good places to split the database.
Because of that, in metamath-knife
, file includes are only permitted at the top level, outside of any statement or frame.
There is a note about it in parser.rs
:
We require file inclusions to be between statements at the top nesting level (this has been approved by Norman Megill).
So it is actually even more restrictive that the EBNF definition.
Thanks @tirix. After a rapid look at https://github.com/david-a-wheeler/metamath-knife/blob/071292ef3a10350d0dae9e409b4921f91a08ce95/src/parser.rs it looks that lexing and parsing are kind of merged, with no clear distinction ?
Concerning my first question above, if inclusion commands are simply tokenized as '$['; IDENT(FILENAME); '$]'
and the work is deferred to the parser, this makes my lexer simpler since it is then not "multifile" so does not need to keep track of the file it is reading... In progress on my side...
Actually, I think that having no reference at all to the "top nesting level" or "outermost scope" in the EBNF would make things clearer, and actually even easier to code. But I haven't thought about multithreading.
I don't think you need to handle "outermost scope" at the parser level, but it is totally possible to do so - this is a context-free property. I think the majority of metamath verifiers treat $c scoping at verification time, not parse time: at verification time you have a stack of frames so it's easy to tell if you are at the top level.
it looks that lexing and parsing are kind of merged
You could say so, in the sense that the parser.rs
module handles both lexing and parsing. The lexing part is inside Scanner's get_raw
function, which isolates a single token and creates a span, which is just noting its beginning and end, without copying.
Handling file inclusion at the parsing level seems sensible to me, and I would also agree to to forbid in our standard the "jokes" you're referring to. I've now had a look at the Book, and indeed it states in chapter 4.1.2:
A file inclusion command consists of
$[
followed by a file name followed by$]
. It is only allowed in the outermost scope (i.e., not between${
and$}
) and must not be inside a statement (e.g., it may not occur between the label of a$a
statement and its$.
).
The specification would have to be relaxed here, for the "outermost scope" part.
...so to come back at the initial title of this issue, metamath.exe
does not seem to be compliant with the Book here. Appendices are usually non-normative, and the actual specification shall be in the main chapters.
Curently, metamath.c (and mmverify.py) treat inclusion commands (
$[ FILENAME $]
) as exact copy-pasting, that is, the content of the file FILENAME is pasted in place of$[ FILENAME $]
in the input stream before lexing. This permits jokes like the following:On the other hand, the EBNF in the book does not permit it. Here, I think that the EBNF way is preferable, and metamath.c is a bit too laxist. Thoughts ? @digama0 ?