Open jnfoster opened 2 years ago
There is another way of doing this: rewrite the parser as a top-down parser, which gives better control, including over error messages. I don't think any modern C/C++ front-end uses bison these days. I am actually working on my own front-end to parse P4-16, which could be reused for this. I think Marvell is willing to open source it and contribute it if wanted. Its AST differs from the reference compiler's and is still rough around the edges right now. The tokenizer only produces identifiers; the parser then looks up whether an identifier is a type or not. I have been thinking about how to add namespace/module support. It should not be hard: there is one case, distinguishing "(type." from "(type<>)" during parsing, that will need to be extended for modules.
[Sorry if this is a duplicate; I swear I posted this a while back, but now I don't see it.]
As background: P4_16's Bison parser relies on the "lexer hack." That is, the parser maintains a symbol table that records, for each identifier, whether it is the name of a type or just an ordinary name. The lexer consults this symbol table to produce two distinct tokens, `IDENTIFIER` and `TYPE_IDENTIFIER`, and these tokens are treated differently in the parser.

Currently, the P4_16 front-end receives the entire program, i.e., after the C preprocessor has run, and the declarations are processed from start to end. So, per language in the spec, the front-end can determine which identifiers denote types and which ones do not.
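To make the mechanism concrete, here is a minimal sketch of the lexer hack in Python. It is not the reference compiler's code: the helper names (`lex`, `classify`), the regular expression, and the reduced set of type-introducing keywords are all assumptions made for illustration. Only the two token kinds, `IDENTIFIER` and `TYPE_IDENTIFIER`, come from the description above.

```python
import re

# Hypothetical, heavily simplified token pattern: identifiers or any
# single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\S")

# Assumed subset of keywords that introduce a new type name.
TYPE_INTRODUCERS = {"extern", "struct", "header"}


def lex(source, type_names):
    """Yield (kind, text) pairs, consulting the shared symbol table lazily."""
    for match in TOKEN_RE.finditer(source):
        text = match.group()
        if text in TYPE_INTRODUCERS:
            kind = "KEYWORD"
        elif text[0].isalpha() or text[0] == "_":
            # The lexer hack: the token kind depends on parser state.
            kind = "TYPE_IDENTIFIER" if text in type_names else "IDENTIFIER"
        else:
            kind = "PUNCT"
        yield kind, text


def classify(source):
    """Stand-in for the parser: records declared types while consuming tokens."""
    type_names = set()  # the symbol table shared with the lexer
    tokens = []
    previous = None
    for kind, text in lex(source, type_names):
        if previous in TYPE_INTRODUCERS and kind == "IDENTIFIER":
            type_names.add(text)  # later occurrences lex as TYPE_IDENTIFIER
        tokens.append((kind, text))
        previous = text
    return tokens
```

Because `lex` is a generator, a name registered by `classify` takes effect for every token produced afterwards. Feeding in a fragment that uses `packet_in` without its declaration leaves it as a plain `IDENTIFIER`, which is the situation that produces the syntax error discussed in this issue.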
If we are developing a system where smaller program pieces are processed, we may need to refactor the front-end. Currently, if you point the parser at a file that starts like this:
you'll get a syntax error, because `packet_in`, which is declared in `core.p4`, is not known to be a type.

The problem may persist even if we imagine a syntax like this:
unless a side-effect of parsing the tokens `import core` causes the symbol table in the parser to realize that `core.packet_in` is now a type.

This is not a show-stopping issue, but there are some details to work out. For instance, one approach could be to write a lighter-weight front-end to parse and analyze just the `import` statements, and topologically sort them into the right order so references are known when they are encountered. (But there are questions about when the C pre-processor runs.) Alternatively, we could interrupt parsing when we get to an `import` and go off and actually parse and load the referenced module. (But that makes parsing even more effectful, and we also need to be super careful about introducing loops!) And probably there are other solutions...

What we can't do is load files using the existing parser (or a small extension to it), because the "lexer hack" means we don't even get an AST for program pieces like the snippet above.
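
The first approach, scanning only the `import` statements and sorting modules before any full parse, could look roughly like the sketch below. Everything in it is an assumption for illustration: the one-module-per-line `import name` syntax is the hypothetical syntax imagined in this issue, the helper names `scan_imports` and `load_order` are made up, and the question of when the C pre-processor runs is ignored. The sorting itself uses Python's standard `graphlib` module (Python 3.9+).

```python
import re
from graphlib import CycleError, TopologicalSorter

# Hypothetical import syntax: one `import <name>` per line, optional `;`.
IMPORT_RE = re.compile(r"^\s*import\s+([A-Za-z_]\w*)\s*;?\s*$", re.MULTILINE)


def scan_imports(source):
    """Lightweight pass: collect imported module names without a full parse."""
    return IMPORT_RE.findall(source)


def load_order(sources):
    """Return module names ordered so each module follows its imports.

    `sources` maps a module name to its source text. Raises CycleError
    if the imports form a loop.
    """
    graph = {name: scan_imports(text) for name, text in sources.items()}
    # TopologicalSorter expects node -> predecessors, i.e. module -> imports.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    modules = {
        "main": "import core\nimport lib\n",
        "lib": "import core\n",
        "core": "",
    }
    print(load_order(modules))
    try:
        load_order({"a": "import b\n", "b": "import a\n"})
    except CycleError as err:
        print("import loop detected:", err.args[1])
```

Parsing modules in this order means the type names exported by `core` could be entered into the symbol table before any module that imports it is lexed, and import loops are reported up front instead of surfacing in the middle of a parse.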