p4lang / p4analyzer

A Language Server Protocol (LSP) compliant analyzer for the P4 language
Apache License 2.0
19 stars 3 forks source link

Clean up suspicious preprocessing logic #21

Open viluon opened 1 year ago

viluon commented 1 year ago

The preprocess method of PreprocessorState has some dubious control flow. https://github.com/p4lang/p4analyzer/blob/8056ffcfc3cbee609d84653549c37d439d000479/crates/analyzer-core/src/preprocessor.rs#L277

It looks like it would report a recursive import error if a single file were included non-recursively from multiple locations. The check for that should compare the vertex state of the included file against VertexState::Open, not just see whether it has any state at all (i.e. whether it was seen yet). To make that work, however, preprocess() needs to manage the vertex states reasonably, which it doesn't presently do. The top of the while loop checks whether the incoming token comes from a different file than the tokens up to this point. This is not enough. We need a proper stack of #includes here, because the input stream can mix tokens from different source files, and a change in the source doesn't imply that we're done with said source. The #include stack should correspond to the path of FileId's for the resolved output tokens. We only associate a single FileId and Span with each token at the moment.

Currently, the while loop leaves the last processed file in the open vertex state, which is incorrect (and a possible concern for caching).

AndrewF001 commented 1 year ago

Recursively including files isn't a compiler error but the preprocessor does have a depth limit, here's example code. I also agree that the current code will flag an error if a header file is included twice, which is why we need to support #ifndef and etc in enum PreprocessorDirective, as stated in 6.2 specification.

Proposed solution would be to recursively call preprocessor() on included files with a depth limiter. Just lexing and pasting it into the result vector isn't enough, as it's depended on past #define and checking for redefinition errors.