This repository is meant to incrementally implement a FreePascal 3 parser and code browser, similar to the one available for Lisa Pascal at https://github.com/rochus-keller/LisaPascal.
It turned out that there is no formal EBNF grammar for the current FreePascal 3.2.2 version. The closest I found was this because of a hint from here, but it turned out to be incomplete and too different from the current language specification; I therefore manually transcribed all syntax diagrams from the latter.
The given syntax was gradually migrated to an LL(1) compatible grammar using EbnfStudio; from that a parser can be generated. The grammar required ~40 LL(2) prefixes to cope with ambiguities.
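To illustrate what an LL(2) prefix buys, here is a minimal sketch in Python. It is not EbnfStudio output and not taken from this repository's grammar. The `tokenize` and `classify_statement` helpers, the small keyword set and the chosen ambiguity (a statement starting with an identifier can be a labeled statement, an assignment or a procedure call) are assumptions made only for illustration. With one token of lookahead the three alternatives cannot be told apart; peeking at the second token resolves the choice.

```python
import re

# Hypothetical, heavily simplified token pattern; the real lexer is far richer.
TOKEN_RE = re.compile(r"\s*(:=|[A-Za-z_]\w*|\d+|[:;()+\-*/])")

# Tiny illustrative subset of reserved words, not the full FreePascal list.
KEYWORDS = {"begin", "end", "if", "then", "else", "while", "do"}


def tokenize(text):
    """Split a statement into a flat list of token strings."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_identifier(token):
    """True for a non-keyword identifier token."""
    return (token is not None
            and token.isidentifier()
            and token.lower() not in KEYWORDS)


def classify_statement(tokens):
    """Pick a production using two tokens of lookahead (la1, la2)."""
    la1 = tokens[0] if len(tokens) > 0 else None
    la2 = tokens[1] if len(tokens) > 1 else None
    if is_identifier(la1):
        # la1 alone is ambiguous; la2 decides between the alternatives.
        if la2 == ":":
            return "labeled_statement"
        if la2 == ":=":
            return "assignment"
        return "procedure_call"
    return "other"


if __name__ == "__main__":
    for source in ("done: x := 1", "x := 1", "WriteLn(x)", "begin end"):
        print(source, "->", classify_statement(tokenize(source)))
```

In a generated parser the same decision would be expressed as a lookahead predicate attached to the ambiguous alternative instead of a hand-written function.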
The lexer was adapted and extended from the LisaPascal project and successfully tested with the FPC 3.2.2 source tree (it even detected issues in the FPC source code). The preprocessor works as far as required by the FP compiler source code.
The parser is automatically generated using the new EbnfStudio C++ generator and pseudo keyword features; all 667 regular .pas and .pp files of the FP 3.2.2 compiler source tree (which in addition include 191 *.inc files) can be successfully parsed in 5.4 seconds on my 2009 EliteBook (though the following had syntax errors and were fixed: widestr.pas, oglx.pas, symcpu.pas, nppcadd.pas, nppcld.pas and nppcmat.pas).
The parser can also generate a syntax tree per file; with this enabled, parsing all 667 files of the FP compiler source tree takes 8.3 seconds (i.e. ~54% slower than the parser alone).
The code browser was adapted from the LisaPascal project, with most modifications concentrated in the code model. The FP code model uses a visitor generated by EbnfStudio. The 539'144 SLOC of the FP compiler source tree (counting each include) are parsed and analyzed in 11.7 seconds on my 2009 EliteBook (i.e. ~41% slower than parsing with syntax tree generation, and ~117% slower than only parsing).
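The relative slowdowns quoted above follow directly from the three measured run times. The small Python sketch below only redoes that arithmetic; the function name `percent_slower` and the variable names are made up for this example, while the numbers are the ones reported in this README.

```python
def percent_slower(slow_seconds, fast_seconds):
    """Return how much slower the first run is than the second, in percent."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


# Run times reported above for the FP 3.2.2 compiler source tree.
PARSE_ONLY = 5.4         # parser alone
WITH_SYNTAX_TREE = 8.3   # parser plus syntax tree generation
WITH_CODE_MODEL = 11.7   # parsing plus analysis in the code model

if __name__ == "__main__":
    print(round(percent_slower(WITH_SYNTAX_TREE, PARSE_ONLY)))       # 54
    print(round(percent_slower(WITH_CODE_MODEL, WITH_SYNTAX_TREE)))  # 41
    print(round(percent_slower(WITH_CODE_MODEL, PARSE_ONLY)))        # 117
```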
Here is a screenshot of the code browser:
NOTE that this is work in progress.
First conclusion after a first look at the whole compiler source tree: it is much bigger and much more complex than expected. CLOC 1.6 counts 661 Pascal files with 340 kSLOC (in comparison, CLOC counts 469 C files with 344 kSLOC for the Mono 5 CLR). This doesn't include the RTL, on which the compiler source code depends, and for which CLOC counts yet another 1385 Pascal files and 577 kSLOC. The huge language, with all the special cases needed to stay compatible with the different language variants, takes its toll. I wonder how on earth the authors are able to manage this gigantic code base in an open source development approach without the support of a big company. It is also apparent that different authors have preferred different architectural concepts, although I am only at the beginning here. Because of the preprocessor, there is a second layer of modularization via conditional compilation and includes, in addition to Pascal's own. The backends for the different processor architectures apparently cannot be built in the same build; instead you see a different source tree depending on the defines. At the moment I am not very confident that the compiler backend can be reused with reasonable effort. The FP compiler including its runtime library is an impressive, monumental work that a single person is unlikely to be able to master.