Closed zmajeed closed 4 months ago
Thanks for your proposal.
I patched the parser to include string merging. I initially left it out of the parser because the C standard presents it as a preprocessing phase that runs before parsing, and I wanted the grammar to be as close as possible to the grammar in the standard. That being said, it is natural to integrate this into the parser, and it is a small local change that is easy to document and to revert. So I changed my mind.
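For readers unfamiliar with string merging, here is a minimal sketch of the idea, assuming a simplified token stream. The `token` type and the `merge_adjacent_strings` function are hypothetical names made up for this illustration. The actual patch changes the grammar rather than post-processing a token list, so this only shows the intended effect.

```ocaml
(* Hypothetical, simplified token type: only string literals matter here. *)
type token =
  | STRING_LITERAL of string   (* contents of one "..." literal *)
  | OTHER of string            (* any other token, kept as raw text *)

(* Collapse every run of adjacent string literals into a single literal,
   which is what the C standard asks for before parsing proper. *)
let rec merge_adjacent_strings tokens =
  match tokens with
  | STRING_LITERAL a :: STRING_LITERAL b :: rest ->
      (* Re-examine the merged literal so runs of three or more collapse too. *)
      merge_adjacent_strings (STRING_LITERAL (a ^ b) :: rest)
  | tok :: rest -> tok :: merge_adjacent_strings rest
  | [] -> []

let () =
  (* Tokens for: puts("a" "b" "c") *)
  let input =
    [ OTHER "puts"; OTHER "("; STRING_LITERAL "a"; STRING_LITERAL "b";
      STRING_LITERAL "c"; OTHER ")" ]
  in
  assert (merge_adjacent_strings input
          = [ OTHER "puts"; OTHER "("; STRING_LITERAL "abc"; OTHER ")" ])
```

Doing the merge in the grammar instead avoids a separate pass over the tokens, at the cost of one rule that departs from the standard's grammar.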
About whitespace: in OCaml (and OCamllex), the escape sequence \012 is interpreted as a decimal escape sequence, not an octal one as in C. So \012 corresponds to "form feed" (ASCII 12), and not to "newline" (ASCII 10). So I don't agree with your proposed change. On the other hand, I added \011 (vertical tab) as a whitespace character, as required by the standard.
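The decimal reading is easy to check in plain OCaml. The sketch below is only an illustration: `is_whitespace_no_newline` is a hypothetical predicate written for this example, not the lexer's actual rule, and its character set is an assumption limited to space, tab, and the two characters mentioned above.

```ocaml
(* In OCaml, a \ddd escape is read as three DECIMAL digits. *)
let form_feed = '\012'      (* decimal 12 *)
let vertical_tab = '\011'   (* decimal 11 *)
let newline = '\010'        (* decimal 10, the same character as '\n' *)

(* Hypothetical predicate: whitespace characters other than newline.
   The character set is an assumption made for this example. *)
let is_whitespace_no_newline c =
  match c with
  | ' ' | '\t' | '\011' | '\012' -> true
  | _ -> false

let () =
  assert (Char.code form_feed = 12);
  assert (form_feed = '\x0C');
  assert (Char.code vertical_tab = 11);
  assert (newline = '\n');
  (* An octal escape needs the explicit \o prefix: octal 012 is decimal 10. *)
  assert ('\o012' = '\n');
  (* So a set containing '\012' does not match newline. *)
  assert (is_whitespace_no_newline '\012');
  assert (not (is_whitespace_no_newline '\n'))
```

In C the same spelling `'\012'` is octal and does denote newline, which is the likely source of the confusion.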
Thanks for making the changes.
\012 is decimal in OCaml - wow! I thought it was working because of the rule's position in the lexer, right after the check for \n.
gcc -E doesn't do the concatenation, and the parser choked on adjacent string literals when I tested real-world preprocessed source files. Indeed, concatenation is done in gcc/c/c-parser.cc for its c_parser_string_literal grammar rule.
btw - the grammar has held up very well against real C sources. The biggest file I've tested is the amalgamated sqlite3.c - https://www.sqlite.org/amalgamation.html. It's 6.5 MB after preprocessing, 178,000 LOC including comments. I did add some GCC extensions to the grammar that have no impact on portions that are standard C.
> btw - the grammar has held up very well against real C sources. The biggest file I've tested is the amalgamated sqlite3.c
Interesting and good to know, thanks!
Also fix whitespace_char_no_newline pattern to exclude newline \012