jhjourdan / C11parser

A correct C89/C90/C99/C11/C18 parser written using Menhir and OCaml

Add grammar rule to accept adjacent string literals #27

Closed. zmajeed closed this 4 months ago.

zmajeed commented 4 months ago

Also fix whitespace_char_no_newline pattern to exclude newline \012

jhjourdan commented 4 months ago

Thanks for your proposal.

I patched the parser to concatenate adjacent string literals. I did not originally do this in the parser, because the C standard presents it as a separate translation phase that happens before parsing, and I wanted the grammar to stay as close as possible to the grammar in the standard. That said, handling it in the parser is natural, and the change is small and local, so it is easy to document and to revert. So I changed my mind.
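The patch performs the merging inside the Menhir grammar. As a hedged illustration of the same idea, the OCaml sketch below merges runs of adjacent string-literal tokens in a plain token list, which is what translation phase 6 of the C standard describes. The `token` type and `merge_adjacent_strings` are hypothetical and are not C11parser's actual definitions. The sketch assumes each token already carries the decoded literal contents, and it ignores encoding prefixes such as `L` and `u8`.

```ocaml
(* Hypothetical token type. The real C11parser lexer defines its own tokens. *)
type token =
  | STRING_LITERAL of string  (* assumed: contents with escapes already decoded *)
  | NAME of string
  | SEMICOLON

(* Collapse each run of adjacent STRING_LITERAL tokens into one token.
   Contents are joined after escape decoding, so "\x1" "2" stays two chars. *)
let rec merge_adjacent_strings = function
  | STRING_LITERAL a :: STRING_LITERAL b :: rest ->
      merge_adjacent_strings (STRING_LITERAL (a ^ b) :: rest)
  | tok :: rest -> tok :: merge_adjacent_strings rest
  | [] -> []

let () =
  (* puts("Hello, " "world"); *)
  let toks = [ NAME "puts"; STRING_LITERAL "Hello, "; STRING_LITERAL "world"; SEMICOLON ] in
  assert (merge_adjacent_strings toks
          = [ NAME "puts"; STRING_LITERAL "Hello, world"; SEMICOLON ])
```

Doing this in the grammar, as the patch does, keeps the lexer unchanged. A token-level pass like this one keeps the grammar closer to the standard.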

About whitespace: in OCaml (and OCamllex), the escape sequence \012 is a decimal escape sequence, unlike in C, where \012 is octal and denotes newline. So \012 corresponds to "form feed", not "newline", and I don't agree with your proposed change. On the other hand, I added \011 (vertical tab) as a whitespace character, as required by the standard.
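The escape-sequence point can be checked directly in OCaml. The sketch below shows that `'\012'` is decimal 12 (form feed) rather than newline. It also defines a hypothetical predicate, `is_whitespace_no_newline`, that accepts the non-newline whitespace characters the C standard lists (space, horizontal tab, vertical tab, form feed). The predicate is only an approximation of the lexer's character class, not its actual definition.

```ocaml
(* OCaml "\ddd" escapes are DECIMAL. In C, '\012' is octal, i.e. 10 = newline. *)
let () =
  assert ('\010' = '\n');           (* decimal 10: line feed *)
  assert (Char.code '\011' = 11);   (* decimal 11: vertical tab *)
  assert (Char.code '\012' = 12);   (* decimal 12: form feed *)
  assert ('\012' <> '\n')

(* Hypothetical predicate, roughly what an ocamllex class like
   [' ' '\t' '\011' '\012'] would match: whitespace other than newline. *)
let is_whitespace_no_newline c =
  match c with
  | ' ' | '\t' | '\011' | '\012' -> true
  | _ -> false
```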

zmajeed commented 4 months ago

Thanks for making the changes.

btw, the grammar has held up very well against real C sources. The biggest file I've tested is the amalgamated sqlite3.c (https://www.sqlite.org/amalgamation.html): after preprocessing it is 6.5 MB and 178,000 LOC, including comments. I did add some GCC extensions to the grammar, but they have no impact on the portions that are standard C.

fpottier commented 4 months ago

btw - the grammar has held up very well against real C sources The biggest file I've tested is the amalgamated sqlite3.c

Interesting and good to know, thanks!