westes / flex

The Fast Lexical Analyzer - scanner generator for lexing in C and C++

Issue 609: Fix wrong example regex for C- and C++-style comments #614

Closed zmajeed closed 7 months ago

zmajeed commented 8 months ago

This fixes issue 609, https://github.com/westes/flex/issues/609

The regex example in section A.4.3 Quoted Constructs of the Flex manual is wrong for both C- and C++-style comments:

("/*"([^*]|"*"[^/])*"*/")|("/"(\\\n)*"/"[^\n]*)

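The C portion of the regex mishandles a run of asterisks before the closing slash, because "*"[^/] can consume the asterisk that should begin the closing */. It fails to recognize valid C comments like /* text **/ and fails to reject invalid comments like /* comment 1 **/ invalid missing comment start */, where the match runs past the real end of the comment at **/ and continues to the later */.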

Also, the C++ portion of the regex fails to recognize valid multiline C++ comments, in which an escaped newline (backslash-newline) after the comment start continues the comment onto the next line, like

// multiline \
comment 1

The correct regex for C and C++ comments is

("/*"([^*]|"*"+[^*/])*"*"+"/")|("/"(\\\n)*"/"((\\\n)|[^\n])*)

or, written with the (?x: ) extended syntax so that whitespace in the pattern is ignored,

(?x: ( "/*" ( [^*] | "*"+ [^*/] )* "*"+ "/" ) | ( "/" (\\ \n)* "/" ( (\\ \n) | [^\n] )* ) )
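
The behavior described above can be checked outside of flex with a rough sketch in Python's re module. This is only an approximation under stated assumptions: the flex quoted literals such as "/*" are rewritten as escaped characters, the names OLD, NEW and lexeme are hypothetical helpers introduced for this illustration, and Python's re is a backtracking matcher rather than flex's longest-match DFA, although the two should agree on the sample inputs used here.

```python
import re

# Hypothetical Python translations of the two flex patterns.
# Flex literals like "/*" become escaped characters (/\*) here.
OLD = re.compile(r'/\*([^*]|\*[^/])*\*/|/(\\\n)*/[^\n]*')
NEW = re.compile(r'/\*([^*]|\*+[^*/])*\*+/|/(\\\n)*/((\\\n)|[^\n])*')


def lexeme(pattern, text):
    """Return the text matched at the start of `text`, or None if no match."""
    m = pattern.match(text)
    return m.group(0) if m else None


samples = [
    "/* text **/",                                        # valid C comment
    "/* text ** more text */",                            # valid C comment
    "/* comment 1 **/ invalid missing comment start */",  # ends at first **/
    "// multiline \\\ncomment 1\nint x;",                 # continued C++ comment
]

for text in samples:
    print(repr(text))
    print("  old:", repr(lexeme(OLD, text)))
    print("  new:", repr(lexeme(NEW, text)))
```

With these translations, the old pattern finds no match for /* text **/ and swallows the whole third sample, while the new pattern matches /* text **/ in full and stops at the first **/ in the third sample. For the last sample, the old pattern stops at the backslash that ends the first line, whereas the new one continues through comment 1.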