westes / flex

The Fast Lexical Analyzer - scanner generator for lexing in C and C++

flex >=2.6.4 won't work with gnome-commander #264

Open rffontenelle opened 7 years ago

rffontenelle commented 7 years ago

Since flex 2.6.4, gnome-commander's build fails when flex is called by ylwrap. See error output:

test -f gnome-cmd-advrename-lexer.cc || /bin/sh ../ylwrap gnome-cmd-advrename-lexer.ll lex.yy.c gnome-cmd-advrename-lexer.cc -- flex  
/usr/bin/m4:stdin:1511: ERROR: end of file in string
make[3]: *** [Makefile:852: gnome-cmd-advrename-lexer.cc] Error 1

This problem doesn't happen with flex 2.6.3, but it also happens on the master branch.

I'm not sure whether this should be fixed in flex or in other software, but gnome-commander's developer doesn't have this version of flex and therefore can't reproduce it (BGO#785505). So I'd be glad if someone could shed light on this.

Environment:

Steps to reproduce:

  1. Put ylwrap and gnome-cmd-advrename-lexer.ll in the same folder
  2. Run:
    /bin/sh ylwrap gnome-cmd-advrename-lexer.ll lex.yy.c gnome-cmd-advrename-lexer.cc -- flex

Files: For your convenience, I attached the files required for running the above command:

Explorer09 commented 7 years ago

I can reproduce this. "ylwrap" is not needed for reproducing the bug. Instead, this will do:

$ ./flex --verbose gnome-cmd-advrename-lexer.ll
./flex version 2.6.4 usage statistics:
  scanner options: -vI8 -Cem
  191/2000 NFA states
  75/1000 DFA states (278 words)
  12 rules
  Compressed tables always back-up
  1/40 start conditions
  87 epsilon states, 57 double epsilon states
  67/100 character classes needed 329/500 words of storage, 0 reused
  1295 state/nextstate pairs created
  140/1155 unique/duplicate transitions
  83/1000 base-def entries created
  243/2000 (peak 455) nxt-chk entries created
  48/2500 (peak 336) template nxt-chk entries created
  4 empty table entries
  9 protos created
  8 templates created, 30 uses
  42/256 equivalence classes created
  6/256 meta-equivalence classes created
  0 (0 saved) hash collisions, 68 DFAs equal
  0 sets of reallocations needed
  950 total table entries needed
/home/explorer/toolchain/tools/bin/m4:stdin:1511: ERROR: end of file in string

Explorer09 commented 7 years ago

git bisect shows that commit ba530cd52fa2d69ddf7194459445a19fc9648014 is the culprit. I've done the most I can. I don't know how to fix this at the moment.

westes commented 7 years ago

Thanks for this report. I'm tagging it for 2.6.6 so that I can focus on getting 2.6.5 out the door.

HBBroeker commented 6 years ago

The problem is with line 251 of that input file:

\$[cxXegnNp]\([^\)]*\)? ECHO; // don't substitute broken $x tokens like $x(-1), $x(abc) or $x(abc

The apostrophe in "don't" gets interpreted as the beginning of a character constant, but that constant never ends, so it triggers the condition added by commit ba530cd.

IMHO this constitutes a bug in that source, not in flex, because it relies on C++-style comment handling.
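
To make the failure mode concrete, here is a toy model in Python. It is not flex's actual scanner code. It only assumes, as described above, a scanner that tracks C quotes in the action text but does not treat `//` as starting a comment. The function name `scan_action` and the status strings are made up for this illustration.

```python
def scan_action(text):
    """Toy model: track C quotes, but do NOT treat // as a comment.

    Returns "ok" if every quote opened in `text` is closed again,
    otherwise "unterminated literal".
    """
    quote = None  # the quote character we are currently inside, if any
    i = 0
    while i < len(text):
        ch = text[i]
        if quote is not None:
            if ch == "\\":
                i += 1        # skip the escaped character
            elif ch == quote:
                quote = None  # literal closed
        elif ch in ("'", '"'):
            quote = ch        # literal opened
        i += 1
    return "ok" if quote is None else "unterminated literal"


# The action part of the rule quoted above.
rule_action = (
    "ECHO; // don't substitute broken $x tokens "
    "like $x(-1), $x(abc) or $x(abc\n"
)
print(scan_action(rule_action))  # unterminated literal
```

In this model the apostrophe in "don't" opens a character constant that is still open at the end of the input, which matches the symptom reported by m4.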

turboscholz commented 6 years ago

Thanks for the hint, I will change that in the sources of Gnome Commander. 👍

Explorer09 commented 6 years ago

I think it's a bug in flex too, since everything after the regex and the space should have been interpreted as C code without checking its syntax (except for braces marking multi-line code). Good job on finding a workaround, though.

HBBroeker commented 6 years ago

Yeah well, the question is how do you find out how much of the input file is "everything after the regex and the space", in the face of an arbitrary mix of braces, quotes, backslashes and C comment characters. The change in flex that triggered the breakage tried to change the parsing of brace pairs for multi-line code. It probably should not have affected the parsing of line 251, as there's no brace in there ... but it did.

The problem can be solved either by replacing "don't" with "do not", or by changing the // C++-style comment into a /* C-style comment */, which shows that it really is an input parsing problem.
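
For anyone who wants to check their own scanner sources for the same pattern, a rough heuristic can be sketched in Python. It flags lines whose `//` comment contains an unpaired `'` or `"`. The helper name `risky_comment_lines` is invented here, and the check is deliberately naive: it treats the first `//` on a line as the comment start, so a `//` inside a pattern or string would confuse it.

```python
def risky_comment_lines(source):
    """Return 1-based numbers of lines whose // comment has an unpaired quote.

    Heuristic only: the first "//" on a line is assumed to start a comment.
    """
    risky = []
    for number, line in enumerate(source.splitlines(), start=1):
        _, marker, comment = line.partition("//")
        if not marker:
            continue  # no // comment on this line
        if comment.count("'") % 2 or comment.count('"') % 2:
            risky.append(number)
    return risky


# The original rule, followed by the two workarounds mentioned above.
sample = "\n".join([
    r"\$[cxXegnNp]\([^\)]*\)?  ECHO;  // don't substitute broken $x tokens",
    r"\$[cxXegnNp]\([^\)]*\)?  ECHO;  // do not substitute broken $x tokens",
    r"\$[cxXegnNp]\([^\)]*\)?  ECHO;  /* don't substitute broken $x tokens */",
])
print(risky_comment_lines(sample))  # [1]
```

Only the original wording is flagged. Both workarounds pass, either because the apostrophe is gone or because the comment is no longer a `//` comment.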

Explorer09 commented 6 years ago

I see at least two problems here:

  1. That the C character literal is recognised while Flex is in the state that should pass it through to the generated code.
  2. That Flex did not recognise C one-line comments, so it did not yet know to ignore braces after the comment mark (as in "// }").

Explorer09 commented 6 years ago

Correction to my point #1: Flex may need to recognise character literals and string literals in order to avoid false positives like "}" or '}'. Sorry for not realising this earlier. However, C literals do not span multiple lines unless every end-of-line mark within them is escaped.
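
Putting both points together, a brace counter along these lines might look like the following Python sketch. It is not a patch for flex and not how flex is implemented. The function name `brace_depth` is hypothetical, and the model is simplified (for example, it ignores backslash-continued `//` comments). It skips braces inside comments and literals, and it ends a literal at the first unescaped newline instead of reading on to the end of the input.

```python
def brace_depth(code):
    """Net brace count in `code`, ignoring braces in comments and literals."""
    depth = 0
    i, n = 0, len(code)
    while i < n:
        ch = code[i]
        two = code[i:i + 2]
        if two == "//":
            # One-line comment: skip up to (not past) the end of the line.
            end = code.find("\n", i)
            i = n if end == -1 else end
            continue
        if two == "/*":
            # Block comment: skip past the closing marker, if there is one.
            end = code.find("*/", i + 2)
            i = n if end == -1 else end + 2
            continue
        if ch in ("'", '"'):
            # Literal: ends at the matching quote or at an unescaped newline.
            j = i + 1
            while j < n and code[j] != ch and code[j] != "\n":
                j += 2 if code[j] == "\\" else 1  # skip escaped characters
            i = j + 1 if j < n and code[j] == ch else j
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        i += 1
    return depth


print(brace_depth("{ s = \"}\"; c = '}'; }"))   # 0: braces in literals ignored
print(brace_depth("ECHO; // don't count }\n"))  # 0: comment skipped entirely
print(brace_depth("{ // }\n"))                  # 1: only the real brace counts
```

With this approach the apostrophe in a `//` comment is never examined, and a stray quote in code can swallow at most the rest of its own line.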