westes / flex

The Fast Lexical Analyzer - scanner generator for lexing in C and C++

REJECT does not honor BEGIN #400

Open jklowden opened 6 years ago

jklowden commented 6 years ago

The program below produces:

$ echo foobar | ./ws
rejecting 'foobar'
foobar

because start condition B is not honored by REJECT.

The manual says REJECT

directs the scanner to proceed on to the "second best" rule

and

Note also that unlike the other special actions, 'REJECT' is a branch.

Given that it's a "branch", is it not reasonable to expect that the start condition set by BEGIN would be in effect when REJECT is evaluated? YY_START has changed; why doesn't the continued search proceed under its aegis?

I imagine flex behaves the same as Lesk's lex did. "Lex − A Lexical Analyzer Generator" describes start conditions and REJECT, and doesn't explicitly describe their interaction. At a minimum, I suggest the manual be clarified, preferably including an explanation of when start conditions become effective.

%s B
%%
foobar {
  BEGIN(B);
  /* %.*s limits the output to yyleng characters; %*s would only set a minimum width */
  printf("rejecting '%.*s'\n", (int)yyleng, yytext);
  REJECT;
}

<B>foobar {
  printf("accepting '%.*s' in start condition B\n", (int)yyleng, yytext);
}
%%

zmajeed commented 5 years ago

I think yyless helps here, as advised in this FAQ entry in the info file doc/flex.texi:

How can I make REJECT cascade across start condition boundaries?

For example, replace REJECT with yyless(0) and move the <B> rule above the unconditioned rule, so that it wins when B is inclusive:

/* cascadereject.l */
%option main
%s B
%%
 /* <B> rule must come first for inclusive B */
<B>foobar {
  printf("accepting '%s' in start condition B\n", yytext);
}

foobar {
  BEGIN(B);
  printf("rejecting '%s'\n", yytext);
  yyless(0);
 }

or keep the original rule order and make B exclusive:

/* cascadereject.l */
%option main
/* exclusive B to avoid conflict */
%x B
%%
foobar {
  BEGIN(B);
  printf("rejecting '%s'\n", yytext);
  yyless(0);
 }

<B>foobar {
  printf("accepting '%s' in start condition B\n", yytext);
}

Either way, the scanner builds and produces:

$ flex cascadereject.l && gcc -o cascadereject lex.yy.c
$ echo foobar | ./cascadereject
rejecting 'foobar'
accepting 'foobar' in start condition B
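
A rough way to picture the difference between REJECT and yyless(0) is the toy model below. It is only a sketch in plain C, not flex's generated code: the struct, the function names, and the bitmask encoding are all made up for illustration, and it assumes that REJECT walks a list of candidate matches that was fixed when the text was scanned, while yyless(0) pushes the text back so it is scanned again under whatever start condition is current.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: every name here is hypothetical, none is a flex internal. */
enum start_cond { INITIAL = 0, B = 1 };

struct rule {
    int active_in;        /* bitmask of start conditions where the rule is active */
    const char *literal;  /* the rule matches exactly this literal text */
};

/* Rules in file order, mirroring the inclusive (%s B) variant above:
   rule 0 is <B>foobar, rule 1 is plain foobar (active in INITIAL and B). */
static const struct rule rules[] = {
    { 1 << B,                    "foobar" },
    { (1 << INITIAL) | (1 << B), "foobar" },
};
enum { NRULES = sizeof rules / sizeof rules[0] };

/* Collect, in file order, the rules that match `input` under start
   condition `sc`. Returns the number of candidates written to `out`. */
static int collect_candidates(int sc, const char *input, int out[NRULES])
{
    assert(sc == INITIAL || sc == B);
    int n = 0;
    for (int i = 0; i < NRULES; i++) {
        size_t len = strlen(rules[i].literal);
        if ((rules[i].active_in & (1 << sc)) &&
            strncmp(input, rules[i].literal, len) == 0)
            out[n++] = i;
    }
    return n;
}

/* REJECT-style: the candidate list was built under the start condition in
   effect when the text was scanned, so a later BEGIN cannot add to it.
   Returns the rule tried after the first candidate rejects, or -1 when no
   user rule is left (a real scanner would then try shorter matches and
   eventually the default rule). */
static int next_after_reject(int sc_at_match, const char *input)
{
    int cand[NRULES];
    int n = collect_candidates(sc_at_match, input, cand);
    return n > 1 ? cand[1] : -1;
}

/* yyless(0)-style: the text is pushed back and scanned again, so the
   candidate list is rebuilt under the new start condition. */
static int first_after_rescan(int new_sc, const char *input)
{
    int cand[NRULES];
    int n = collect_candidates(new_sc, input, cand);
    return n > 0 ? cand[0] : -1;
}
```

In this model, next_after_reject(INITIAL, "foobar") finds no second user rule, which corresponds to the default rule echoing the input in the original report, while first_after_rescan(B, "foobar") selects rule 0, the <B> rule.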