westes / flex

The Fast Lexical Analyzer - scanner generator for lexing in C and C++

Calling `yyless(n)` after `input()` causes a character to be duplicated #547

Open SouravKB opened 1 year ago

SouravKB commented 1 year ago
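```lex
%%
"a"     {
            input();    /* consumes 'y' from "ayz" */
            yyless(0);  /* should return "a" to the input, leaving "ayz" */
        }

"azz"   {
            printf("Never should have been matched!\n");
        }
%%
int yywrap(void) { return 1; }
int main(void) { return yylex(); }
```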

With the input `ayz`, this sample scanner should never match rule 2 (`"azz"`).

However, after rule 1 matches `a`, calling `input()` returns `y`. It advances `yy_c_buf_p` to point at `z` and stores the character at `yy_c_buf_p` (i.e. `z`) in `yy_hold_char`. The `yyless` macro, though, uses the stale variable `yy_cp`, which still points where `y` was, to restore `yy_hold_char`. The held character is therefore written back at the wrong position, which effectively duplicates it.

The buffer now contains `azz`, which erroneously matches rule 2.

Mightyjo commented 1 year ago

Which flex options (if any) should we use to reproduce this?

What do you expect to be on the input stream after yyless(0)?

Thanks! This will help with adding regression tests to the suite!

SouravKB commented 1 year ago

It need not be `yyless(0)`: any call to `yyless(n)` after a call to `input()` triggers the bug.

In the above example, after the call to `yyless(0)`, I expect the input stream to be as it was before the rule matched, i.e. `ayz`. Because of the bug, it contains `azz` instead.

I have not used any flex options while reproducing the bug.

Mightyjo commented 1 year ago

Thanks! I understand.

Last question: did this work differently in a previously released version of flex? (Just makes a difference in how the issue is categorized.)

SouravKB commented 1 year ago

I haven't used flex much. Also, my codebase didn't contain `input()` followed by `yyless()` until today, so I can't answer that question.

Mightyjo commented 1 year ago

Copy that. No worries.

SouravKB commented 1 year ago

Glad to help. Related issue: #395, in the `input()` function. Both have to be fixed to get the expected contents in the input stream, as described above. Also, the documentation does not make clear whether `input()` modifies `yytext` and `yyleng`.