Closed gitamohr closed 1 year ago
Do you get the same error if you misspell a character class name anywhere else in your lexer?
-Joe
On Fri, Feb 17, 2023, 15:36 Alex Mohr @.***> wrote:
The following input to flex 2.6.4 gives an m4 error:
%%
A { return 'A'; }
/*
* Bug: [[:alnum:]_]
*/
%%
> flex bug.ll
/bin/m4:stdin:1315: ERROR: end of file in string
The error disappears if I remove the underscore character from the comment, like * Bug: [[:alnum:]]
Well, the contents of a comment should not affect the output. But FWIW using that char class in the grammar works fine:
%%
[[:alnum:]_] { return 'Z'; }
/*
* Bug: [[:alnum:]]
*/
%%
(If I add the underscore back into the comment like [[:alnum:]_] the error returns.)
Okay, that's weird. I notice you named the file .ll. Is that for c++? (Shouldn't matter, but weird is weird.) Any command line switches or other options needed to reproduce this?
No, that's just what the file happens to be named. I renamed it to bug.l. I'm invoking flex with no arguments other than the input file. I built flex 2.6.4 into /usr/local by ./configure && make && sudo make install. Here's a complete terminal session repro. I tried to reduce the repro as much as I could:
> cat bug.l
%%
A { return 'A'; }
/*
* Bug: [[:alnum:]_]
*/
%%
> flex bug.l
/bin/m4:stdin:1315: ERROR: end of file in string
> flex -V
flex 2.6.4
And just to say it, this isn't just a spurious error; flex's output is truncated and invalid. I can work around it by modifying my comments, but it seems like a bona fide bug in flex's comment handling, so I wanted to report it.
Here's a slightly more reduced repro. This fails:
%%
A { return 'A'; }
/* [[:alnum:]_] */
%%
This works:
%%
A { return 'A'; } /* [[:alnum:]_] */
%%
There is something crucial about having additional characters after the [:alnum:] character class expression in the comment. Having characters before (like [_[:alnum:]]) works fine. Also, the particular characters that follow don't seem to matter; I've tried whitespace, letters, digits, and special chars. The character class expression name doesn't matter either -- I've tried :alpha:, :digit:, and even :bogus: and they all repro.
First, sorry for saying [[:alnum:]_] was a misspelling. I was holding on to a false notion that character class names in flex included the outer square braces. Probably because of the next thing.
Second, I found the problem but I can't fix it right now. It's peculiar to comment handling, as you noticed. Flex wraps comments in its customized M4 quotes, which happen to be [[ and ]]. Because the character classes aren't being scanned and replaced in the comments, M4 is reading the braces around them as quotation marks. This is usually okay when they are balanced (i.e. [[:alnum:]]). It leads to the error you saw when they look like unbalanced quotes (i.e. [[:alnum:]_]).
Options:
Sorry the comment quoting makes this edge case complicated.
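For anyone who wants to check their own comments against the quoting explanation above, here is a minimal Python sketch of that balancing rule. It is only a simplified model, not flex's or m4's actual code: the function name m4_quote_depth is hypothetical, and it assumes the only thing that matters is how [[ and ]] pair up, with [[ opening (or nesting) a quoted string and ]] being special only while a string is open.

```python
def m4_quote_depth(text, start="[[", end="]]"):
    """Return the quote nesting depth still open after scanning text.

    Simplified model (an assumption, not m4's real scanner): outside a
    quoted string only the start delimiter is special; inside one, start
    nests one level deeper and end closes one level. A nonzero result
    means the text ends while a quoted string is still open.
    """
    depth = 0
    i = 0
    while i < len(text):
        if text.startswith(start, i):
            depth += 1
            i += len(start)
        elif depth > 0 and text.startswith(end, i):
            depth -= 1
            i += len(end)
        else:
            i += 1
    return depth


if __name__ == "__main__":
    # Comment bodies taken from the cases discussed in this thread.
    for comment in ("[[:alnum:]_]", "[[:alnum:]]", "[_[:alnum:]]", "[[:bogus:] x"):
        print(comment, "->", m4_quote_depth(comment))
```

Under this model, [[:alnum:]_] and [[:bogus:] x leave one quote open, while [[:alnum:]] and [_[:alnum:]] come out balanced, which lines up with the cases reported above. Whether a given comment actually breaks still depends on how flex scans that comment, so treat a nonzero depth as a hint rather than a verdict.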
No worries -- thanks for taking a look. I can easily work around this. For what it's worth, this example works in version 2.5.39, so the bug was introduced somewhere between 2.5.39 and 2.6.4.
Drat! Now it's a bug instead of an oddity.
Think I found it. Mainly for my reference when writing a test & patch: we aren't escaping m4qstart and m4qend in the COMMENT_DISCARD condition the same way we are in COMMENT. I think that's the source of this. I'll write tests based on the cases above, thanks for those!
Nope, none of that worked.
@gitamohr, exactly what example did you test in 2.5.39? I'm trying to reproduce a working test from your comments above and finding no differences between 2.5.39 and HEAD.
%% g {; } /* after action comment [[:alnum:]_] */ h {; } /* after action comment [[:alnum:]_] */ %%
Flex accepts g but dies on h.
Here's what's up: The comment after the h action is scanned as ... I don't know what. Could be a comment, could be an action. Looks like it just gets echoed a byte at a time either way.
However! The following construction works for long comments in 2.5.39 and HEAD:
i {; } /*
Outcomes: I'm adding tests for multiline comments with unmatched braces to tests/quotes.l. I'll include the g and i constructions only for now.
I just tried the shortest example from above:
> cat bug.l
%%
A { return 'A'; }
/* [[:alnum:]_] */
%%
> flex bug.l
/bin/m4:stdin:1315: ERROR: end of file in string
> flex -V
flex 2.6.4
> /old/flex bug.l
> /old/flex -V
flex 2.5.39
cheers!
In this thread: I show myself to be an idiot. I have my trusty, old "2.5.39" folder connected to the 2.6.4 tag for some reason.
Beg your pardon. Be back with better results shortly.
Well, I'm back where we started. I see the issue, but I can't fix it for a while.
> cat bug.l
%%
A { return 'A'; }
/* [[:alnum:]_] */
%%
Flex sees the comment after A's action as a "CODE_COMMENT". Those aren't m4 quoted the same way as other comments because quoting them causes other problems. Until we get rid of the m4 dependency, I can't change the behavior back to what you came to expect in 2.5.39 without breaking other functionality.
That said, you can use the constructions I provided above instead. I'm about done with the tests for them so we'll notice before losing any more comment functionality.
Yep no worries, as I've mentioned this is no real impediment; just something I noticed.
fixed by #557
Any idea if https://stackoverflow.com/questions/78157667/error-end-of-file-in-string-error-coming-from-m4-when-using-flex is related to this?