skvadrik / re2c

Lexer generator for C, C++, Go and Rust.
https://re2c.org

Tags incorrectly set with lookahead #380

Closed SynAckFin closed 3 years ago

SynAckFin commented 3 years ago

When a tag is added to the lookahead portion of a regexp, it is set incorrectly.

As an example, some of the code generated by the following rule:

@TS "a" / "bcd" @TE "efgh" { return 0; }

looks like this (comments are mine):

        yych = *++YYCURSOR;
        switch (yych) {
        case 'h':       goto yy12;   // Matched the "h" from "efgh"
        default:        goto yy6;
        }
yy12:
        ++YYCURSOR;        // Step cursor past the "h"
        YYCURSOR -= 7;     // Move cursor to beginning of lookahead
        TS = YYCURSOR - 1; // Set TS
        TE = YYCURSOR - 4; // Sets TE to before start of input!! It thinks the cursor is after the "efgh"
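
The arithmetic in the snippet above can be checked by hand for the input "abcdefgh", using offsets from the start of the input instead of pointers. The following is only a sketch of that reasoning, not re2c output and not the eventual fix: the names `tag_offsets`, `tags_as_reported` and `tags_expected` are hypothetical, and the "expected" variant simply assumes TE should be derived from the cursor before it is rewound.

```c
#include <stddef.h>

/* Hypothetical helper type: offsets from the start of the input "abcdefgh". */
typedef struct {
    ptrdiff_t ts;     /* value of @TS */
    ptrdiff_t te;     /* value of @TE */
    ptrdiff_t cursor; /* final cursor position */
} tag_offsets;

/* Same order of operations as the reported code:
 * rewind the cursor first, then derive both tags from it. */
static tag_offsets tags_as_reported(ptrdiff_t cursor_after_h)
{
    tag_offsets r;
    ptrdiff_t cur = cursor_after_h;
    cur -= 7;          /* back over "bcdefgh" to the start of the lookahead */
    r.ts = cur - 1;    /* one character back: before the "a" */
    r.te = cur - 4;    /* four back from the rewound cursor: before the input */
    r.cursor = cur;
    return r;
}

/* Assumed intent: TE sits 4 characters ("efgh") before the end of the
 * lookahead, so it is derived before the cursor is rewound. */
static tag_offsets tags_expected(ptrdiff_t cursor_after_h)
{
    tag_offsets r;
    ptrdiff_t cur = cursor_after_h;
    r.te = cur - 4;    /* between "bcd" and "efgh" */
    cur -= 7;          /* back to the start of the lookahead */
    r.ts = cur - 1;    /* before the "a" */
    r.cursor = cur;
    return r;
}
```

With the cursor at offset 8 after the "h", `tags_as_reported(8)` gives TS = 0 and TE = -3, while `tags_expected(8)` gives TS = 0 and TE = 4. Both leave the cursor at offset 1, just after the "a".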

skvadrik commented 3 years ago

Confirmed, thanks for reporting. I will post a patch soon.

skvadrik commented 3 years ago

Fixed in: https://github.com/skvadrik/re2c/commit/68e1ab7160a367bf98d215fe90279cee26ec5ee8. @SynAckFin can you confirm that it works for you?

SynAckFin commented 3 years ago

Confirmed. Works in both the simple example and the more complex code I'm working on with no side effects detected.

skvadrik commented 3 years ago

Thank you! Closing the bug.