skvadrik / re2c

Lexer generator for C, C++, D, Go, Haskell, Java, JS, OCaml, Python, Rust, V and Zig.
https://re2c.org

Bug: re2go generates chained assignment which is not supported by Go #489

Closed RadhiFadlillah closed 2 months ago

RadhiFadlillah commented 2 months ago

As the title says, I found a bug where re2go generates a chained assignment. Given the following template:

package main

func re2goCounter(input string) int {
    var count int
    var cursor, marker int
    _ = marker

    input += string(rune(0)) // add terminating null
    limit := len(input) - 1  // limit points at the terminating null

    // Variable for capturing parentheses (twice the number of groups).
    /*!maxnmatch:re2c*/
    yypmatch := make([]int, YYMAXNMATCH*2)
    var yynmatch int
    _ = yynmatch

    // Autogenerated tag variables used by the lexer to track tag values.
    /*!stags:re2c format = 'var @@ int; _ = @@\n'; */

    for { /*!re2c
        re2c:eof              = 0;
        re2c:yyfill:enable    = 0;
        re2c:posix-captures   = 1;
        re2c:case-insensitive = 0;

        re2c:define:YYCTYPE     = byte;
        re2c:define:YYPEEK      = "input[cursor]";
        re2c:define:YYSKIP      = "cursor++";
        re2c:define:YYBACKUP    = "marker = cursor";
        re2c:define:YYRESTORE   = "cursor = marker";
        re2c:define:YYLESSTHAN  = "limit <= cursor";
        re2c:define:YYSTAGP     = "@@{tag} = cursor";
        re2c:define:YYSTAGN     = "@@{tag} = -1";
        re2c:define:YYSHIFTSTAG = "@@{tag} += @@{shift}";

        pattern = ([^0-9]);

        {pattern} { count++; continue }
        * { continue }
        $ { return count }
        */
    }
}

When compiled with the following command:

re2go -W -F --input-encoding utf8 --utf8 --no-generation-date -i $INPUT -o $OUTPUT;

re2go generates the following code:

// Code generated by re2c 3.1, DO NOT EDIT.
package main

func re2goCounter(input string) int {
    // Omitted
    yy1:
        cursor++
        yynmatch = 2
        yypmatch[0] = yypmatch[2] = yyt1
        yypmatch[1] = cursor
        yypmatch[3] = yypmatch[1]
        { count++; continue }
    // Omitted
}

Unfortunately, Go does not support chained assignments such as yypmatch[0] = yypmatch[2] = yyt1, so the generated code does not compile.
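For illustration, here is a minimal hand-written sketch of what the chained line would have to look like in valid Go. In Go, assignment is a statement rather than an expression, so the chain has to be split into separate statements. The function name assignMatches and the sample values are hypothetical, and this is not necessarily the code that a fixed re2go emits.

```go
package main

import "fmt"

// assignMatches mirrors the generated snippet above, with the chained
// assignment split into two statements that Go accepts.
// The function name is hypothetical; yyt1 and cursor stand in for the
// lexer's tag variable and current position.
func assignMatches(yyt1, cursor int) []int {
	yypmatch := make([]int, 4)

	// Rejected by the Go compiler: yypmatch[0] = yypmatch[2] = yyt1
	yypmatch[2] = yyt1
	yypmatch[0] = yypmatch[2]

	yypmatch[1] = cursor
	yypmatch[3] = yypmatch[1]
	return yypmatch
}

func main() {
	fmt.Println(assignMatches(3, 4)) // prints [3 4 3 4]
}
```

A tuple assignment such as yypmatch[0], yypmatch[2] = yyt1, yyt1 would also be valid Go and has the same effect here.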

RadhiFadlillah commented 2 months ago

For anyone else who runs into this issue: as a workaround, we can simply put an optional character in front of the capturing group, and it works nicely:

func re2goCounter(input string) int {
    ...
    for { /*!re2c
        ...
        pattern = _?([^0-9]);

        {pattern} { count++; continue }
        * { continue }
        $ { return count }
        */
    }
}

skvadrik commented 2 months ago

This has been fixed in master: re2go no longer generates chained assignments. The fix will be part of release 4.0. It would be very hard to backport, as it is built on top of a complete rewrite of the codegen subsystem, so I'm not backporting it to 3.x.

(For benchmarking it's better to use master, as the workaround _? adds one more tag with some runtime overhead on transitions.)

RadhiFadlillah commented 2 months ago

Yes, I can confirm it's been fixed in master.

It would be very hard to backport as it's based on top of a complete rewrite of the codegen subsystem.

No worries, fortunately re2c is pretty easy to build.

Thanks!