skvadrik / re2c

Lexer generator for C, C++, Go and Rust.
https://re2c.org

Wrong result only if another rule is present #57

Closed skvadrik closed 9 years ago

skvadrik commented 9 years ago

The following program produces the wrong result:

3: \baaa

If the first rule [<]... is removed, the correct result is produced:

8:

This can be reproduced with re2c 0.13.7.5 and 0.14.1. Versions 0.13.5 and 0.13.6 work fine.

(This is a reduced testcase. The problem was discovered in a more complicated scanner in libcmark.)

#include <stdio.h>

int scan(const char *p)
{
#define YYCTYPE char
    const char *YYCURSOR = p;
    const char *YYMARKER;

/*!re2c
    re2c:yyfill:enable = 0;

    reg_char     = [a];
    escaped_char = [\\][b];

    [<] ([x] | escaped_char | [y])* [>] { return YYCURSOR - p; }
    (reg_char | escaped_char)* { return YYCURSOR - p; }
    . { return 0; }
*/
}

int main()
{
    const char *str = "aaa\\baaa";
    int res = scan(str);
    printf("%d: %s\n", res, str + res);
    return 0;
}
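
The expected result can be cross-checked without re2c. What follows is only a minimal sketch: it assumes the second rule is the one that should match this input, and the helper name reference_match_len is hypothetical (it is not part of re2c or of the testcase above). It computes by hand the length of the longest prefix matching (reg_char | escaped_char)*.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference matcher, not generated by re2c.
 * Returns the length of the longest prefix of s that matches
 * (reg_char | escaped_char)*, where reg_char = [a] and
 * escaped_char = [\\][b] (a backslash followed by 'b'). */
size_t reference_match_len(const char *s)
{
    assert(s != NULL);

    size_t i = 0;
    for (;;) {
        if (s[i] == 'a') {
            /* reg_char: consume one character */
            i += 1;
        } else if (s[i] == '\\' && s[i + 1] == 'b') {
            /* escaped_char: consume two characters; reading s[i + 1]
             * is safe because s[i] is not the terminating NUL */
            i += 2;
        } else {
            /* no alternative matches: the star stops here */
            return i;
        }
    }
}
```

For the test string "aaa\\baaa" this returns 8, the length of the whole string, which agrees with the correct result reported above. The two alternatives start with different characters, so a simple greedy loop is enough to find the longest match.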

Original comment by: nwellnhof

skvadrik commented 9 years ago

Reproduced, looking into it.

Original comment by: skvadrik

skvadrik commented 9 years ago

Since 0.13.7, re2c tries to merge common suffixes in regexps. It's a bug in the merging algorithm.

Another rule simply reuses 'escaped_char', which triggers the error.

Original comment by: skvadrik

skvadrik commented 9 years ago

Fixed, see https://sourceforge.net/p/re2c/code-git/ci/1a97c678d5ae0dac02234ee65d9bd847dd43c449

Will soon release 0.14.2 with this bugfix included.

If the larger example is open source code, it would be great to add it to the re2c test collection.

Original comment by: skvadrik

skvadrik commented 9 years ago

Released re2c-0.14.2 with bugfix.

Could you send me your real-world example? I will add it to the re2c test collection (so it won't break next time).

Original comment by: skvadrik

skvadrik commented 9 years ago

Is this it?

https://github.com/jgm/cmark/blob/master/src/scanners.re

Original comment by: starseeker

skvadrik commented 9 years ago

Very likely, thanks!

Original comment by: skvadrik

skvadrik commented 9 years ago

Thanks for the quick response. Version 0.14.2 fixes the issue. The link provided by Cliff is correct.

Original comment by: nwellnhof