skvadrik / re2c

Lexer generator for C, C++, Go and Rust.
https://re2c.org

Is there a convenient way to get yytext? #388

Closed krishna116 closed 2 years ago

krishna116 commented 2 years ago

I want to get the matched token string, for example:

/*!re2c
    number = [0-9]+ ;
    number { printf("number: %s\n", yytext); return 1; }
    *      { return 0; }
*/

The documentation does not seem to provide this API or any other convenient way to get the matched token string. What is the best way to get it, or must I use "@stag"? Thank you.

skvadrik commented 2 years ago

If you need the whole matched text, you don't need tags: the match begins at the initial position of YYCURSOR and ends at its final position. If you want to extract a submatch in the middle of the input, then you need tags. See the two examples below, with and without tags (in your example they are not necessary).

There is no automatic yytext because re2c does not on its own allocate memory and create copies of the input text (this would be too expensive, as the user often doesn't need the copy). If you need a copy, you can easily create one as std::string s(x, y) where x and y are the pointers in the input text (see below).

Example without tags:

int lex(const char *str) {
    const char *YYCURSOR = str;

    /*!re2c
    re2c:define:YYCTYPE = char;
    re2c:yyfill:enable = 0;

    number = [0-9]+;
    number {
        // just print
        printf("number: %.*s\n", (int)(YYCURSOR - str), str);

        // save into an std::string
        std::string s(str, YYCURSOR);

        return 1;
    }
    * { return 0; }

    */
}

With tags:

int lex(const char *YYCURSOR) {
    const char *x, *y;
    /*!stags:re2c format = 'const char *@@;\n'; */

    /*!re2c
    re2c:define:YYCTYPE = char;
    re2c:yyfill:enable = 0;
    re2c:flags:tags = 1;

    number = [0-9]+;
    @x number @y {
        // just print
        printf("number: %.*s\n", (int)(y - x), x);

        // save into an std::string
        std::string s(x, y);

        return 1;
    }
    * { return 0; }

    */
}
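If a yytext-like shorthand is wanted, the pointer pair from either example above can be wrapped in a small helper. The following is only a sketch: token_view and token_copy are hypothetical names, not part of re2c, and the view variant assumes C++17 for std::string_view.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not part of re2c): a non-owning view of the
// matched text between two pointers into the same input buffer.
inline std::string_view token_view(const char *begin, const char *end) {
    assert(begin <= end);  // both pointers must refer to the same buffer
    return std::string_view(begin, static_cast<std::size_t>(end - begin));
}

// Hypothetical helper: an owning copy, for tokens that must outlive
// the input buffer.
inline std::string token_copy(const char *begin, const char *end) {
    assert(begin <= end);
    return std::string(begin, end);
}
```

Inside a semantic action this would be called as token_view(str, YYCURSOR) in the version without tags, or token_view(x, y) in the tagged version. A view is not NUL-terminated, so printing it with printf still needs the "%.*s" form with an explicit length.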

Also, what document are you referring to? I don't think re2c docs mention yytext.

skvadrik commented 2 years ago

"Also, what document are you referring to? I don't think re2c docs mention yytext."

Please ignore the question, I misread your initial comment as "the document does provide".

krishna116 commented 2 years ago

skvadrik, thank you very much, but I am still not entirely clear about YYCURSOR. For example:

#include<iostream>

int lex(const char *str) {
    const char *YYCURSOR = str;
    const char *begin = nullptr;
    for(;;)
    {
        begin = YYCURSOR;
    /*!re2c
        re2c:define:YYCTYPE = char;
        re2c:yyfill:enable = 0;

        number = [0-9]+;
        spaces = [ \t]+;

        number { std::string s(begin, YYCURSOR); printf("token = [%s], size = %d\n",s.data(),s.size());}
        spaces { std::string s(begin, YYCURSOR); printf("token = [%s], size = %d\n",s.data(),s.size());}
        * { return -1; }
    */
    }

    return 0;
}

int main()
{
    std::string str{ "1234 456" };
    lex(str.data());
    return 0;
}

The output is not entirely correct: the third token is wrong. [image: debug]

skvadrik commented 2 years ago

You have to add continue; at the end of each semantic action (after printf). Otherwise the lexer just falls through into the next state, whatever it might be. Also, s.c_str() is the more conventional way to get the C string from an std::string.
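The loop shape this leads to (remember the token start, match, use the range from begin to the cursor, then continue) can be sketched in plain C++. This is a hand-written scanner standing in for the code re2c would generate, so it only illustrates the control flow; tokenize and cursor are names made up for the sketch, with cursor playing the role of YYCURSOR.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hand-written stand-in for the re2c-generated matcher; only the loop
// shape (begin, cursor, continue) mirrors the lexer discussed above.
std::vector<std::string> tokenize(const char *str) {
    assert(str != nullptr);
    std::vector<std::string> tokens;
    const char *cursor = str;  // plays the role of YYCURSOR
    for (;;) {
        const char *begin = cursor;  // start of the next token
        if (std::isdigit(static_cast<unsigned char>(*cursor))) {
            // number = [0-9]+
            while (std::isdigit(static_cast<unsigned char>(*cursor))) ++cursor;
            tokens.emplace_back(begin, cursor);
            continue;  // restart matching at the new position
        }
        if (*cursor == ' ' || *cursor == '\t') {
            // spaces = [ \t]+
            while (*cursor == ' ' || *cursor == '\t') ++cursor;
            tokens.emplace_back(begin, cursor);
            continue;
        }
        return tokens;  // default rule: stop at NUL or any other character
    }
}
```

In the real re2c lexer the two continue; statements go at the end of the number and spaces actions, so that each iteration resets begin before the next match starts.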

krishna116 commented 2 years ago

I struggled with these problems for half a day, and they are finally solved with your help. Thank you again, and best wishes to you.