westes / flex

The Fast Lexical Analyzer - scanner generator for lexing in C and C++

Flex-generated cpp code can access memory out of bounds in some situations #594

Closed fengf233 closed 10 months ago

fengf233 commented 1 year ago

flex version: 2.6.4

When I generate a cpp file from my .l file using flex, I find that the following code is generated:

yy_match:
        do
            {
            YY_CHAR yy_c = yy_ec[YY_SC_TO_UI(*yy_cp)] ;
            while ( yy_chk[yy_base[yy_current_state] + yy_c] != yy_current_state )
                {
                yy_current_state = (int) yy_def[yy_current_state];
                if ( yy_current_state >= 208 )
                    yy_c = yy_meta[yy_c];
                }
            yy_current_state = yy_nxt[yy_base[yy_current_state] + yy_c];
            *yyg->yy_state_ptr++ = yy_current_state;
            ++yy_cp;
            }
        while ( yy_base[yy_current_state] != 956 );

I found that the statement *yyg->yy_state_ptr++ = yy_current_state; may write out of bounds, because yy_state_ptr points into yy_state_buf, whose memory is allocated here:

        /* Create the reject buffer large enough to save one state per allowed character. */
        if ( ! yyg->yy_state_buf )
            yyg->yy_state_buf = (yy_state_type *)yyalloc(YY_STATE_BUF_SIZE  , yyscanner);
            if ( ! yyg->yy_state_buf )
                YY_FATAL_ERROR( "out of dynamic memory in yylex()" );

and YY_STATE_BUF_SIZE is defined as:

#define YY_STATE_BUF_SIZE   ((YY_BUF_SIZE + 2) * sizeof(yy_state_type))

and YY_BUF_SIZE defaults to at most 32768 (16384 when __ia64__ is not defined):

/* Size of default input buffer. */
#ifndef YY_BUF_SIZE
#ifdef __ia64__
/* On IA-64, the buffer size is 16k, not 8k.
 * Moreover, YY_BUF_SIZE is 2*YY_READ_BUF_SIZE in the general case.
 * Ditto for the __ia64__ case accordingly.
 */
#define YY_BUF_SIZE 32768
#else
#define YY_BUF_SIZE 16384
#endif /* __ia64__ */
#endif

So when I scan more than 32k of text, the scanner crashes with a core dump.
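To make the arithmetic concrete, here is a minimal sketch in C of how many states yy_state_buf can hold under the definitions quoted above. It assumes yy_state_type is int and the non-IA-64 default of 16384 for YY_BUF_SIZE. The helpers state_buf_capacity, would_overflow and print_report are hypothetical names used only for illustration and are not part of flex. It also assumes that a match stores one initial state plus one state per consumed character.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumed values mirroring the generated scanner quoted above. */
#define YY_BUF_SIZE 16384              /* default when __ia64__ is not defined */
typedef int yy_state_type;             /* assumption: yy_state_type is int */
#define YY_STATE_BUF_SIZE ((YY_BUF_SIZE + 2) * sizeof(yy_state_type))

/* Hypothetical helper: number of yy_state_type slots in yy_state_buf. */
size_t state_buf_capacity(void)
{
    return YY_STATE_BUF_SIZE / sizeof(yy_state_type);
}

/* Hypothetical helper: returns 1 if a single match of match_len characters
 * would store more states than the buffer holds, assuming one initial
 * state plus one state per consumed character. */
int would_overflow(size_t match_len)
{
    return match_len + 1 > state_buf_capacity();
}

/* Hypothetical helper: prints the capacity and two sample match lengths. */
void print_report(void)
{
    printf("capacity: %zu states (%zu bytes)\n",
           state_buf_capacity(), (size_t) YY_STATE_BUF_SIZE);
    printf("16000-char match overflows: %d\n", would_overflow(16000));
    printf("40000-char match overflows: %d\n", would_overflow(40000));
}
```

Under these assumptions the buffer has room for YY_BUF_SIZE + 2 states, so any single match that consumes more characters than that would write past the end of the allocation, since the yy_match loop shown earlier has no bounds check.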

I apologize for not being able to provide my flex file, but I have found that the code above is generated when the file contains a rule something like this: ([0-9]+|([0-9]*\.[0-9]+))/(a|b)

zmajeed commented 12 months ago

Could you try to reproduce with master? It sounds like the same issue fixed for https://github.com/westes/flex/issues/469

fengf233 commented 10 months ago

I could not reproduce it on master. It is probably the same issue that was fixed for #469. Thanks!