westes / flex

The Fast Lexical Analyzer - scanner generator for lexing in C and C++
Other
3.54k stars 529 forks source link

Potential out-of-bounds access of `yy_nxt` #627

Closed szhorvat closed 6 months ago

szhorvat commented 6 months ago

When I use the -Cf option with a certain scanner, I get the following warning from UndefinedBehaviourSanitizer:

runtime error: index 208 out of bounds for type 'const flex_int16_t[128]' (aka 'const short[128]')

The warning is triggered on a line like this:

while ( (yy_current_state = yy_nxt[yy_current_state][ YY_SC_TO_UI(*yy_cp) ]) > 0 )

It seems that even though yy_nxt is defined as flex_int16_t yy_nxt[][128], it is accessed as yy_nxt[i][208], with an index beyond 127. This does not actually lead to a crash, as yy_nxt[i+1] also happens to contain data, so yy_nxt[i][208] is effectively equivalent to yy_nxt[i+1][80].

Is this quirky way of indexing intentional, or is it a bug in the code that Flex generates?


Flex version is 2.6.4. The scanner sources are at https://github.com/igraph/igraph/blob/f5a2860995dc85643b4d63f47ff0aa26515dc1f8/src/io/gml-lexer.l

This happens only when I use the -Cf option, and pass specific data to the scanner. It happens to perform very badly on this data without using this option.

Mightyjo commented 6 months ago

Great question. Flex scanners can do a bunch of things that irritate Clang. This is one of the weirder ones.

Flex allocates an extra row in several of its internal tables so it can use some pointer math shortcuts without clobbering memory. The scanners switch between treating yy_nxt as a 1D and 2D array without much concern. It's functionally correct and even safe thanks to the extra table padding, but it makes strict type checking mad. Took me ages to figure out why it's written this way.

szhorvat commented 6 months ago

So, in short, not a bug? I did not notice any incorrect behaviour. I just need to make sure that I can safely use -Cf despite this UBSan warning.

westes commented 6 months ago

Compiler warnings are not errors.

szhorvat commented 6 months ago

This is not a compiler warning. It is an issue detected at runtime by UndefinedBehaviourSanitizer.

westes commented 6 months ago

Flex doesn't always follow Strunk & White. Generic warnings that code isn't pretty are just that. We know. At some point we may clean this kind of thing up, but we'd need a ton of test cases to ensure that flex will still do what flex has always done.

If you have an example where flex behavior is incorrect, please post a test case.

szhorvat commented 6 months ago

@westes Could you please address the specific situation I describe? I am concerned that you may have misunderstood my question as being a request to live up to some arbitrary code quality standards. It is not that. Flex generated code is accessing a statically allocated array in the following way:

int arr[2][2] = {
    {1,2},
    {3,4}
};

arr[0][3]; // notice the index 3 going above the upper bound of 2

I have only seen this in one very specific situation even though I regularly test this code with UndefinedBehaviourSanitizer, which at least makes this behaviour suspicious.

My question is simply: Are you aware of doing this type of non-standard indexing intentionally within Flex?

westes commented 6 months ago

Please reread @Mightyjo's comment very carefully. He addresses your concern precisely and thoroughly.