Closed szhorvat closed 9 months ago
Great question. Flex scanners can do a bunch of things that irritate Clang. This is one of the weirder ones.
Flex allocates an extra row in several of its internal tables so it can use some pointer math shortcuts without clobbering memory. The scanners switch between treating yy_nxt as a 1D and 2D array without much concern. It's functionally correct and even safe thanks to the extra table padding, but it makes strict type checking mad. Took me ages to figure out why it's written this way.
So, in short, not a bug? I did not notice any incorrect behaviour. I just need to make sure that I can safely use -Cf
despite this UBSan warning.
Compiler warnings are not errors.
This is not a compiler warning. It is an issue detected at runtime by UndefinedBehaviourSanitizer.
Flex doesn't always follow Strunk & White. Generic warnings that code isn't pretty are just that. We know. At some point we may clean this kind of thing up, but we'd need a ton of test cases to ensure that flex will still do what flex has always done.
If you have an example where flex behavior is incorrect, please post a test case.
@westes Could you please address the specific situation I describe? I am concerned that you may have misunderstood my question as being a request to live up to some arbitrary code quality standards. It is not that. Flex generated code is accessing a statically allocated array in the following way:
int arr[2][2] = {
{1,2},
{3,4}
};
arr[0][3]; // notice the index 3 going above the upper bound of 2
I have only seen this in one very specific situation even though I regularly test this code with UndefinedBehaviourSanitizer, which at least makes this behaviour suspicious.
My question is simply: Are you aware of doing this type of non-standard indexing intentionally within Flex?
Please reread @Mightyjo's comment very carefully. He addresses your concern precisely and thoroughly.
When I use the
-Cf
option with a certain scanner, I get the following warning from UndefinedBehaviourSanitizer:The warning is triggered on a line like this:
It seems that even though
yy_nxt
is defined asflex_int16_t yy_nxt[][128]
, it is accessed asyy_nxt[i][208]
, with an index beyond 127. This does not actually lead to a crash, asyy_nxt[i+1]
also happens to contain data, soyy_nxt[i][208]
is effectively equivalent toyy_nxt[i+1][80]
.Is this quirky way of indexing intentional, or is it a bug in the code that Flex generates?
Flex version is 2.6.4. The scanner sources are at https://github.com/igraph/igraph/blob/f5a2860995dc85643b4d63f47ff0aa26515dc1f8/src/io/gml-lexer.l
This happens only when I use the
-Cf
option, and pass specific data to the scanner. It happens to perform very badly on this data without using this option.