llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.13k stars 12.01k forks source link

`-Warray-bounds` Regression with `static const` variables #62208

Open Dylan-Brotherston opened 1 year ago

Dylan-Brotherston commented 1 year ago

The following code

static const int LENGTH = 10;

int main(int argc, char **argv) {
    char array1[10];
    char array2[LENGTH];
    char array3[LENGTH + 1];

    array1[5] = 'A';
    array2[5] = 'B';
    array3[5] = 'C';

    array1[20] = 'A';
    array2[20] = 'B';
    array3[20] = 'C';
}

produces -Warray-bounds warnings for all three arrays with Clang 11

$ clang-11 test.c

test.c:12:5: warning: array index 20 is past the end of the array (which contains 10 elements) [-Warray-bounds]
    array1[20] = 'A';
    ^      ~~
test.c:4:5: note: array 'array1' declared here
    char array1[10];
    ^
test.c:13:5: warning: array index 20 is past the end of the array (which contains 10 elements) [-Warray-bounds]
    array2[20] = 'B';
    ^      ~~
test.c:5:5: note: array 'array2' declared here
    char array2[LENGTH];
    ^
test.c:14:5: warning: array index 20 is past the end of the array (which contains 11 elements) [-Warray-bounds]
    array3[20] = 'C';
    ^      ~~
test.c:6:5: note: array 'array3' declared here
    char array3[LENGTH + 1];
    ^
3 warnings generated.

But only processes a warning for array1 with Clang 12 -> Clang 16

$ clang-12 test.c

test.c:12:5: warning: array index 20 is past the end of the array (which contains 10 elements) [-Warray-bounds]
    array1[20] = 'A';
    ^      ~~
test.c:4:5: note: array 'array1' declared here
    char array1[10];
    ^
1 warning generated.

But if the arrays are global variables the bounds checking returns

static const int LENGTH = 10;

char array1[10];
char array2[LENGTH];
char array3[LENGTH + 1];

int main(int argc, char **argv) {

    array1[5] = 'A';
    array2[5] = 'B';
    array3[5] = 'C';

    array1[20] = 'A';
    array2[20] = 'B';
    array3[20] = 'C';
}

Clang 11

$ clang-11 test.c

test.c:13:5: warning: array index 20 is past the end of the array (which contains 10 elements) [-Warray-bounds]
    array1[20] = 'A';
    ^      ~~
test.c:3:1: note: array 'array1' declared here
char array1[10];
^
test.c:14:5: warning: array index 20 is past the end of the array (which contains 10 elements) [-Warray-bounds]
    array2[20] = 'B';
    ^      ~~
test.c:4:1: note: array 'array2' declared here
char array2[LENGTH];
^
test.c:15:5: warning: array index 20 is past the end of the array (which contains 11 elements) [-Warray-bounds]
    array3[20] = 'C';
    ^      ~~
test.c:5:1: note: array 'array3' declared here
char array3[LENGTH + 1];
^
3 warnings generated.

Clang 12 -> Clang 16

$ clang-12 test.c

test.c:4:6: warning: variable length array folded to constant array as an extension [-Wgnu-folding-constant]
char array2[LENGTH];
     ^
test.c:5:6: warning: variable length array folded to constant array as an extension [-Wgnu-folding-constant]
char array3[LENGTH + 1];
     ^
test.c:13:5: warning: array index 20 is past the end of the array (which contains 10 elements) [-Warray-bounds]
    array1[20] = 'A';
    ^      ~~
test.c:3:1: note: array 'array1' declared here
char array1[10];
^
test.c:14:5: warning: array index 20 is past the end of the array (which contains 10 elements) [-Warray-bounds]
    array2[20] = 'B';
    ^      ~~
test.c:4:1: note: array 'array2' declared here
char array2[LENGTH];
^
test.c:15:5: warning: array index 20 is past the end of the array (which contains 11 elements) [-Warray-bounds]
    array3[20] = 'C';
    ^      ~~
test.c:5:1: note: array 'array3' declared here
char array3[LENGTH + 1];
^
5 warnings generated.
shafik commented 1 year ago

It looks like this is specific to C: https://godbolt.org/z/d1s36zfEM

While it looks ok for C++: https://godbolt.org/z/fvKdcYdMb

shafik commented 1 year ago

I checked in Sema::CheckArrayAccess and array2 and array3 are being classified as VariableArrayType which I don't think they are:

> expr BaseExpr->getType()->dump()
VariableArrayType 0x133897240 'char[LENGTH]' variably_modified  
|-BuiltinType 0x13382dab0 'char'
`-ImplicitCastExpr 0x1338971a8 'int' <LValueToRValue>
  `-DeclRefExpr 0x133897188 'const int' lvalue Var 0x133896d40 'LENGTH' 'const int'

So they get classified as IsUnboundedArray and will skip the check all together.

CC @AaronBallman who will probably have a better idea here.

AaronBallman commented 1 year ago

Those are both VLAs in C. Surprise! See C2x 6.7.6.2p4: "... If the size is an integer constant expression and the element type has a known constant size, the array type is not a variable length array type; otherwise, the array type is a variable length array type."

static const int is not a valid integer constant expression per C2x 6.6p8: "An integer constant expression132) shall have integer type and shall only have operands that are integer constants, named and compound literal constants of integer type, character constants, sizeof expressions whose results are integer constants, alignof expressions, and floating, named, or compound literal constants of arithmetic type that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the typeof operators, sizeof operator, or alignof operator."

Note, we warn you about this: https://godbolt.org/z/zKxGrhzG9

It's also worth noting:

test.c:4:6: warning: variable length array folded to constant array as an extension [-Wgnu-folding-constant]
char array2[LENGTH];
     ^

this might be a non-conforming extension. https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2713.htm was adopted for C2x and is bringing in words for the resolution to DR312. However, I say "might" because I'm in the process of writing an NB comment asking to back that change out because of the potential to break code.

Dylan-Brotherston commented 1 year ago

Interesting that this still occurs for global variables instead of just giving an error in that case.

Older versions of the clang users manual specifically referenced this behavior.

Arrays that are VLA’s according to the standard, but which can be constant folded by the frontend are treated as fixed size arrays. This occurs for things like int X[(1, 2)];, which is technically a VLA. c * modes are strictly compliant and treat these as VLAs.

but it has been removed from the current version. and despite being an extension it doesn't appear to be disabled when -std=c2x is used

static const int LENGTH = 10;

char array1[10];
char array2[LENGTH];
char array3[LENGTH + 1];

int main(int argc, char **argv) {

    array1[5] = 'A';
    array2[5] = 'B';
    array3[5] = 'C';

    array1[20] = 'A';
    array2[20] = 'B';
    array3[20] = 'C';
}
$ clang-15 -std=c2x test.c

test.c:4:6: warning: variable length array folded to constant array as an extension [-Wgnu-folding-constant]
char array2[LENGTH];
     ^
test.c:5:6: warning: variable length array folded to constant array as an extension [-Wgnu-folding-constant]
char array3[LENGTH + 1];
     ^
test.c:16:5: warning: array index 20 is past the end of the array (which contains 10 elements) [-Warray-bounds]
    array1[20] = 'A';
    ^      ~~
test.c:8:5: note: array 'array1' declared here
    char array1[10];
    ^
3 warnings generated.
Dylan-Brotherston commented 1 year ago

If i am reading N2713 correctly then the wording

An implementation may accept other forms of constant expressions.

already exists in the standard.

would that not allow clang to have static const int as a valid integer constant expression?

Or is my reading completely backwards?

AaronBallman commented 1 year ago

You're correct that the standard used to say that implementations can accept other forms of constant expressions, and implementations (like ours and several others) took that to mean constant expressions in general. https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_312.htm was filed and touched on this and the committee was put on record as the intent being "other forms, except integer constant expressions". 15+ years later, N2713 then took that intent and proposed wording for it, which was adopted by the committee.

Dylan-Brotherston commented 1 year ago

I see, so N2713 is the document that explicit disallows conforming compilers from doing that Clang 11 was doing.

AaronBallman commented 1 year ago

I see, so N2713 is the document that explicit disallows conforming compilers from doing that Clang 11 was doing.

That's correct!

efriedma-quic commented 1 year ago

Older versions of the clang users manual specifically referenced this behavior.

Arrays that are VLA’s according to the standard, but which can be constant folded by the frontend are treated as fixed size arrays. This occurs for things like int X[(1, 2)];, which is technically a VLA. c * modes are strictly compliant and treat these as VLAs.

but it has been removed from the current version.

We stopped folding the cases that have visibly different behavior in standard-compliant code; see https://reviews.llvm.org/D89523. We continue to perform constant folding in contexts where VLAs aren't legal, like global variable declarations.

Dylan-Brotherston commented 1 year ago

Just to keep this issue updated. It appears that @AaronBallman has filed N3125 in response to N2713, to request that the standards committee reverse this wording change.

Dylan-Brotherston commented 1 year ago

As a further question on this topic, am i correct in my understanding that the C standard only informs how code gen should be handled. And not, say, diagnostics.

If Clang generated assembly in these situations as if the arrays are VLAs but continued to emit diagnostics as if the arrays were a fixed size would that case Clang to not be in conformance with the standard?

AaronBallman commented 1 year ago

As a further question on this topic, am i correct in my understanding that the C standard only informs how code gen should be handled. And not, say, diagnostics.

99.9% correct -- the standard doesn't mandate much about diagnostics except for a few special cases (must issue at least one diagnostic if there's a violation of a constraint, handling of static_assert or #error/#warning). The rest is left up to QoI (quality of implementation).

If Clang generated assembly in these situations as if the arrays are VLAs but continued to emit diagnostics as if the arrays were a fixed size would that case Clang to not be in conformance with the standard?

That sounds a bit like the worst of both worlds, unless I'm misunderstanding something. The user gets a diagnostic telling them "this could be constant folded, but we're not allowed to" and they get the runtime overhead of a VLA. I think it would be better if we could just constant fold as an extension. The user still gets told "this isn't portable" (because of the extension warning) but they get the better performance and diagnostic checking from a constant-sized array.

efriedma-quic commented 1 year ago

The -Warray-bounds warning isn't dictated at all by the standard, so we can generate it whenever we want. (Accessing outside the bounds of an array is only undefined behavior if the access is actually executed at runtime.)

We need to distinguish VLA vs non-VLAs for a couple of places where the standard specifically calls out variably modified types. Primarily goto, and sizeof.