llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

When compiling C, taking the address of an expression of type void, and warning about such #60243

Open pascal-cuoq opened 1 year ago

pascal-cuoq commented 1 year ago

The following C program, when compiled with Clang 15 with -pedantic, causes the following warnings.

void *p;

void *g(void) {
  return &(p[0]);
}

The warnings:

<source>:4:13: warning: subscript of a pointer to void is a GNU extension [-Wgnu-pointer-arith]
  return &(p[0]);
           ~^
<source>:4:10: warning: ISO C forbids taking the address of an expression of type 'void' [-Wpedantic]
  return &(p[0]);
         ^ ~~~~

The second warning says, without a clause number, that ISO C forbids taking the address of an expression of type void. If this is true, then Clang should reject the following C program when -pedantic -pedantic-errors -std=c17 are set:

void *p;

void *g(void) {
  return &(*p);
}

In this case Clang 15 compiles without warning.

I don't think that ISO C forbids taking the address of an expression of type 'void' (and if I'm right, this Clang warning should not exist). But if ISO C does forbid taking the address of an expression of type 'void', then the second example should result in a warning when compiled with -pedantic -pedantic-errors -std=c17.

AaronBallman commented 1 year ago

ISO C does forbid taking the address of an expression of type void, see C2x 6.5.3.2p1, which reads: "The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier."

But you're right that we're inconsistent in how we treat that requirement. Pedantically, we should issue the same warning for both &(p[0]) and &(*p) as the & operator is being applied to the result of a () operator, not a [] or unary * operator (remember that there can be no object of type void so there's no lvalue possibly designating an object in this case). That said, I think it's possibly an oversight by WG14 that parens are the difference between conforming and nonconforming code here, as dropping the parens should remove the warning in both cases per what I cited above.

https://godbolt.org/z/hqeP3dfKx shows we are being pedantic in one case and inconsistent in the other. I believe this is fallout from https://reviews.llvm.org/D134461, so CC @junaire for awareness.

pascal-cuoq commented 1 year ago

I'm not sure how that clause applies, since in both my examples, & is applied to “the result of a [] or unary * operator”, but that depends on how informal words such as “result” are interpreted.

So in your interpretation, when C17 footnote 104 says “&*E is equivalent to E (even if E is a null pointer)”, it does not implicitly mean “for any null pointer”, but only for a null pointer that does not have type void*? Because again, usually this kind of informal sentence would be construed to imply that for any well-formed null pointer p, &*p is equivalent to p and not forbidden.

AaronBallman commented 1 year ago

That clause applies because it's a constraint on what you can apply the operator to. But you're right about the interpretation of "result" -- C2x 6.5.1p5 says that a parenthesized expression has identical semantics to those of the unparenthesized form and I think you could squint at that to say the parens change nothing so the result really is from the unparenthesized expression. But those words are brand new in C2x; in C17 there is no mention about semantics for paren expressions. I'll have to dig out what paper changed that wording to see what the intent is for it.

The footnote only applies once you've gotten past the constraints, so I read it as only talking about null pointers of type other than void * because you couldn't get past the constraint if it was void *.
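As an illustration of the footnote's equivalence for pointer types that do get past the constraint, here is a minimal sketch in C. The helper names addr_of_deref, addr_of_subscript and check_footnote_equivalence are hypothetical and come from neither the standard nor Clang. The behavior relied on is C17 6.5.3.2p3, under which neither the & nor the * (explicit, or implied by []) is evaluated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: &*e is treated as plain e, even when e is null,
   because neither operator is evaluated (C17 6.5.3.2p3). */
static int *addr_of_deref(int *e) {
    return &*e;
}

/* Hypothetical helper: &e[n] is treated as e + n; no element is read. */
static int *addr_of_subscript(int *e, size_t n) {
    return &e[n];
}

/* Checks the equivalences on a null pointer and on a real array. */
static void check_footnote_equivalence(void) {
    int a[4] = {0, 1, 2, 3};
    assert(addr_of_deref(NULL) == NULL);
    assert(addr_of_deref(a) == a);
    assert(addr_of_subscript(a, 2) == a + 2);
    assert(addr_of_subscript(a, 4) == a + 4); /* one past the end is allowed */
}
```

The sketch uses int * on purpose: it shows the semantics the footnote describes without touching the open question of whether the constraint admits void *.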

pascal-cuoq commented 1 year ago

I only added the parentheses because I do not personally remember the precedence of [] with respect to & and did not want to make knowing that a prerequisite to answer the ticket, but it turns out that the brackets have precedence in &p[0] and that parentheses are not necessary. This ticket can be about the behavior of Clang on &p[0] if you prefer (the behavior is the same).

AaronBallman commented 1 year ago

I think we're in agreement about this, but let's be explicitly sure. I think:

1) We should accept &p[N] without warning in pedantic mode
2) We should continue to accept &*p without warning in pedantic mode
3) I am going to try to track down why the paren wording changed in C2x, and assuming it changed to try to make paren expressions more transparent to these situations, then we should also accept &(*p) and &(p[N]) (and other forms differing only in use of parens) without warning in pedantic mode.

pascal-cuoq commented 1 year ago

Regarding 1. and 2., it is not what I would prefer, but the reason I am here is that these constructs exist in actual code that we are supposed to analyze. And GCC accepts them too. So, if only for the sake of the compatibility with GCC, yes, when p has type void*, Clang should probably accept &p[N] and &*p without warning about & (the warning about void* arithmetic should remain for the former example), and treat them as respectively equivalent to p+N and p. And we will have to accept them too, since the standard appears to be carefully worded to allow at least the second one.
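To make the equivalence with p+N concrete: under the GNU extension, arithmetic on void * is done as if the size of void were 1, so &p[N] designates the address N bytes past p. A minimal sketch of the portable spelling follows. The helper names byte_offset and check_byte_offset are hypothetical; the cast through char * is the ISO C way to compute the same address without relying on the extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: portable equivalent of the GNU-extension
   expressions p + n and &p[n] for a void *p. */
static void *byte_offset(void *p, size_t n) {
    return (char *)p + n;
}

/* Checks the helper against ordinary char array indexing. */
static void check_byte_offset(void) {
    char buf[8];
    void *p = buf;
    assert(byte_offset(p, 0) == p);
    assert(byte_offset(p, 3) == (void *)&buf[3]);
}
```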

Regarding 3., my opinion is that it was always the intention of the standard that &(*p) and &(p[N]) would have the same status as &*p and &p[N]. It is an indictment of past versions of the standard that it was considered necessary to clarify this in C2x, and an indictment of C2x that words have to be expended stating this. A proper language definition would not mix syntax, typing and semantic concerns in a way that leaves such a question ambiguous.

I believe that no way would remain to obtain Clang's “ISO C forbids taking the address of an expression of type 'void'” warning after these changes, and I would be curious about the input that causes it if one still exists. This would resolve the bug, which I continue to think is that the warning states something that is not true (that “ISO C forbids taking the address of an expression of type 'void'”). I admit that the warning, despite being wrong, is carefully worded. For instance, it is correct to use the word “expression” rather than “lvalue” because:

See above remark about mixing syntax, typing and semantic concerns. In an ordinary programming language, “lvalue” would be a syntactic subcategory of “expression” and not depend on typing.

AaronBallman commented 1 year ago

> Regarding 1. and 2., it is not what I would prefer, but the reason I am here is that these constructs exist in actual code that we are supposed to analyze. And GCC accepts them too. So, if only for the sake of the compatibility with GCC, yes, when p has type void*, Clang should probably accept &p[N] and &*p without warning about & (the warning about void* arithmetic should remain for the former example), and treat them as respectively equivalent to p+N and p. And we will have to accept them too, since the standard appears to be carefully worded to allow at least the second one.

Agreed (it's also not what I prefer, but alas, the standard requires it and users do rely on it in practice).

> Regarding 3., my opinion is that it was always the intention of the standard that &(*p) and &(p[N]) would have the same status as &*p and &p[N]. It is an indictment of past versions of the standard that it was considered necessary to clarify this in C2x, and an indictment of C2x that words have to be expended stating this. A proper language definition would not mix syntax, typing and semantic concerns in a way that leaves such a question ambiguous.

Personally, I agree. But having interacted with WG14 on this sort of thing in the past, I'm hesitant to claim much about the intent here without more input from the committee.

That said, I found what changed the wording for parens: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3010.htm and it seems the intent there is to clarify that a parenthesized expression is "the same thing" as the unparenthesized form, so I think we're in agreement to remove the warning from those cases as well.

> I believe that no way would remain to obtain Clang's “ISO C forbids taking the address of an expression of type 'void'” warning after these changes, and I would be curious about the input that causes it if one still exists.

I think you might be right -- we seem to cover the other cases with other diagnostics, such as: https://godbolt.org/z/34z395qr1, and I've not found a case that remains yet.

MitalAshok commented 3 weeks ago

> I believe that no way would remain to obtain Clang's “ISO C forbids taking the address of an expression of type 'void'” warning after these changes, and I would be curious about the input that causes it if one still exists.

https://godbolt.org/z/aTG7M7fod

extern void v;
int main(void) {
  void* x = &v;
}