llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.89k stars 11.93k forks source link

Accessing qualified member of a structure or union rvalue allows creating qualified rvalues #96713

Open Halalaluyafail3 opened 4 months ago

Halalaluyafail3 commented 4 months ago

The following code demonstrates the issue:

struct{
    const int c;
}foo();
puts(_Generic(foo().c,int:"A",const int:"B"));

foo returns a structure that contains a qualified member named c of type const int. Calling foo results in an rvalue structure, which when accessing one of the members via . results in an rvalue. Clang does not remove qualifiers or _Atomic (C23 removes _Atomic from rvalues as well, and GCC does this in earlier versions as well) here while GCC does and the result of DR 423 seems intended to remove cases of qualified rvalues. Additionally, Clang actually generates a warning here with the default warnings stating that the const int branch will never be taken (and then it takes it anyway). The wording doesn't state that the result is the non-atomic (C23) and unqualified member when getting the member though I assume that is a defect.

llvmbot commented 4 months ago

@llvm/issue-subscribers-clang-frontend

Author: None (Halalaluyafail3)

The following code demonstrates the issue: ```c struct{ const int c; }foo(); puts(_Generic(foo().c,int:"A",const int:"B")); ``` `foo` returns a structure that contains a qualified member named `c` of type `const int`. Calling `foo` results in an rvalue structure, which when accessing one of the members via `.` results in an rvalue. Clang does not remove qualifiers or _Atomic (C23 removes _Atomic from rvalues as well, and GCC does this in earlier versions as well) here while GCC does and the result of DR 423 seems intended to remove cases of qualified rvalues. Additionally, Clang actually generates a warning here with the default warnings stating that the `const int` branch will never be taken (and then it takes it anyway). The wording doesn't state that the result is the non-atomic (C23) and unqualified member when getting the member though I assume that is a defect.
shafik commented 4 months ago

CC @AaronBallman

AaronBallman commented 4 months ago

The return statement in foo() is what determines the value that is returned, and that is an rvalue (C23 6.8.7.5p3) of the unnamed structure type, which is unqualified. So if we're doing anything wrong, I think it has more to do with the . operator. C23 6.5.3.4p3 says:

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,93) and is an lvalue if the first expression is an lvalue. If the first expression has qualified type, the result has the so-qualified version of the type of the designated member.

So the member named here is not an lvalue because the returned structure object is not an lvalue. However, "if the first expression has qualified type" can be read to apply here and Clang's behavior is correct. I took a look at what other C compilers do and the results are wonderful:

Clang, Chibicc, EDG: do not strip the qualifier GCC, MSVC, TCC, CompCert: strips the qualifier SDCC: claims there are multiple matching associations

So I'm going to ask on the WG14 reflectors to see what folks think.

AaronBallman commented 4 months ago

I lied, I didn't need to go ask WG14. _Generic's controlling operand behaves as though lvalue conversion happens and that needs to drop the qualifiers anyway.

AaronBallman commented 4 months ago

I lied again and have asked on WG14 because I think this is a standards wording defect or I'm reading the standard wrong. lvalue conversion only applies to lvalues, so if we do get an rvalue from the member access expression, then the lvalue conversion wouldn't drop qualifiers because no conversion occurs.

https://godbolt.org/z/W64hxa4jj shows the compiler divergence for this.

Halalaluyafail3 commented 4 months ago

Interestingly, it seems that all compilers agree that with an array member qualifiers should not be removed in this situation (after decay it is no longer a qualified rvalue, but it being a qualified rvalue can still be observed with typeof). Also, I have discovered that GCC does not remove qualifiers from the result of . with an rvalue left operand in situations where lvalue conversions do not apply. That is, using the example typeof(foo().c) is the type const int but typeof(0,foo().c) is the type int. I'm thinking about making a bug report to GCC for that since it seems strange for an rvalue to undergo lvalue conversions.

AaronBallman commented 4 months ago

Committee discussion on the WG14 reflectors suggests a few things: 1) qualified rvalues are a thing in C and that was a benign thing largely because qualifiers were never observable until _Generic, but now means we need more clarity in the standard; 2) _Generic should still strip those qualifiers even if it gets an rvalue. (So the typeof behavior observing the qualifiers on the rvalue is correct.)

Halalaluyafail3 commented 2 months ago

If typeof can be used to observe the qualifiers and qualified rvalues exist, then that creates situations like the following:

struct{
    const int c;
}foo();
typeof(+foo().c)x;
x=1;

Here I see three possible results:

  1. This program violates a constraint, because the operand of the + operator is not an arithmetic type (it is a qualified arithmetic type). Strictly reading the standard, I think this would be correct interpretation.
  2. This program violates a constraint, because x=1 is assigning to a const object. This is the interpretation that Clang uses.
  3. This program is fine, and the type of x here is int. This is the interpretation that GCC uses.
AaronBallman commented 2 months ago

I don't think that program violates a constraint. C23 6.2.5p31:

... The qualified or unqualified versions of a type are distinct types that belong to the same type category and have the same representation and alignment requirements. ...

I think Clang's handling of that is correct because the function returns an rvalue, the . operator yields an lvalue of type const int, and the unary + operator only does integer promotions (which don't include lvalue conversion but could change the type) so the result is still has type const int, and that is the operand to typeof, which does not undergo lvalue conversion.

A slightly different example would be:

struct{
    const short c;
}foo();
typeof(+foo().c)x;
x=1;

where I believe you would generally get int and not const short because of integer promotions:

If the original type is not a bit-precise integer type (6.2.5): if an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int;50) otherwise, it is converted to an unsigned int.

And... Clang seems to be doing exactly that: https://godbolt.org/z/ndsjo9E7Y

Halalaluyafail3 commented 2 months ago

I don't think that program violates a constraint. C23 6.2.5p31:

... The qualified or unqualified versions of a type are distinct types that belong to the same type category and have the same representation and alignment requirements. ...

The term 'type category' is defined as such:

A type is characterized by its type category, which is either the outermost derivation of a derived type (as noted previously in this subclause in the construction of derived types), or the type itself if the type consists of no derived types.

Section 6.2.5 "Types" Paragraph 30 N3220

From how I read this, the type category of const int is int. I don't see how that makes const int included in arithmetic types. Additionally, the standard uses the the term 'qualified or unqualified character type' (in the definition of va_arg) which wouldn't make sense if this was the case as it would be redundant.

I think Clang's handling of that is correct because the function returns an rvalue, the . operator yields an lvalue of type const int, and the unary + operator only does integer promotions (which don't include lvalue conversion but could change the type) so the result is still has type const int, and that is the operand to typeof, which does not undergo lvalue conversion.

I assume you mean that the . operator yields an rvalue here, if it didn't then you could take the address of it which Clang doesn't allow. Following your line of thinking that const int is included in arithmetic types, and that integer promotions don't change the type so that it remains a qualified rvalue; shouldn't the following also be invalid?

struct{
    const int c;
}foo();
typeof(foo().c+foo().c)x;
x=1;

The types of the operands to the + (binary) operator here are both const int and the usual arithmetic conversions are applied.

Otherwise, if any of the two types is an enumeration, it is converted to its underlying type. Then, the integer promotions are performed on both operands. Next, the following rules are applied to the promoted operands:

If both operands have the same type, then no further conversion is needed.

Section 6.3.1.8 "Usual arithmetic conversions" Paragraph 1 N3220

Using this, the result of the addition should be const int yet Clang results in int anyways. Also, the result of foo().c+0 would be unclear.

AaronBallman commented 2 months ago

From how I read this, the type category of const int is int. I don't see how that makes const int included in arithmetic types. Additionally, the standard uses the the term 'qualified or unqualified character type' (in the definition of va_arg) which wouldn't make sense if this was the case as it would be redundant.

Can you imagine an interpretation where int is an integer type and const int is not an integer type? However, I think we're off in the weeds -- the standard is very unclear on these points, and that's the whole point to this issue.

I assume you mean that the . operator yields an rvalue here.

Yes, sorry for the confusion!

Using this, the result of the addition should be const int yet Clang results in int anyways. Also, the result of foo().c+0 would be unclear.

I get the same interpretation as you do.

IMO, this topic requires a WG14 paper to tease out what we want the standard to say. The author of the typeof proposal was clearly looking for typeof(foo().c) to be const int according to discussions both on and off list, and that matches the intent of the paper. The point to typeof is "I want the declared type of the thing being inspected" while typeof_unqual is "I want the non-atomic, unqualified type of the thing being inspected", but there's obviously a lot of edge cases involved. Another example would be C's adjustments to function types: https://godbolt.org/z/dzeWMz563 (which Clang doesn't correctly handle currently: https://github.com/llvm/llvm-project/issues/39494)

Halalaluyafail3 commented 2 months ago

IMO, this topic requires a WG14 paper to tease out what we want the standard to say. The author of the typeof proposal was clearly looking for typeof(foo().c) to be const int according to discussions both on and off list, and that matches the intent of the paper. The point to typeof is "I want the declared type of the thing being inspected" while typeof_unqual is "I want the non-atomic, unqualified type of the thing being inspected", but there's obviously a lot of edge cases involved. Another example would be C's adjustments to function types: https://godbolt.org/z/dzeWMz563 (which Clang doesn't correctly handle currently: #39494)

I think a good solution would be to convert to the type after lvalue conversions, if the expression was used in a context where lvalue conversions would apply had the expression been an lvalue instead. Similar to your proposed solution for this with _Generic but more generally. This would still allow typeof(foo().c) to be const int, applying . again to propagate qualifiers, and sizeof to properly determine the size of atomic members while not requiring consideration of how qualifiers pass through operators which I don't think was intended. Consider the case of atomics:

struct{
    _Atomic int c;
}foo(),bar;
+bar.c;//OK, lvalue conversions remove _Atomic
+foo().c;//Clang rejects

Clang rejects this because + requires an arithmetic type and _Atomic int is not considered an arithmetic type. Having this be valid on an lvalue but not on a non-lvalue expression doesn't really make much sense (this situation also happens with casts and function calls which result in atomic types as well).