Open 86d6c535-ce59-44e5-8435-81946426fe8e opened 6 years ago
For reference, Michele Alberti has just pointed me to this discussion in GCC's bugzilla, which contains the same kind of discussion as here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86259
To attempt to summarize that discussion:
there is consensus that the standard says that one cannot use a pointer to a subobject to go back to the containing object;
the idea of fully exploiting this idea is both on the table and making at least some GCC developers uncomfortable;
GCC already contains exceptions where this idea is deliberately not exploited in order to keep legacy code working (memcpy is mentioned);
it is already possible to trigger optimizations that would surprise a programmer who believes that one can go back form subobject to containing object: one example is the subject of the bug report.
OK, I believe there are a couple of remaining pieces here, then:
1) Make sure that the fix for the miscompiles has been backported to the Clang 6 branch; this seems like an important miscompile to fix.
2) Extend Clang's field-sensitive TBAA metadata to also encode information about arrays, so that we can optimize 'array'. (As discussed in "borderline cases", this would not affect any of the other functions in comment#0.)
Yes, I am sorry, I accidentally put irrelevant undefined behavior in the example custom_fun
. I mean for the automatic variable s to be initialized. I made it an automatic variable in that example because otherwise the compiler had to assume that this variable might be known to, and modified directly by, the called function f, so the example would not test what I intended to test.
Wait, what about
custom_fun
? The stores to r1, r2 get removed with -O2, but not with -O1. This is wrong, since we don't know anything aboutf
.That's probably because they're storing undef. I see the stores if I add initializers for a and b.
The confusing part here is that there is both a global and a local (in custom_fun
) named s
.
Wait, what about
custom_fun
? The stores to r1, r2 get removed with -O2, but not with -O1. This is wrong, since we don't know anything aboutf
.
That's probably because they're storing undef
. I see the stores if I add initializers for a
and b
.
Wait, what about custom_fun
?
The stores to r1
, r2
get removed with -O2
, but not with -O1
. This is wrong, since we don't know anything about f
.
I agree that the “wrong optimization” part of the report is already fixed in the version that Compiler Explorer calls “trunk” today (“clang version 7.0.0 (trunk 336621)” for long).
I would also like to thank you for your insights on the borderline cases.
Actually... looks like this has already been fixed; with "Clang (trunk)" on godbolt, all 6 functions reload s.a
and s.b
.
Sanjoy says that inbounds
is not supposed to imply that the pointer value cannot decrease, so this looks like a middle-end optimization bug.
1/ WRONG OPTIMIZATION
In the function[...] std_fun_twist [...], r3 is assigned without reloading the member a.
I believe LLVM's behavior on that function is correct. memset from offset 4 onwards cannot affect the bytes at offset 0-3, so no reload of s.a is necessary.
Sorry, I missed the +x
here. The problem appears to be the same as in the other function (incorrect use of getelementptr inbounds
).
1/ WRONG OPTIMIZATION
In the function[...] std_fun_twist [...], r3 is assigned without reloading the member a.
I believe LLVM's behavior on that function is correct. memset from offset 4 onwards cannot affect the bytes at offset 0-3, so no reload of s.a
is necessary.
In the function[...] signed_ptr_offset_twist, r3 is assigned without reloading the member a.
This looks like a bug. I believe the problem is that Clang emits the array indexing as a getelementptr inbounds
(indicating that the unsigned arithmetic does not wrap around the address space), which is wrong in the case where the integer operand is signed and the other operand is not the result of immediate array-to-pointer decay.
2/ MISSED OPTIMIZATION
In the function named array, Clang reloads the member b
Yes, this is a missed opportunity. Clang already performs this optimization in some cases, but does not take array accesses into account when forming the struct-path-based TBAA set for the access to p[x].
3/ BORDERLINE CASES
While it is difficult to infer intention from absence of optimization, I think that GCC avoids optimizing the other functions on purpose
I wouldn't assume that; it seems more likely to me to be due to implementation difficulty. That said, Clang's unlikely to optimize signed_ptr_offset
any time soon, because our determination of the aliasing set for an expression is based on a local syntactic analysis performed by the frontend; we would need to reflect significant chunks of the frontend semantics into the optimizer in order to be able to infer that p
is guaranteed to point to a 'member t in struct s' object. GCC may well not be optimizing it for largely the same reason.
assigned to @sanjoy
Extended Description
Please consider the functions below and the code generated by Clang 6.0.0 and trunk according to Matt Godbolt's Compiler Explorer (https://godbolt.org/g/Z4auAE ):
Generated Assembly (Clang 6.0.0):
The point of interest for each function is whether Clang generates code to reload the value of the struct members
a
andb
after the struct has been accessed, the access being made from a pointer initially pointing to membert
. The values are to be stored in variablesr3
andr4
. A sequence such as the following indicates that Clang considers that the value of the member a must be the same after the call to memset as it was before:Note that the variable
p
is always a pointer tochar
, ands
is always a variable, which means that the functions cannot be said to be incorrect because of “strict aliasing” violations. Incidentally, adding-fno-strict-aliasing
to the commandline does not change the generated code.1/ WRONG OPTIMIZATION
In the functions
std_fun_twist
andsigned_ptr_offset_twist
,r3
is assigned without reloading the membera
. I do not think that there exist any possible justification for this optimization in a real compiler used for, for instance, OS code. Some programmers will use char pointers to access the representation of structs. The struct's representation is an array of bytes (https://port70.net/~nsz/c/c11/n1570.html#6.2.6.1p2 ) and these functions are only doing valid pointer arithmetics within this array.2/ MISSED OPTIMIZATION
In the function named array, Clang reloads the member
b
to assign to variabler4
. I think it would be valid to treat any argumentx
to that function not between 0 and 7 as causing undefined behavior, and therefore to reuse the value loaded in%ecx
. This is an optimization that GCC already does.3/ BORDERLINE CASES
While it is difficult to infer intention from absence of optimization, I think that GCC avoids optimizing the other functions on purpose, making a distinction between direct array subscripting and use of an intermediate pointer that can be interpreter as a pointer inside the entire struct. While the C standards do not distinguish between the two, I think this is a valuable heuristic and I hope that Clang will align to GCC's behavior in these cases. Regardless of the decision, each example is either a wrong optimization or a missed optimization, since Clang always reload the member
b
and assumes that the membera
was not modified.