llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29k stars 11.95k forks source link

Wrong optimizations and missed optimizations involving pointers to subobjects #37456

Open 86d6c535-ce59-44e5-8435-81946426fe8e opened 6 years ago

86d6c535-ce59-44e5-8435-81946426fe8e commented 6 years ago
Bugzilla Link 38108
Version 6.0
OS Linux
CC @nunoplopes,@zygoloid

Extended Description

Please consider the functions below and the code generated by Clang 6.0.0 and trunk according to Matt Godbolt's Compiler Explorer (https://godbolt.org/g/Z4auAE ):

// C code
#include <string.h>
#include <stddef.h>

struct s {
    int a;
    char t[8];
    int b;
} s;

int r1, r2, r3, r4;
void f(char *);

void std_fun(int x, int z) {
  char *p = s.t;
  r1 = s.a;
  r2 = s.b;
  memset(p+x, 0, z);
  r3 = s.a;
  r4 = s.b;
}

void std_fun_twist(int x, int z) {
  char *p = (char*)&s + offsetof(struct s, t);
  r1 = s.a;
  r2 = s.b;
  memset(p+x, 0, z);
  r3 = s.a;
  r4 = s.b;
}

void custom_fun(void) {
  struct s s;
  char *p = s.t;
  r1 = s.a;
  r2 = s.b;
  f(p);
  r3 = s.a;
  r4 = s.b;
}

void signed_ptr_offset(int x) {
  char *p = s.t;
  r1 = s.a;
  r2 = s.b;
  p[x]=1;
  r3 = s.a;
  r4 = s.b;
}

void signed_ptr_offset_twist(int x) {
  char *p = (char*)&s + offsetof(struct s, t);
  r1 = s.a;
  r2 = s.b;
  p[x]=1;
  r3 = s.a;
  r4 = s.b;
}

void array(int x) {
  r1 = s.a;
  r2 = s.b;
  s.t[x]=1;
  r3 = s.a;
  r4 = s.b;
}

Generated Assembly (Clang 6.0.0):

std_fun:                                # @std_fun
        pushq   %rbx
        movl    s(%rip), %ebx
        movl    %ebx, r1(%rip)
        movl    s+12(%rip), %eax
        movl    %eax, r2(%rip)
        movslq  %edi, %rax
        leaq    s+4(%rax), %rdi
        movslq  %esi, %rdx
        xorl    %esi, %esi
        callq   memset
        movl    %ebx, r3(%rip)
        movl    s+12(%rip), %eax
        movl    %eax, r4(%rip)
        popq    %rbx
        retq
std_fun_twist:                          # @std_fun_twist
        pushq   %rbx
        movl    s(%rip), %ebx
        movl    %ebx, r1(%rip)
        movl    s+12(%rip), %eax
        movl    %eax, r2(%rip)
        movslq  %edi, %rax
        leaq    s+4(%rax), %rdi
        movslq  %esi, %rdx
        xorl    %esi, %esi
        callq   memset
        movl    %ebx, r3(%rip)
        movl    s+12(%rip), %eax
        movl    %eax, r4(%rip)
        popq    %rbx
        retq
custom_fun:                             # @custom_fun
        subq    $24, %rsp
        leaq    12(%rsp), %rdi
        callq   f
        movl    8(%rsp), %eax
        movl    20(%rsp), %ecx
        movl    %eax, r3(%rip)
        movl    %ecx, r4(%rip)
        addq    $24, %rsp
        retq
signed_ptr_offset:                      # @signed_ptr_offset
        movl    s(%rip), %eax
        movl    %eax, r1(%rip)
        movl    s+12(%rip), %ecx
        movslq  %edi, %rdx
        movb    $1, s+4(%rdx)
        movl    %ecx, r2(%rip)
        movl    %eax, r3(%rip)
        movl    s+12(%rip), %eax
        movl    %eax, r4(%rip)
        retq
signed_ptr_offset_twist:                # @signed_ptr_offset_twist
        movl    s(%rip), %eax
        movl    %eax, r1(%rip)
        movl    s+12(%rip), %ecx
        movslq  %edi, %rdx
        movb    $1, s+4(%rdx)
        movl    %ecx, r2(%rip)
        movl    %eax, r3(%rip)
        movl    s+12(%rip), %eax
        movl    %eax, r4(%rip)
        retq
array:                                  # @array
        movl    s(%rip), %eax
        movl    %eax, r1(%rip)
        movl    s+12(%rip), %ecx
        movslq  %edi, %rdx
        movb    $1, s+4(%rdx)
        movl    %ecx, r2(%rip)
        movl    %eax, r3(%rip)
        movl    s+12(%rip), %eax
        movl    %eax, r4(%rip)
        retq

The point of interest for each function is whether Clang generates code to reload the value of the struct members a and b after the struct has been accessed, the access being made from a pointer initially pointing to member t. The values are to be stored in variables r3 and r4. A sequence such as the following indicates that Clang considers that the value of the member a must be the same after the call to memset as it was before:

        callq   memset
        movl    %ebx, r3(%rip)

Note that the variable p is always a pointer to char, and s is always a variable, which means that the functions cannot be said to be incorrect because of “strict aliasing” violations. Incidentally, adding -fno-strict-aliasing to the commandline does not change the generated code.

1/ WRONG OPTIMIZATION

In the functions std_fun_twist and signed_ptr_offset_twist, r3 is assigned without reloading the member a. I do not think that there exist any possible justification for this optimization in a real compiler used for, for instance, OS code. Some programmers will use char pointers to access the representation of structs. The struct's representation is an array of bytes (https://port70.net/~nsz/c/c11/n1570.html#6.2.6.1p2 ) and these functions are only doing valid pointer arithmetics within this array.

2/ MISSED OPTIMIZATION

In the function named array, Clang reloads the member b to assign to variable r4. I think it would be valid to treat any argument x to that function not between 0 and 7 as causing undefined behavior, and therefore to reuse the value loaded in %ecx. This is an optimization that GCC already does.

3/ BORDERLINE CASES

While it is difficult to infer intention from absence of optimization, I think that GCC avoids optimizing the other functions on purpose, making a distinction between direct array subscripting and use of an intermediate pointer that can be interpreter as a pointer inside the entire struct. While the C standards do not distinguish between the two, I think this is a valuable heuristic and I hope that Clang will align to GCC's behavior in these cases. Regardless of the decision, each example is either a wrong optimization or a missed optimization, since Clang always reload the member b and assumes that the member a was not modified.

86d6c535-ce59-44e5-8435-81946426fe8e commented 6 years ago

For reference, Michele Alberti has just pointed me to this discussion in GCC's bugzilla, which contains the same kind of discussion as here:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86259

To attempt to summarize that discussion:

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

OK, I believe there are a couple of remaining pieces here, then:

1) Make sure that the fix for the miscompiles has been backported to the Clang 6 branch; this seems like an important miscompile to fix.

2) Extend Clang's field-sensitive TBAA metadata to also encode information about arrays, so that we can optimize 'array'. (As discussed in "borderline cases", this would not affect any of the other functions in comment#0.)

86d6c535-ce59-44e5-8435-81946426fe8e commented 6 years ago

Yes, I am sorry, I accidentally put irrelevant undefined behavior in the example custom_fun. I mean for the automatic variable s to be initialized. I made it an automatic variable in that example because otherwise the compiler had to assume that this variable might be known to, and modified directly by, the called function f, so the example would not test what I intended to test.

sanjoy commented 6 years ago

Wait, what about custom_fun? The stores to r1, r2 get removed with -O2, but not with -O1. This is wrong, since we don't know anything about f.

That's probably because they're storing undef. I see the stores if I add initializers for a and b.

The confusing part here is that there is both a global and a local (in custom_fun) named s.

sanjoy commented 6 years ago

Wait, what about custom_fun? The stores to r1, r2 get removed with -O2, but not with -O1. This is wrong, since we don't know anything about f.

That's probably because they're storing undef. I see the stores if I add initializers for a and b.

nunoplopes commented 6 years ago

Wait, what about custom_fun? The stores to r1, r2 get removed with -O2, but not with -O1. This is wrong, since we don't know anything about f.

86d6c535-ce59-44e5-8435-81946426fe8e commented 6 years ago

I agree that the “wrong optimization” part of the report is already fixed in the version that Compiler Explorer calls “trunk” today (“clang version 7.0.0 (trunk 336621)” for long).

I would also like to thank you for your insights on the borderline cases.

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

Actually... looks like this has already been fixed; with "Clang (trunk)" on godbolt, all 6 functions reload s.a and s.b.

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

Sanjoy says that inbounds is not supposed to imply that the pointer value cannot decrease, so this looks like a middle-end optimization bug.

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

1/ WRONG OPTIMIZATION

In the function[...] std_fun_twist [...], r3 is assigned without reloading the member a.

I believe LLVM's behavior on that function is correct. memset from offset 4 onwards cannot affect the bytes at offset 0-3, so no reload of s.a is necessary.

Sorry, I missed the +x here. The problem appears to be the same as in the other function (incorrect use of getelementptr inbounds).

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

1/ WRONG OPTIMIZATION

In the function[...] std_fun_twist [...], r3 is assigned without reloading the member a.

I believe LLVM's behavior on that function is correct. memset from offset 4 onwards cannot affect the bytes at offset 0-3, so no reload of s.a is necessary.

In the function[...] signed_ptr_offset_twist, r3 is assigned without reloading the member a.

This looks like a bug. I believe the problem is that Clang emits the array indexing as a getelementptr inbounds (indicating that the unsigned arithmetic does not wrap around the address space), which is wrong in the case where the integer operand is signed and the other operand is not the result of immediate array-to-pointer decay.

2/ MISSED OPTIMIZATION

In the function named array, Clang reloads the member b

Yes, this is a missed opportunity. Clang already performs this optimization in some cases, but does not take array accesses into account when forming the struct-path-based TBAA set for the access to p[x].

3/ BORDERLINE CASES

While it is difficult to infer intention from absence of optimization, I think that GCC avoids optimizing the other functions on purpose

I wouldn't assume that; it seems more likely to me to be due to implementation difficulty. That said, Clang's unlikely to optimize signed_ptr_offset any time soon, because our determination of the aliasing set for an expression is based on a local syntactic analysis performed by the frontend; we would need to reflect significant chunks of the frontend semantics into the optimizer in order to be able to infer that p is guaranteed to point to a 'member t in struct s' object. GCC may well not be optimizing it for largely the same reason.

86d6c535-ce59-44e5-8435-81946426fe8e commented 6 years ago

assigned to @sanjoy