llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.21k stars 11.14k forks source link

Incorrect code generated when references are used with short-circuit operators (||, &&) or if-else statements #37011

Open llvmbot opened 6 years ago

llvmbot commented 6 years ago
Bugzilla Link 37663
Version 6.0
OS All
Reporter LLVM Bugzilla Contributor
CC @dwblaikie,@DougGregor,@zygoloid

Extended Description

The following two functions have identical semantics. Only when x is false should y be dereferenced. Incorrect code is generated for the first function.

int foo1(int x, int& y, int size)
{
    if (x || y < size)
        return x;
    return size;
}

int foo2(int x, int* y, int size)
{
    if (x || *y < size)
        return x;
    return size;
}

Code via compiler explorer (with -O2) shows that for foo1, y is always dereferenced.

_Z4foo1iRii: # @_Z4foo1iRii
  cmpl %edx, (%rsi)    # y is dereferenced
  cmovll %edi, %edx
  testl %edi, %edi
  cmovnel %edi, %edx
  movl %edx, %eax
  retq

_Z4foo2iPii: # @_Z4foo2iPii
  testl %edi, %edi
  je .LBB1_1
  movl %edi, %eax
  retq
.LBB1_1:
  xorl %eax, %eax
  cmpl %edx, (%rsi)
  cmovgel %edx, %eax
  retq
ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

In both the following examples, y must only be accessed if the mutex was successfully locked.

Yes, in this case this appears to be a miscompile, because try_lock_mutex might have deallocated the storage backing 'y'. We "know" the storage is dereferenceable on entry to the function, but that does not imply that it remains dereferenceable across calls to unknown code.

llvmbot commented 6 years ago

The issue seems to happen for && operator as well, and upon further playing around, conditionals in general.

In both the following examples, y must only be accessed if the mutex was successfully locked.

int try_lock_mutex();

int foo1(int& y, int size)
{
    if (try_lock_mutex() && y < size)
        return size;
    return 0;
}

int foo2(int x, int& y, int size)
{
    if (try_lock_mutex()) {
        if (y < size)
            return size;
    }
    return 0;
}

Generated code with -O2:

_Z4foo1Rii: # @_Z4foo1Rii
  pushq %r14
  pushq %rbx
  pushq %rax
  movl %esi, %ebx
  movq %rdi, %r14
  callq _Z14try_lock_mutexv   
  xorl %ecx, %ecx
  cmpl %ebx, (%r14)      # access y
  cmovll %ebx, %ecx
  testl %eax, %eax       # then check return value of try_lock_mutex
  cmovnel %ecx, %eax
  addq $8, %rsp
  popq %rbx
  popq %r14
  retq
_Z4foo2iRii: # @_Z4foo2iRii
  pushq %r14
  pushq %rbx
  pushq %rax
  movl %edx, %ebx
  movq %rsi, %r14
  callq _Z14try_lock_mutexv
  xorl %ecx, %ecx
  cmpl %ebx, (%r14)    # access y
  cmovll %ebx, %ecx
  testl %eax, %eax
  cmovnel %ecx, %eax   # then check return value of try_lock_mutex
  addq $8, %rsp
  popq %rbx
  popq %r14
  retq
ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

The UB would happen at the point of the reference binding (in the caller).

llvmbot commented 6 years ago

Is the behavior undefined even though y is accessed only conditionally?

Here is another example where y is valid on entry to the function but gets deleted within the function and then conditionally accessed via a pointer.

int fcn_deletes_y(int*);

int foo1(int x, int& y, int size)
{
    int* py = &y;
    if (x || fcn_deletes_y(py) || *py < size)
        return size;
    return 0;
}

_Z4foo1iRii: # @_Z4foo1iRii
  pushq %r14
  pushq %rbx
  pushq %rax
  movl %edx, %ebx
  movq %rsi, %r14
  testl %edi, %edi
  je .LBB0_1
  movl %ebx, %eax
  jmp .LBB0_3
.LBB0_1:
  movq %r14, %rdi
  callq _Z13fcn_deletes_yPi
  xorl %ecx, %ecx
  cmpl %ebx, (%r14)    # *py is accessed first
  cmovll %ebx, %ecx
  testl %eax, %eax     # before checking the return value of fcn_deletes_y
  cmovnel %ebx, %ecx
  movl %ecx, %eax
.LBB0_3:
  addq $8, %rsp
  popq %rbx
  popq %r14
  retq
ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

At the point of entry to the function, behaviour is undefined if the reference is not bound to suitable allocated storage for an int. We infer from this that it is valid to speculate a load of the int reference. The same is not true for the pointer because it might not point to 4 bytes of dereferenceable storage.

jyknight commented 6 months ago

Yes, there is a bug here: LLVM assumes the reference is valid throughout the function, not just at the point of entry. It requires IR design changes around the interaction between "dereferenceable" and freeing memory, which have been discussed, but no resolution reached: https://discourse.llvm.org/t/llvm-semantics-vs-mmap-munmap/73330/10 https://discourse.llvm.org/t/rfc-decomposing-deref-n-into-deref-n-nofree/57873