llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[clang] Taking address of unreachable function can be used to obtain identical integers that compare unequal #60596

Open SvizelPritula opened 1 year ago

SvizelPritula commented 1 year ago

Using __builtin_unreachable, it's possible to create a function a that compiles to zero assembly instructions, like this:

void a() {
    __builtin_unreachable();
}

void b() {}

With -O1 or higher this compiles to:

a:
b:
        ret

The function a is pretty useless, since calling it will unconditionally result in undefined behaviour. It is, however, possible to take its address, like this:

#include <stdlib.h>
#include <stdio.h>

void a() {
    __builtin_unreachable();
}

void b() {}

int main() {
    size_t ap = (size_t) a;
    size_t bp = (size_t) b;

    printf("%zu %zu %d\n", ap, bp, ap == bp);
}

Executing this will reveal that ap and bp have identical values, since a and b have the same address. However, it will also show that ap == bp is false, which contradicts that.
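One way to make the contradiction visible in a single run is to do the comparison twice: once directly, where the compiler is free to fold it at compile time, and once through volatile objects, which should force the comparison to happen at run time on the loaded values. The following is only a sketch under that assumption; runtime_equal and report_address_comparison are hypothetical helper names that do not appear in the report above, and the cast to uintptr_t relies on the implementation-defined conversion that clang and gcc provide.

```c
#include <stdint.h>
#include <stdio.h>

/* The same two functions as in the report. */
void a(void) { __builtin_unreachable(); }
void b(void) {}

/* Hypothetical helper: compares two integers through volatile objects,
   so the values must be loaded and compared at run time. */
int runtime_equal(uintptr_t x, uintptr_t y) {
    volatile uintptr_t vx = x;
    volatile uintptr_t vy = y;
    return vx == vy;
}

/* Hypothetical helper: prints both addresses, the direct comparison
   (which the compiler may fold) and the run-time comparison.
   Returns 1 if the two comparisons agree, 0 otherwise. */
int report_address_comparison(void) {
    uintptr_t ap = (uintptr_t) a;
    uintptr_t bp = (uintptr_t) b;
    int direct = (ap == bp);
    int at_run_time = runtime_equal(ap, bp);

    printf("%ju %ju direct=%d runtime=%d\n",
           (uintmax_t) ap, (uintmax_t) bp, direct, at_run_time);
    return direct == at_run_time;
}
```

Calling report_address_comparison() from main should return 1 on a build where a and b get different addresses. On an affected build, the expectation is that the direct comparison is folded to false while the run-time comparison sees equal values, so the function would return 0.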

My guess is that some optimization pass assumes that different functions have different addresses, which the C standard requires.

This bug is unlikely to happen in real programs, since: a) few programs have functions that have unconditionally undefined behaviour, b) even fewer programs take the address of such a function, and c) fewer still compare function pointers.

__builtin_unreachable can also be replaced by other statements with undefined behaviour, such as for (int i=0; i>=0; i++);.

Tested with clang and clang++ 15.0.7 with an optimization level of 1.

Endilll commented 1 year ago

Confirmed: https://godbolt.org/z/75z4zMTnd CC @AaronBallman

llvmbot commented 1 year ago

@llvm/issue-subscribers-clang-frontend

shafik commented 1 year ago

This is a known problem with unreachable: https://github.com/llvm/llvm-project/issues/48943

Also see: https://discourse.llvm.org/t/can-we-keep-must-progress-optimizations-around-infinite-loop-in-c-while-avoiding-some-surprising-behavior/69205

Right now it does not seem like anyone has the bandwidth to tackle this issue.

I feel like this is not exactly a duplicate, but if the OP feels like it is close enough then feel free to close.

llvmbot commented 1 year ago

@llvm/issue-subscribers-clang-codegen
