llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.78k stars 11.9k forks source link

Incorrect assembly code generated with -Os #114185

Open mushenoy opened 1 day ago

mushenoy commented 1 day ago

For the below sample source file - the assembly code generated is incorrect.

Sample.c:

struct sample {
    struct sample *next;
    char a;
};

char function (char);
void test(void);

void* test_function (const struct sample *c)
{
    char type;
    if (c) {
        type = c->a;
        test();
   }
   function(type);
   return (void *)0;
} 

Compilation command:

clang --target=aarch64-linux-gnu -save-temps -c sample.c -o sample.o -Os

In the assembly:

Assembly code:

test_function:
        stp     x29, x30, [sp, #-32]!
        str     x19, [sp, #16]
        mov     x29, sp
        ldrb    w19, [x0, #8]
        bl      test
        mov     w0, w19
        bl      function
        mov     x0, xzr
        ldr     x19, [sp, #16]
        ldp     x29, x30, [sp], #32
        ret

x86_64:

test_function:
        push    rbx
        movsx   ebx, byte ptr [rdi + 8]
        call    test@PLT
        mov     edi, ebx
        call    function@PLT
        xor     eax, eax
        pop     rbx
        ret

Observations:

keinflue commented 1 day ago

The missing branch has undefined behavior, because type is read in it while being uninitialized. That is UB for automatic storage duration objects which do not have their address taken.

However, this modification does not have undefined behavior and still results in the same assembly:

struct sample {
    struct sample *next;
    char a;
};

char function (char);
void test(void);

void* test_function (const struct sample *c)
{
    char type;
    char* _ = &type;
    if (c) {
        type = c->a;
        test();
   }
   function(type);
   return (void *)0;
}

In C (but not C++), taking the address of the automatic storage duration variable should avoid the UB of using the indeterminate representation, at least for character type variables.

In C++ copying an unsigned char (but not char) with an indeterminate value has defined behavior, but Clang also incorrectly drops the branch in the following variation of the program when compiled as C++ with -Os:

struct sample {
    struct sample *next;
    char a;
};

char function (unsigned char);
void test(void);

void* test_function (const struct sample *c)
{
    unsigned char type;
    if (c) {
        type = c->a;
        test();
   }
   function(type);
   return (void *)0;
}
keinflue commented 1 day ago

@EugeneZelenko While the original code does have UB, the modifications in my comment don't and Clang compiles them incorrectly. Also I don't know to what degree indeterminate value/representation UB is considered undefined per Clang's policies. I am a bit surprised that it compiles away the branch completely, rather than just considering the indeterminate representation unstable, as well.

keinflue commented 1 day ago

@EugeneZelenko I think this is a frontend issue.

For example, for the C++ variation that has defined behavior from my previous comment the following is produced for the call to function:

%call = call noundef signext i8 @function(unsigned char)(i8 noundef zeroext %3)

noundef on the function argument is generally incorrect. Or alternatively the variable's initial state musn't be undefined.

dtcxzyw commented 19 hours ago

The key point is that clang sets noundef on the function argument of function. Then SimplifyCFGPass will eliminate the null check: https://github.com/llvm/llvm-project/blob/287781c7c9dbd7674cf7cbab8a8fe8a49a4b9317/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L8045-L8080 https://github.com/llvm/llvm-project/blob/287781c7c9dbd7674cf7cbab8a8fe8a49a4b9317/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L8009-L8037

With -Xclang -no-enable-noundef-analysis, the null pointer check will be kept: https://godbolt.org/z/3Eb4rhens

llvmbot commented 12 hours ago

@llvm/issue-subscribers-clang-codegen

Author: Mukund Shenoy (mushenoy)

For the below sample source file - the assembly code generated is incorrect. Sample.c: ``` struct sample { struct sample *next; char a; }; char function (char); void test(void); void* test_function (const struct sample *c) { char type; if (c) { type = c->a; test(); } function(type); return (void *)0; } ``` Compilation command: ``` clang --target=aarch64-linux-gnu -save-temps -c sample.c -o sample.o -Os ``` In the assembly: - No conditional branch is generated to check if `c` is non-null. - test function call is done unconditionally Assembly code: ``` test_function: stp x29, x30, [sp, #-32]! str x19, [sp, #16] mov x29, sp ldrb w19, [x0, #8] bl test mov w0, w19 bl function mov x0, xzr ldr x19, [sp, #16] ldp x29, x30, [sp], #32 ret ``` x86_64: ``` test_function: push rbx movsx ebx, byte ptr [rdi + 8] call test@PLT mov edi, ebx call function@PLT xor eax, eax pop rbx ret ``` Observations: - The issue is observed when -Os is used - Code gets generated correctly when `type` variable is initialized with some value.