llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

-fno-zero-initialized-in-bss and tentative definitions #30973

Open hubert-reinterpretcast opened 7 years ago

hubert-reinterpretcast commented 7 years ago
Bugzilla Link 31625
Version trunk
OS Linux
CC @erichkeane,@zygoloid

Extended Description

Clang's support for -fno-zero-initialized-in-bss does not match GCC's. In particular, when using -fno-common, tentative definitions are still placed by GCC into BSS.

Online compiler: http://melpon.org/wandbox/permlink/76ugV7cPaxxqjIMl

Source (<stdin>):

int x;

Compiler invocation:

clang -c -o a.o -x c -fno-common -fno-zero-initialized-in-bss -

Additional commands:

objdump -wt a.o | grep -P '\b''x\b'

Expected output:

0000000000000000 g O .bss 0000000000000004 x

Actual output:

0000000000000000 g O .data 0000000000000004 x

clang -v:

clang version 4.0.0 (trunk 290110)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/llvm-head/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6.3
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
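
For illustration, the distinction this report turns on can be put in a single C translation unit. This is only a minimal sketch: the variable and function names are hypothetical, and the placements noted in the comments restate what this report and GCC's documentation of the flag describe. They are not output captured from a run.

```c
/* Hypothetical example file; none of these names come from the report. */

/* Tentative definition: no initializer appears in the source.
   Per this report, GCC with -fno-common -fno-zero-initialized-in-bss
   still places such an object in .bss, while Clang places it in .data. */
int tentative;

/* Explicit zero initializer written in the source.
   This is the case -fno-zero-initialized-in-bss is documented to
   move out of BSS and into the data section. */
int explicit_zero = 0;

/* Nonzero initializer: needs a stored value regardless of the flag. */
int nonzero = 42;

/* All three objects have well-defined values at program start;
   only their section placement is in question. */
int sum_of_globals(void) {
    return tentative + explicit_zero + nonzero;
}
```

Compiling such a file with the invocation above and inspecting it with objdump -wt should show which section each symbol lands in for a given compiler.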

erichkeane commented 5 years ago

Alright, thanks for the confirmation.

I'll discuss with my LLVM folks how we can communicate the difference between "initialized in text" vs "initialized by rule". At the moment, I don't see a way to do so in IR.

hubert-reinterpretcast commented 5 years ago

> Based on my reading of: http://eel.is/c++draft/dcl.init#10 and http://eel.is/c++draft/basic.stc.static
>
> I think my version above is illegal, right? We presumably need some way to identify which variables are "initialized in text" vs "initialized by rule" in this case.

Yes, zero-initialization produces defined values that can be inspected.

As for the semantics, the "initialized by rule" portion seems to specifically be the zero-initialization that is performed by [basic.start.static] in the absence of constant initialization. An object that needs dynamic initialization goes into BSS (if all bytes should be zero as the result of zero-initialization).

All other constant initialization (including zero-initializing as part of value-initialization) is not considered for BSS when -fno-zero-initialized-in-bss is in effect.
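
The rule described above can be sketched as a small decision function. This is a hedged model of the semantics as stated in this comment, not code from GCC or Clang: the enum, the function name, and the parameters are all hypothetical, and it ignores read-only sections and common symbols.

```c
#include <stdbool.h>

/* Hypothetical names for this sketch; not taken from any compiler. */
enum section { SECTION_BSS, SECTION_DATA };

/*
 * constant_initialized: the object receives its value through constant
 *                       initialization ("initialized in text").
 * all_bytes_zero:       the object's initial image is all zero bytes.
 * no_zero_init_in_bss:  -fno-zero-initialized-in-bss is in effect.
 */
enum section pick_section(bool constant_initialized, bool all_bytes_zero,
                          bool no_zero_init_in_bss) {
    if (!constant_initialized) {
        /* Zero-initialized "by rule"; this also covers objects that are
           dynamically initialized later. BSS regardless of the flag. */
        return SECTION_BSS;
    }
    if (all_bytes_zero && !no_zero_init_in_bss) {
        /* Constant-initialized to zero with the flag off: BSS is allowed. */
        return SECTION_BSS;
    }
    /* Nonzero image, or zero image with the flag on. */
    return SECTION_DATA;
}
```

Under this model, a tentative definition such as int x; corresponds to pick_section(false, true, true) and stays in BSS, which is the behavior the report expects.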

erichkeane commented 5 years ago

Based on my reading of: http://eel.is/c++draft/dcl.init#10 and http://eel.is/c++draft/basic.stc.static

I think my version above is illegal, right? We presumably need some way to identify which variables are "initialized in text" vs "initialized by rule" in this case.

erichkeane commented 5 years ago

This still exists in trunk today. I believe the problem is in part that IR CodeGen doesn't differentiate between an initialized and an uninitialized variable. See: https://godbolt.org/z/vnhOMG

Note that:

@i_uninit = dso_local global i32 0, align 4, !dbg !0
@i_init_zero = dso_local global i32 0, align 4, !dbg !6

BOTH are i32 0, despite one being initialized and one not.

CodeGenModule.cpp (CodeGenModule::EmitGlobalVarDefinition) seems to do this intentionally (see the else if(!InitExpr) condition, ~3493).

The comment there states that this is intentional. It seems to me that we could replace the Init = line in that branch with llvm::UndefValue::get(D->getType()->getTypePtr()); however, I'm not sure of the full consequences of that.

Additionally, some LLVM work would need to be done to correctly handle the bss based on its init status.

Does anyone familiar with this code have guidance that they can give? The list of test failures from the changes suggested above (just to get the Clang side done) is pretty massive, but I'm OK with fixing them if we believe this is the right thing.

hubert-reinterpretcast commented 2 weeks ago

Noting that this also applies to internal-linkage cases (regardless of -fno-common):

$ gcc -Wall -Wextra -pedantic -c -fno-zero-initialized-in-bss -xc -<<<$'static int i;\nstatic int i;\nvoid *f() { return &i; }' && objdump -t -- -.o | grep ' i$'
0000000000000000 l     O .bss   0000000000000004 i
Return:  0x00:0   Mon Oct  7 18:42:45 2024 EDT
$ clang++ -Wall -Wextra -pedantic -c -fno-zero-initialized-in-bss -xc -<<<$'static int i;\nstatic int i;\nvoid *f(void) { return &i; }' -o -.o && objdump -t -- -.o | grep ' i$'
0000000000000000 l     O .data  0000000000000004 i
Return:  0x00:0   Mon Oct  7 18:43:37 2024 EDT