llvm / llvm-project


Missed optimization with "static const" local variable initialized with function #19664

Open llvmbot opened 10 years ago

llvmbot commented 10 years ago
Bugzilla Link 19290
Version trunk
OS All
Attachments Complete bitcode output by clang for first program fragment
Reporter LLVM Bugzilla Contributor
CC @hfinkel,@zygoloid,@rnk

Extended Description

Take the following program fragment:

    static int foo() { return 23; }

    int bar() {
      static const int FOO = foo();

      return FOO;
    }

When compiled with clang -r205174, -O3 and without thread-safe statics:

% ~/LLVM/build/Release+Asserts/bin/clang++ -S -emit-llvm -O3 -fno-threadsafe-statics clang.cpp

It is compiled to this (full output attached):

    define i32 @_Z3barv() #0 {
    entry:
      %.b = load i1* @_ZGVZ3barvE3FOO, align 1
      br i1 %.b, label %init.end, label %init.check

    init.check:                                       ; preds = %entry
      store i32 23, i32* @_ZZ3barvE3FOO, align 4, !tbaa !1
      %0 = tail call {}* @llvm.invariant.start(i64 4, i8* bitcast (i32* @_ZZ3barvE3FOO to i8*))
      store i1 true, i1* @_ZGVZ3barvE3FOO, align 1
      br label %init.end

    init.end:                                         ; preds = %entry, %init.check
      %1 = load i32* @_ZZ3barvE3FOO, align 4, !tbaa !1
      ret i32 %1
    }

As can be seen, although "FOO" is initialized with the constant value 23, the initialization happens neither at compile time nor at run time during program startup. Instead, a guard flag ("_ZGVZ3barvE3FOO") is checked each time "bar" is called to determine whether "FOO" has already been initialized.

(With thread-safe statics there is additional code to make sure the flag is checked and set in a thread-safe manner.)
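For readers less familiar with the IR, the control flow above corresponds roughly to the following hand-written C++. This is only a sketch of the lowering without thread-safe statics. The names `FOO_storage`, `FOO_guard`, and `bar_lowered` are hypothetical stand-ins for `_ZZ3barvE3FOO`, `_ZGVZ3barvE3FOO`, and `bar`, not something clang emits.

```cpp
// Hypothetical hand-written equivalent of the guarded initialization
// (non-thread-safe variant). All names here are illustrative only.
static int foo() { return 23; }

static int  FOO_storage = 0;      // stands in for @_ZZ3barvE3FOO
static bool FOO_guard   = false;  // stands in for @_ZGVZ3barvE3FOO

int bar_lowered() {
  if (!FOO_guard) {               // init.check: taken only on the first call
    FOO_storage = foo();          // becomes "store i32 23" once foo is inlined
    FOO_guard = true;             // mark the variable as initialized
  }
  return FOO_storage;             // init.end: reload the stored value
}
```

The branch on the guard and the reload of the stored value remain on every call, which is the overhead this report is about.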

When marking "foo" as "constexpr", the generated code is as expected:

    define i32 @_Z3barv() #0 {
    entry:
      ret i32 23
    }
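The source for that variant is presumably just the first fragment with `constexpr` added to `foo`. A minimal sketch, assuming C++11 or later (the `static_assert` is an addition for illustration and is not part of the original report):

```cpp
// Sketch of the constexpr variant of the first fragment (C++11 or later).
static constexpr int foo() { return 23; }

int bar() {
  // foo() is a constant expression here, so FOO can be initialized
  // statically and no guard variable is required.
  static const int FOO = foo();
  return FOO;
}

// Illustrative compile-time check that foo() is usable in a constant expression.
static_assert(foo() == 23, "foo() must be a constant expression");
```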

However, changing the function to "constexpr" is not always possible, e.g. when using external headers such as xmmintrin.h:

    #include <xmmintrin.h>

    float baz() {
      static const __m128 one = _mm_set_ss(1.0f);

      return _mm_cvtss_f32(one);
    }

Results in:

    define float @_Z3bazv() #0 {
    entry:
      %.b = load i1* @_ZGVZ3bazvE3one, align 1
      br i1 %.b, label %init.end, label %init.check

    init.check:                                       ; preds = %entry
      store <4 x float> <float 1.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00>, <4 x float>* @_ZZ3bazvE3one, align 16, !tbaa !1
      %0 = tail call {}* @llvm.invariant.start(i64 16, i8* bitcast (<4 x float>* @_ZZ3bazvE3one to i8*))
      store i1 true, i1* @_ZGVZ3bazvE3one, align 1
      br label %init.end

    init.end:                                         ; preds = %entry, %init.check
      %1 = load <4 x float>* @_ZZ3bazvE3one, align 16, !tbaa !1
      %vecext.i = extractelement <4 x float> %1, i32 0
      ret float %vecext.i
    }

However, changing the code to the equivalent (essentially inlining _mm_set_ss):

    float baz() {
      static const __m128 one = { 1.0f, 0.0f, 0.0f, 0.0f };

      return _mm_cvtss_f32(one);
    }

Results in the expected:

    define float @_Z3bazv() #0 {
    entry:
      ret float 1.000000e+00
    }

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 10 years ago

globalopt is good at this sort of thing, but I think the dependence on the guard variable is too complex for it at the moment.