llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.9k stars 11.51k forks source link

LLVM fails to inline/clone recursive function, leading to 3x slowdown compared to gcc #20528

Open llvmbot opened 10 years ago

llvmbot commented 10 years ago
Bugzilla Link 20154
Version trunk
OS Linux
Attachments test case
Reporter LLVM Bugzilla Contributor
CC @d0k,@chandlerc,@DimitryAndric,@emaste,@hfinkel

Extended Description

The attached (big test case) when compiled with trunk clang doesnt have cpu_search() function inlined (or cloned?) into its callers even when marked always_inline.

This is because the function is (indirectly) recursive. When compiling the same file with gcc49, the cpu_search function is inlined.

This file is implementation of scheduler from FreeBSD and as reported (https://kib.kiev.ua/kib/pgsql_perf.pdf) inlining it results in massive improvement in performance (25% -> 8%).

d0k commented 10 years ago

Got it now.

We inline cpusearch{highest, lowest, both} into cpu_search, making it directly recursive and blocking any further inlining.

GCC inlines cpu_search into cpusearch{highest, lowest, both}, getting nice constant propagation.

Probably no way to fix this because of the different strategies in inlining (top-down vs. bottom-up). So we're left with cloning, which isn't feasible without profiling info, which isn't available without pass manager work.

d0k commented 10 years ago

Fun fact: We are inlining the always_inline function at -O0, but not at -O2.

d0k commented 10 years ago

Marking cpusearch{lowest,highest} as noinline makes llvm inline the cpu_search just fine. Can llvm be taught this trick?

That doesn't reproduce for me. I'm relatively sure that we never inline recursive calls. I'm still not entirely sure what magic GCC is applying here, however one neat solution would be to clone and constant propagate the function 3x. Each clone would shrink significantly because most branches go away.

llvmbot commented 10 years ago

Marking cpusearch{lowest,highest} as noinline makes llvm inline the cpu_search just fine. Can llvm be taught this trick?

d0k commented 10 years ago

GCC is directly inlining cpu_search into cpu_search_highest, there is no cloning happening here.

This seems to be a limitation of our inliner as it directly aborts when it encounters recursion. Not sure how GCC determines that inlining is safe when recursion is present.