llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Make std::fill_n use memset_pattern*() for pod types of the right size #27580

Open ahatanak opened 8 years ago

ahatanak commented 8 years ago
Bugzilla Link 27206
Version unspecified
OS All
CC @d0k,@mclow,@zygoloid,@rotateright

Extended Description

libc++ should use memset_pattern*() instead of a loop when the inputs qualify: the element size is 4, 8, or 16 bytes, the type is POD, and the target iterator is a pointer.

For example, clang/llvm currently emits a loop for fill_n in the following example:

$ cat test.cpp
#include <algorithm>

struct S { int i; };

S v[256];

void foo(const S &s) {
  std::fill_n(v, 256, s); // use memset_pattern4(v, &s, 256*4) here
}
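The library-side dispatch requested above might look roughly like the following. This is only a sketch under stated assumptions, not libc++ code: `fill_n_pattern` is a hypothetical name, the check uses `std::is_trivially_copyable` as a stand-in for "pod", and the `memset_pattern4/8/16` calls are guarded by `__APPLE__` because those functions are declared in `<string.h>` on Apple platforms. Everywhere else the helper falls back to `std::fill_n`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not part of libc++): fill n elements starting at dst.
// On Apple platforms, trivially copyable types of size 4, 8, or 16 bytes are
// filled with memset_pattern*(); everything else uses std::fill_n.
template <class T>
T* fill_n_pattern(T* dst, std::size_t n, const T& value) {
#if defined(__APPLE__)
    if (std::is_trivially_copyable<T>::value) {
        // Copy the value first so the pattern never aliases the destination.
        unsigned char pattern[sizeof(T)];
        std::memcpy(pattern, &value, sizeof(T));
        // The length argument of memset_pattern*() is a byte count.
        switch (sizeof(T)) {
        case 4:
            memset_pattern4(dst, pattern, n * sizeof(T));
            return dst + n;
        case 8:
            memset_pattern8(dst, pattern, n * sizeof(T));
            return dst + n;
        case 16:
            memset_pattern16(dst, pattern, n * sizeof(T));
            return dst + n;
        default:
            break; // unsupported size: fall through to the generic loop
        }
    }
#endif
    // Portable fallback: the ordinary element-by-element fill.
    return std::fill_n(dst, n, value);
}
```

With the example above, `fill_n_pattern(v, 256, s)` would take the `memset_pattern4` path on an Apple target and the `std::fill_n` path elsewhere. Whether this is faster than the loop is not established here.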

mclow commented 6 years ago

Assigning to clang.

llvmbot commented 8 years ago

In this example, clang lowers fill_n to a memcpy loop wrapped in an always_inline fill function. The LoopIdiomPass recognizes memcpy/memset patterns, but it has no concept of loop hierarchy and does not try to form a memcpy loop out of a loop with memcpy calls. It seems more natural to lower fill_n to a memcpy (or memset) call directly.

ahatanak commented 8 years ago

I don't have a real-world use case that shows using memset_pattern significantly improves performance (this bug was reported by a user). I filed this PR for libc++ because it seemed like this optimization could be done using template specialization, but there is no reason it can't be done in llvm. Doing it in llvm enables optimizing manually-written loops too (as shown in llvm/llvm-bugzilla-archive#27209), so we should probably try to implement this optimization in llvm first and see if it's still necessary to add a template specialization to libc++ after that.

d0k commented 8 years ago

This should be done in LoopIdiomRecognize in LLVM. It can already form memset_pattern with a constant pattern; extending that to work for variable patterns would be a nice addition.
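To make the constant/variable distinction concrete, here is a pair of hand-written loops. The function names are hypothetical, and whether a given compiler version and target actually turns either loop into a memset_pattern call is an assumption not verified here. The first loop stores a compile-time constant, the shape described as already handled; the second stores a loop-invariant value known only at run time, the case proposed as an extension.

```cpp
#include <cstddef>
#include <cstdint>

// Constant pattern: the stored 4-byte value is known at compile time.
void fill_constant(std::uint32_t* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        p[i] = 0xDEADBEEFu;
}

// Variable pattern: the stored value is loop-invariant but only known at
// run time, so the pass would have to materialize the pattern in memory.
void fill_variable(std::uint32_t* p, std::size_t n, std::uint32_t value) {
    for (std::size_t i = 0; i < n; ++i)
        p[i] = value;
}
```

Both loops have the same semantics as a fill_n over a pointer range, which is why handling them in the optimizer would also cover manually written code.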

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 8 years ago

Why should this be done by libc++ rather than by LLVM?

mclow commented 8 years ago

memset_pattern is (as far as I can tell from 3 minutes of searching) an Apple-only thing. libc++ is used in many non-Apple environments.

I'd love to see an example of this showing the performance advantage - and a real-world case where this is important.