ahatanak opened 8 years ago
Assigning to clang.
In this example, clang lowers fill_n to a memcpy loop wrapped in an always_inline fill function. The LoopIdiomRecognize pass recognizes memcpy/memset patterns, but it has no concept of loop hierarchy and does not try to collapse a loop containing memcpy calls into a single call. It seems more natural to lower fill_n to a memcpy (or memset) call directly.
I don't have a real-world use case showing that memset_pattern significantly improves performance (this bug was reported by a user). I filed this PR against libc++ because it seemed this optimization could be done with a template specialization, but there is no reason it can't be done in LLVM. Doing it in LLVM also optimizes manually written loops (as shown in llvm/llvm-bugzilla-archive#27209), so we should probably implement this optimization in LLVM first and then see whether a libc++ template specialization is still necessary.
This should be done in LoopIdiomRecognize in LLVM. It can already form memset_pattern with a constant pattern, extending that to work for variable patterns would be a nice addition.
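To make the constant vs. variable distinction concrete, here is a minimal C++ sketch of the two loop shapes (the function names are hypothetical). The first stores a compile-time constant 4-byte value, the kind of loop LoopIdiomRecognize can already turn into a memset_pattern call on targets that provide one; the second stores a value known only at run time, which is the case proposed above. Whether a given LLVM version transforms either loop depends on the target and optimization level, so treat this as an illustration, not a guarantee.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example: the stored value is a compile-time constant, so the
// 4-byte pattern is already known when the loop is analyzed.
void fill_constant(std::uint32_t *dst, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = 0x01020304u;
}

// Hypothetical example: the stored value is a function argument, so the
// pattern is only known at run time (the "variable pattern" case).
void fill_variable(std::uint32_t *dst, std::size_t n, std::uint32_t value) {
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = value;
}
```

Both functions have the same memory effect as a pattern fill; the only difference an optimizer cares about is whether the pattern bytes are constants.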
Why should this be done by libc++ rather than by LLVM?
memset_pattern is (as far as I can tell from 3 minutes of searching) an Apple-only thing. libc++ is used in many non-Apple environments.
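If memset_pattern is indeed Apple-specific, a library relying on it would need a fallback on other platforms. As a minimal sketch (C++; `portable_memset_pattern4` is a hypothetical name, not an existing libc or libc++ function), the documented contract of Apple's `memset_pattern4(void *b, const void *pattern4, size_t len)`, which repeats a 4-byte pattern over `len` bytes and truncates the last copy, can be emulated with memcpy by repeatedly doubling the already-filled prefix:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical portable stand-in for memset_pattern4: fill `len` bytes at
// `dst` by repeating the 4-byte pattern, truncating the final copy if `len`
// is not a multiple of 4.
void portable_memset_pattern4(void *dst, const void *pattern4, std::size_t len) {
  unsigned char *out = static_cast<unsigned char *>(dst);
  if (len == 0) return;
  // Seed the buffer with (at most) one copy of the pattern.
  std::size_t filled = len < 4 ? len : 4;
  std::memcpy(out, pattern4, filled);
  // Copy the filled prefix onto the unfilled tail, doubling each step.
  // `filled` stays a multiple of 4 until the final (possibly truncated)
  // copy, so the pattern phase is preserved; chunk <= filled, so source
  // and destination never overlap.
  while (filled < len) {
    std::size_t chunk = filled < len - filled ? filled : len - filled;
    std::memcpy(out + filled, out, chunk);
    filled += chunk;
  }
}
```

Doubling needs only O(log(len)) memcpy calls, so most of the work happens inside large, already-optimized copies rather than a per-element loop.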
I'd love to see an example of this showing the performance advantage - and a real-world case where this is important.
Extended Description
libc++ should use memset_pattern instead of a loop when the input is suitable (the element size is 4, 8, or 16 bytes, the type is POD, and the output iterator is a pointer).
For example, clang/LLVM currently emits a loop for the fill_n call below:
$ cat test.cpp
#include <algorithm>
struct S { int i; };
S v[256];
void foo(const S &s) {
  std::fill_n(v, 256, s); // use memset_pattern4(v, &s, 256*4) here
}
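A rough sketch of the template dispatch suggested in this report, written in C++17 with hypothetical names (`pattern_fill_n`, `fill_pattern_bytes`). It is not libc++'s actual `fill_n`. It checks the conditions listed above (element size of 4, 8, or 16 bytes and a raw pointer as the output iterator), using `std::is_trivially_copyable` as my stand-in for "POD", and in that case fills memory as a repeated byte pattern. Because memset_pattern is not portable, the fast path here doubles the filled prefix with memcpy; on Apple platforms that is the spot where a memset_pattern4/8/16 call would go.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: fill `len` bytes at `dst` with copies of a
// `size`-byte pattern by doubling the already-filled prefix with memcpy.
inline void fill_pattern_bytes(unsigned char *dst, const void *pattern,
                               std::size_t size, std::size_t len) {
  if (len == 0) return;
  std::size_t filled = std::min(size, len);
  std::memcpy(dst, pattern, filled);
  while (filled < len) {
    std::size_t chunk = std::min(filled, len - filled);
    std::memcpy(dst + filled, dst, chunk);  // chunk <= filled: no overlap
    filled += chunk;
  }
}

// Hypothetical fill_n variant: pattern path for trivially copyable T of
// size 4, 8 or 16 written through a raw pointer, ordinary fill otherwise.
template <class T>
T *pattern_fill_n(T *first, std::size_t n, const T &value) {
  constexpr std::size_t sz = sizeof(T);
  if constexpr (std::is_trivially_copyable_v<T> &&
                (sz == 4 || sz == 8 || sz == 16)) {
    // Snapshot the value's bytes first, in case `value` aliases the range.
    unsigned char pattern[sz];
    std::memcpy(pattern, &value, sz);
    // Apple-only alternative: memset_pattern4/8/16(first, pattern, n * sz).
    fill_pattern_bytes(reinterpret_cast<unsigned char *>(first), pattern, sz,
                       n * sz);
    return first + n;
  } else {
    return std::fill_n(first, n, value);
  }
}
```

With the test case above, `pattern_fill_n(v, 256, s)` would take the pattern path, since `S` wraps a single `int` (4 bytes on common targets). A 3-byte struct would fall back to `std::fill_n`.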