Open llvmbot opened 9 years ago
More info from a discussion that Richard and I had last night.
The requirements on std::copy are result shall not be in the range [first, __last).
So you can't write: mycopy(p, p+10, p+5); but you can write: mycopy(p+1, p+10, p);
So memmove is the correct call here.
The validity check I used in my implementation based on DependenceAnalysis was that if src and dst may point to the same buffer, src must always be ahead of dst.
That's not required for memmove - it will DTRT with overlapping source and dest. (either copy from the front or the back, depending)
Right. But you can't transform a loop that's always copying from the front into a memmove that copies from the back, the results would be different.
The canonical counterexample is
for (int i = 0; i < n; ++i) x[i] = y[i];
which is not the same as memmove(x, y, n) if the buffers alias and x > y
The validity check I used in my implementation based on DependenceAnalysis was that if src and dst may point to the same buffer, src must always be ahead of dst.
That's not required for memmove - it will DTRT with overlapping source and dest. (either copy from the front or the back, depending)
As now two people have found out this is actually not as easy as it seems. Turning arbitrary copy loops into memmove is not safe. The validity check I used in my implementation based on DependenceAnalysis was that if src and dst may point to the same buffer, src must always be ahead of dst.
Attempt at a fix: https://reviews.llvm.org/D44477
nm. Abandoned
Attempt at a fix: https://reviews.llvm.org/D44477
I don't remember details, but there were some fundamental misunderstandings of LLVM IR semantics that made it into DependenceAnalysis. It's probably a better idea to find another way and do the analysis whether the transformation is safe for a given loop without DependenceAnalysis.
@Benjaman, would you be able to provide an example of the quality problems you're eluding to? Or how I should go about testing for them?
Basing this on DependenceAnalysis was probably a mistake, I don't think the quality issues it has were resolved since I tried to use it for memmove formation in 2012 sigh
Back then Chandler had an idea on how to do this with ScevAA. But I don't remember the details.
I'll take a look at this. I've been wanting to hack on LLVM for a while.
I just removed a chunk of code in libc++ that (in std::fill_n
recognized when the iterators were plain pointers, and the destination type was a one-byte trivially assignable type, and called memset
.
I removed it because clang is now good enough at optimizing that it recognized the pattern and generated the call to memset itself - so the complexity in libc++ was no longer needed.
I'd love to do that for std::copy
, std::move
, and std::copy_backwards
, which have similar specializations to call memmove
, but clang doesn't do that on its' own (yet).
And here is that naive copy implementation, suitable for pasting into compiler explorer:
#include <iterator>
template <class _InputIterator, class _OutputIterator>
inline
_OutputIterator
mycopy(_InputIterator __first, _InputIterator __last, _OutputIterator __result)
{
for (; __first != __last; ++__first, (void) ++__result)
*__result = *__first;
return __result;
}
void copy2(int *pf, int *pl, int *d)
{
mycopy(pf, pl, d);
}
LLVM still doesn't implement memmove formation.
Here is essentially the most beautiful IR possible, handily produced by Clang for libc++'s naive std::copy implementation:
target triple = "x86_64-unknown-linux-gnu"
define void @​bar(i32* %begin, i32* %end, i32* nocapture %dest) {
bb:
%cmp1 = icmp eq i32* %begin, %end
br i1 %cmp1, label %exit, label %loop.ph
loop.ph:
br label %loop
loop:
%dest.i = phi i32* [ %dest.next, %loop ], [ %dest, %loop.ph ]
%begin.i = phi i32* [ %begin.next, %loop ], [ %begin, %loop.ph ]
%data = load i32, i32* %begin.i, align 4
store i32 %data, i32* %dest.i, align 4
%begin.next = getelementptr inbounds i32, i32* %begin.i, i64 1
%dest.next = getelementptr inbounds i32, i32* %dest.i, i64 1
%cmp2 = icmp eq i32* %begin.next, %end
br i1 %cmp2, label %loop.exit, label %loop
loop.exit:
br label %exit
exit:
ret void
}
Extended Description
GCC is able to generate a memmov from the following loop, while LLVM is unable to produce either a memset or memmov.
void test(int *a) { for (int i = 0; i < 1023; ++i) a[i] = a[i+1]; }
Benjamin added an implementation in r166875, but it was reverted. The pass was then totally revamped and a new implementation may be required.