llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.05k stars 11.58k forks source link

[Loop Idiom Recognizer] Missing basic memmov support #25539

Open llvmbot opened 8 years ago

llvmbot commented 8 years ago
Bugzilla Link 25165
Version trunk
OS Windows NT
Reporter LLVM Bugzilla Contributor
CC @apolukhin,@d0k,@chandlerc,@LebedevRI,@mclow

Extended Description

GCC is able to generate a memmov from the following loop, while LLVM is unable to produce either a memset or memmov.

void test(int *a) { for (int i = 0; i < 1023; ++i) a[i] = a[i+1]; }

Benjamin added an implementation in r166875, but it was reverted. The pass was then totally revamped and a new implementation may be required.

mclow commented 6 years ago

More info from a discussion that Richard and I had last night.

The requirements on std::copy are result shall not be in the range [first, __last).

So you can't write: mycopy(p, p+10, p+5); but you can write: mycopy(p+1, p+10, p);

So memmove is the correct call here.

d0k commented 6 years ago

The validity check I used in my implementation based on DependenceAnalysis was that if src and dst may point to the same buffer, src must always be ahead of dst.

That's not required for memmove - it will DTRT with overlapping source and dest. (either copy from the front or the back, depending)

Right. But you can't transform a loop that's always copying from the front into a memmove that copies from the back, the results would be different.

The canonical counterexample is

for (int i = 0; i < n; ++i) x[i] = y[i];

which is not the same as memmove(x, y, n) if the buffers alias and x > y

mclow commented 6 years ago

The validity check I used in my implementation based on DependenceAnalysis was that if src and dst may point to the same buffer, src must always be ahead of dst.

That's not required for memmove - it will DTRT with overlapping source and dest. (either copy from the front or the back, depending)

d0k commented 6 years ago

As now two people have found out this is actually not as easy as it seems. Turning arbitrary copy loops into memmove is not safe. The validity check I used in my implementation based on DependenceAnalysis was that if src and dst may point to the same buffer, src must always be ahead of dst.

llvmbot commented 6 years ago

Attempt at a fix: https://reviews.llvm.org/D44477

nm. Abandoned

llvmbot commented 6 years ago

Attempt at a fix: https://reviews.llvm.org/D44477

d0k commented 6 years ago

I don't remember details, but there were some fundamental misunderstandings of LLVM IR semantics that made it into DependenceAnalysis. It's probably a better idea to find another way and do the analysis whether the transformation is safe for a given loop without DependenceAnalysis.

llvmbot commented 6 years ago

@​Benjaman, would you be able to provide an example of the quality problems you're eluding to? Or how I should go about testing for them?

d0k commented 6 years ago

Basing this on DependenceAnalysis was probably a mistake, I don't think the quality issues it has were resolved since I tried to use it for memmove formation in 2012 sigh

Back then Chandler had an idea on how to do this with ScevAA. But I don't remember the details.

llvmbot commented 6 years ago

I'll take a look at this. I've been wanting to hack on LLVM for a while.

mclow commented 6 years ago

I just removed a chunk of code in libc++ that (in std::fill_n recognized when the iterators were plain pointers, and the destination type was a one-byte trivially assignable type, and called memset.

I removed it because clang is now good enough at optimizing that it recognized the pattern and generated the call to memset itself - so the complexity in libc++ was no longer needed.

I'd love to do that for std::copy, std::move, and std::copy_backwards, which have similar specializations to call memmove, but clang doesn't do that on its' own (yet).

mclow commented 6 years ago

And here is that naive copy implementation, suitable for pasting into compiler explorer:

#include <iterator>

template <class _InputIterator, class _OutputIterator>
inline
_OutputIterator
mycopy(_InputIterator __first, _InputIterator __last, _OutputIterator __result)
{
    for (; __first != __last; ++__first, (void) ++__result)
        *__result = *__first;
    return __result;
}

void copy2(int *pf, int *pl, int *d)
{
    mycopy(pf, pl, d);
}
chandlerc commented 6 years ago

LLVM still doesn't implement memmove formation.

Here is essentially the most beautiful IR possible, handily produced by Clang for libc++'s naive std::copy implementation:

target triple = "x86_64-unknown-linux-gnu"

define void @&#8203;bar(i32* %begin, i32* %end, i32* nocapture %dest) {
bb:
  %cmp1 = icmp eq i32* %begin, %end
  br i1 %cmp1, label %exit, label %loop.ph

loop.ph:
  br label %loop

loop:
  %dest.i = phi i32* [ %dest.next, %loop ], [ %dest, %loop.ph ]
  %begin.i = phi i32* [ %begin.next, %loop ], [ %begin, %loop.ph ]
  %data = load i32, i32* %begin.i, align 4
  store i32 %data, i32* %dest.i, align 4
  %begin.next = getelementptr inbounds i32, i32* %begin.i, i64 1
  %dest.next = getelementptr inbounds i32, i32* %dest.i, i64 1
  %cmp2 = icmp eq i32* %begin.next, %end
  br i1 %cmp2, label %loop.exit, label %loop

loop.exit:
  br label %exit

exit:
  ret void
}