Open eugenis opened 8 years ago
Applied to release_38 as 261341.
The proper fix can come slow later on, but will probably not be a candidate for future 3.8.x releases.
I've committed the fix in r261331. Hans, could you port this to the release branch please?
I have a fix in http://reviews.llvm.org/D17332
I can have a fix out for this tomorrow if that's not too late.
Tomorrow sounds great. Thanks!
I had a look at the output IR and the problem is that the %wide.vec load is not safe. This is because when we do the interleaved vectorization for loads, we speculate gaps in the group and assume this is safe. This is clearly not correct, and we need to do some further validation.
I can think of two ways to fix this: 1) Be conservative and only vectorize if we have a full group (we already do this for stores). 2) Make sure that we have the first and last access in the group
My preference is towards 2).
Mine too.
I had a look at the output IR and the problem is that the %wide.vec load is not safe. This is because when we do the interleaved vectorization for loads, we speculate gaps in the group and assume this is safe. This is clearly not correct, and we need to do some further validation.
I can think of two ways to fix this: 1) Be conservative and only vectorize if we have a full group (we already do this for stores). 2) Make sure that we have the first and last access in the group
My preference is towards 2).
I can have a fix out for this tomorrow if that's not too late.
FWIW this also affects all targets that have interleaved access vectorization turned on by default (which are ARM, AArch64 and PowerPC).
Bisect points to this commit: http://llvm.org/viewvc/llvm-project?rev=246541&view=rev
That one's in 3.8.
Renato: do you think this is something we'll want to revert on the branch unless a fix shows up quickly?
Bisect points to this commit: http://llvm.org/viewvc/llvm-project?rev=246541&view=rev
assigned to @sbaranga-arm
Extended Description
See C++ test case below. When built with -O2, this crashes on Android/ARM due to a 4-byte right OOB access (caught by ASan, btw).
$ bin/clang++ 1.cc --sysroot $TOOLCHAIN/sysroot/ -B$TOOLCHAIN -target armv7-linux-android -O2 -fPIC -pie
include
include <sys/mman.h>
include
struct A { int a, b; };
struct S { char pad[4096 - sizeof(A) * 20]; A a[20]; };
attribute((noinline)) int f(A *a, int len) { int sum = 0; for (int i = 0; i < len; ++i) sum += a[i].b; return sum; }
int main() { char p = (char )mmap(0, 4096 2, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0); mprotect(p + 4096, 4096, PROT_NONE); S s = new(p) S(); int n = f(s->a, sizeof(s->a) / sizeof(s->a[0])); fprintf(stderr, "done %d\n", n); }
Vectorized code (attached) does 32-byte loads starting at a[0].b, a[4].b, a[8].b, a[12].b, a[16].b
The last one overflows 4 bytes to the right.