llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
26.99k stars 11.06k forks source link

Possible vectorization or CodeGen bug #28509

Open pirama-arumuga-nainar opened 8 years ago

pirama-arumuga-nainar commented 8 years ago
Bugzilla Link 28135
Version trunk
OS All
Blocks llvm/llvm-project#21794
Attachments Simplified function for possible vectorization bug
CC @Arnaud-de-Grandmaison-ARM,@hfinkel,@kbeyls,@rengolin,@sbaranga-arm,@rotateright,@stephenhines

Extended Description

[Tentatively marking "loop optimization" component this repros in both AArch64 and ARM targets]

This relates to the Android issue mentioned in https://android-review.googlesource.com/#/c/235875/.

The function 'loop' in the attached 'codec-loop.c' function produces incorrect output when the 'for' loop in it is vectorized and called from that source file (https://android-review.googlesource.com/#/c/235875/2/media/libstagefright/codecs/amrwbenc/src/voAMRWBEnc.c). The incorrect output causes corruption in the codec's output.

The failure reproduces on both AArch64 and ARM32. Various workarounds fix the failure (adding prints in the loop body, turning on integer sanitization etc) but they all succeed only because vectorization and vector instructions don't get used.

I am trying to find the right set of inputs that trigger the issue in the reduced "loop" functiono. I'll post it here if I am successful.

PS: I tested this on Android's Clang, which is a bit behind trunk. However, LLVM IR from both ToT and Android's clang was the same, except for very minor differences).

fba5f26d-3e2e-4f9e-8aef-b87df32319ac commented 7 years ago

Reproduce with opt -O3 bug.ll -S -o - I think the attached bug.ll file should reproduces the issue. The problem I'm seeing is that after codegen we're doing the vector load from a loop-invariant location on the stack. The buggy transformation appears to happen somewhere in Selection DAG.

Can be reproduced with "llc -O3 bug.ll -o -" on LLVM ToT.

pirama-arumuga-nainar commented 8 years ago

Here's my findings so far. Please refer to https://android-review.googlesource.com/#/c/235875 for the actual sources. Repro steps are at the end of this comment.

Function 'coder' in voAMRWBEnc.c has a loop that is miscompiled. Miscompilation is detected by comparing against the output of non-vectorized loop processing the same data. (See line 1254 onwards in voAMRWBEnc.c)

I've extracted the loop out to function 'loop_incorrect' in the same file. I also copied the function to a separate file (loop_correct.c). Calling loop_incorrect causes the output to be different than the non-vectorized loop. OTOH calling loop_correct from 'coder' passes (i.e. gives the same result as the non-vectorized loop).

I've included additional information in https://android-review.googlesource.com/#/c/235875/. loop_correct.ll and loop_incorrect.ll have the LLVM IR for the two functions. One notable difference between the two is that the inlined-version of L_shl2 gets compiled differently. AFAICT, the two differences are sematically equivalent. (cmd-loop_incorrect.sh and cmd-loop_correct.sh can be used from root of AOSP tree to compile the sources and emit .bc files).

I've also added another file, standalone_main.c that calls loop_incorrect and loop_correct from a standalone executable on an input that fails when called within coder. Unfortunately, the two functions produce identical output as the non-vectorized loop.

It is remotely possible that when called from standalone_main.c, the checks for array overlaps at the top of the vectorized loop fail and we fall back to the scalar loop but I can't imagine why this should happen.

=================================== Repro steps:

Run the following for an Android device:

Apply https://android-review.googlesource.com/#/c/235875/ and rebuild: $ cd $ANDROID_BUILD_TOP/frameworks/av/media/libstagefright/codecs/amrwbenc $ git fetch https://android.googlesource.com/platform/frameworks/av refs/changes/75/235875/3 && git cherry-pick FETCH_HEAD $ mma # Add -jn for parallel build w/ n processes $ adb sync

Apply https://android-review.googlesource.com/#/c/235874/: $ cd $ANDROID_BUILD_TOP/cts $ git fetch https://android.googlesource.com/platform/cts refs/changes/74/235874/1 && git cherry-pick FETCH_HEAD $ m cts # Add -jn for parallel build w/ n processes

Run test: $ cts-tradefed run cts --k --class android.media.cts.EncoderTest --method 'testAMRWBEncoders' --force-abi 64

Follow 'adb logcat' output in another shell

$ adb logcat | grep MEDIA

Successful runs will print several "outputs were identical" while failed runs will log "different @..."

fba5f26d-3e2e-4f9e-8aef-b87df32319ac commented 8 years ago

I also tried to reproduce this by using LLVM ToT and (a lot of) randomly generated data for the loop. As far as I can see, the results of the vectorized version are the same as the non-vectorized version. I suspect there's either some alignment issue with the data that we're not seeing here or the vectorized version of the loop is correct but it somehow triggers a bug somewhere else?

The IR also looks correct after vectorization (at least as far as I can tell).

The best way to progress this seems to be to investigate the source.

rengolin commented 8 years ago

Renato, your concern is valid. Our plan is to fix this the right way before the next compiler update in Android. We'll either include the proper fix or use the pragma disabling vectorization if we are not able to fix this by then.

Sounds like a plan. :)

I'll reduce the AOSP patch (https://android-review.googlesource.com/#/c/235875/) and add an explanation here. Like Steve mentioned earlier, I've been unsuccessful in reproducing this in a standalone fashion.

I suspected as much. :(

pirama-arumuga-nainar commented 8 years ago

Renato, your concern is valid. Our plan is to fix this the right way before the next compiler update in Android. We'll either include the proper fix or use the pragma disabling vectorization if we are not able to fix this by then.

I'll reduce the AOSP patch (https://android-review.googlesource.com/#/c/235875/) and add an explanation here. Like Steve mentioned earlier, I've been unsuccessful in reproducing this in a standalone fashion.

rengolin commented 8 years ago

Sorry, my comment wasn't complete last time. We aren't hacking this to make the bug disappear. As the code is today, the bug no longer reproduces (because it is using safe shift operations instead of the original code that exhibited UB). The reason is that the new code doesn't generate NEON at all, and instead just generates a simple non-vectorized loop. Pirama found a way to get it to generate NEON in this case again (without UB), and that still exhibits the original bug (corrupt audio data). I will let him explain further about how to trigger this under the latest code.

An unrelated patch in the vectorizer to make that shift work would break it again... Regardless of the solution or current state, I still strongly recommend the pragma as a stop-gap work around.

stephenhines commented 8 years ago

Sorry, my comment wasn't complete last time. We aren't hacking this to make the bug disappear. As the code is today, the bug no longer reproduces (because it is using safe shift operations instead of the original code that exhibited UB). The reason is that the new code doesn't generate NEON at all, and instead just generates a simple non-vectorized loop. Pirama found a way to get it to generate NEON in this case again (without UB), and that still exhibits the original bug (corrupt audio data). I will let him explain further about how to trigger this under the latest code.

rengolin commented 8 years ago

and we can tweak small options in the source to make it disappear/reappear, so it might just be simpler to work from that direction.

Steve, I don't recommend you tweak the code to make it disappear, as that'll also be unstable and could catch you in the future.

The most sensible work around for now is to use the #pragma, and add a FIXME pointing to this bug.

rengolin commented 8 years ago

Yes, done.

Arnaud-de-Grandmaison-ARM commented 8 years ago

Should we link it to the llvm/llvm-project#21794 (Android meta bug) ?

stephenhines commented 8 years ago

Arnaud, this is the NEON bug that I mentioned on today's call. Note that the reduced case here doesn't quite work yet, but we can help with getting someone running this in AOSP if you like. It reproduces there 100% of the time, and we can tweak small options in the source to make it disappear/reappear, so it might just be simpler to work from that direction.

rengolin commented 8 years ago

Hi Pirama,

Thanks for the reproducible case. Though I'm at a loss of how to test this.

As we discussed earlier, it seems that the L_mult is the problem here, as if you change line:

L_var_out *= 2;

to

L_var_out>>= 1;

it works with NEON instructions, right?

As I said, in the interim, keep the pragma vectorize(disable) so we can fix this in the best way.

But for now, what we really need is a main function that will show different output with a training dataset. Looking at the IR may not give us much to go for, I'm afraid.