[AMDGPU] Identical LLVM IR file with different basic block ordering cause miscompilation

vchuravy commented 1 month ago

Reduced from https://github.com/JuliaGPU/AMDGPU.jl/issues/672#issuecomment-2347151487

The code is a double-nested loops and the bug manifests as if we were skipping a loop. In one version of the code after optimization (specifically DCE) the basic blocks end up in a different order.

I have two small LLVM modules that are identical, except that in one I manually reorder the BBs to follow the order of the "working" version.

https://godbolt.org/z/sscdTK7d7

https://gist.github.com/vchuravy/0c60cf4b9c497f6c8050f2a1137cd399

llc -filetype=asm broken.reorder.ll -o broken.reorder.S
llc -filetype=asm broken.ll -o broken.S

broken.ll is the original file that is exhibiting the miscompiation and broken.reorder.ll is the file that emits the same code as the working MWE.

Lastly, I encountered this on LLVM 15, and it also reproduces on LLVM 16. LLVM 17 either hides or has this bug fixed.

llvmbot commented 1 month ago

@llvm/issue-subscribers-backend-amdgpu

Author: Valentin Churavy (vchuravy)

Reduced from https://github.com/JuliaGPU/AMDGPU.jl/issues/672#issuecomment-2347151487 The code is a double-nested loops and the bug manifests as if we were skipping a loop. In one version of the code after optimization (specifically DCE) the basic blocks end up in a different order. I have two small LLVM modules that are identical, except that in one I manually reorder the BBs to follow the order of the "working" version. https://godbolt.org/z/sscdTK7d7 https://gist.github.com/vchuravy/0c60cf4b9c497f6c8050f2a1137cd399 ``` llc -filetype=asm broken.reorder.ll -o broken.reorder.S llc -filetype=asm broken.ll -o broken.S ``` `broken.ll` is the original file that is exhibiting the miscompiation and `broken.reorder.ll` is the file that emits the same code as the working MWE. Lastly, I encountered this on LLVM 15, and it also reproduces on LLVM 16. LLVM 17 either hides or has this bug fixed.

llvmbot commented 1 month ago

@llvm/issue-subscribers-julialang

Author: Valentin Churavy (vchuravy)

Reduced from https://github.com/JuliaGPU/AMDGPU.jl/issues/672#issuecomment-2347151487 The code is a double-nested loops and the bug manifests as if we were skipping a loop. In one version of the code after optimization (specifically DCE) the basic blocks end up in a different order. I have two small LLVM modules that are identical, except that in one I manually reorder the BBs to follow the order of the "working" version. https://godbolt.org/z/sscdTK7d7 https://gist.github.com/vchuravy/0c60cf4b9c497f6c8050f2a1137cd399 ``` llc -filetype=asm broken.reorder.ll -o broken.reorder.S llc -filetype=asm broken.ll -o broken.S ``` `broken.ll` is the original file that is exhibiting the miscompiation and `broken.reorder.ll` is the file that emits the same code as the working MWE. Lastly, I encountered this on LLVM 15, and it also reproduces on LLVM 16. LLVM 17 either hides or has this bug fixed.

arsenm commented 1 month ago

Does this reproduce with main? I see no difference in 17.0 or later: https://godbolt.org/z/Es9jdoYsd

I'm guessing this is a gfx11 specific issue. Newer versions don't have the s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) at the end, so this might just be one of the early deallocation bugs that was fixed.

cc @jayfoad

vchuravy commented 1 month ago

The original reporter had a gfx1102 and I reproduced it on a gfx1103.

As far as I can tell this doesn't reproduce on main. I hadn't had the time to bisect. This is on an LTS release for us, and I would like to Backports the fix if possible. So if you have a hint which commit might have fixed this that would be great!

arsenm commented 1 month ago

eb7491769a511476ffae596a8f58abfd12ed46d4 reimplemented the whole thing, but I don't have all the detailed context (@jayfoad ?)

jayfoad commented 1 month ago

eb74917 reimplemented the whole thing, but I don't have all the detailed context (@jayfoad ?)

The reason for reimplementing it was as a basis for implementing this fairly important bug fix: 4b6d41cd1d73a72403679ec21732df3610fc0964

But from reading the description above ("the bug manifests as if we were skipping a loop") it's not clear to me how this is related to dealloc vgprs.

arsenm commented 1 month ago

it's not clear to me how this is related to dealloc vgprs.

I just noticed in the newer versions, there is no more dealloc. The only difference between the pass/fail in the 16 diff was a few register assignments and one s_delay_alu

llvm / llvm-project

[AMDGPU] Identical LLVM IR file with different basic block ordering cause miscompilation #109391