Open vchuravy opened 1 month ago
@llvm/issue-subscribers-backend-amdgpu
Author: Valentin Churavy (vchuravy)
@llvm/issue-subscribers-julialang
Author: Valentin Churavy (vchuravy)
Does this reproduce with main? I see no difference in 17.0 or later: https://godbolt.org/z/Es9jdoYsd
I'm guessing this is a gfx11 specific issue. Newer versions don't have the s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) at the end, so this might just be one of the early deallocation bugs that was fixed.
cc @jayfoad
The original reporter had a gfx1102 and I reproduced it on a gfx1103.
As far as I can tell this doesn't reproduce on main. I hadn't had the time to bisect. This is on an LTS release for us, and I would like to Backports the fix if possible. So if you have a hint which commit might have fixed this that would be great!
eb7491769a511476ffae596a8f58abfd12ed46d4 reimplemented the whole thing, but I don't have all the detailed context (@jayfoad ?)
eb74917 reimplemented the whole thing, but I don't have all the detailed context (@jayfoad ?)
The reason for reimplementing it was as a basis for implementing this fairly important bug fix: 4b6d41cd1d73a72403679ec21732df3610fc0964
But from reading the description above ("the bug manifests as if we were skipping a loop") it's not clear to me how this is related to dealloc vgprs.
it's not clear to me how this is related to dealloc vgprs.
I just noticed in the newer versions, there is no more dealloc. The only difference between the pass/fail in the 16 diff was a few register assignments and one s_delay_alu
Reduced from https://github.com/JuliaGPU/AMDGPU.jl/issues/672#issuecomment-2347151487
The code is a double-nested loops and the bug manifests as if we were skipping a loop. In one version of the code after optimization (specifically DCE) the basic blocks end up in a different order.
I have two small LLVM modules that are identical, except that in one I manually reorder the BBs to follow the order of the "working" version.
https://godbolt.org/z/sscdTK7d7
https://gist.github.com/vchuravy/0c60cf4b9c497f6c8050f2a1137cd399
broken.ll
is the original file that is exhibiting the miscompiation andbroken.reorder.ll
is the file that emits the same code as the working MWE.Lastly, I encountered this on LLVM 15, and it also reproduces on LLVM 16. LLVM 17 either hides or has this bug fixed.