Open dmpots opened 11 months ago
I feel like this is a problem also related to https://github.com/microsoft/DirectXShaderCompiler/issues/5302#issuecomment-1595879233
The workaround we use for the vertex shaders, we also had to keep it for the compute shaders because we have hit this exact problem even if there was a dx.break
added.
The dx.break code emission seems to be based on pattern recognition and you don't need to do much changes to have things moved out of the loop.
Just switching from while(true) ... break;
to while(!isDone) ... isDone = true:
is enough to prevent the dx.break workaround from being emitted.
I was waiting to see progress on that other issue and the new year before commenting more on what I discovered on the wave intrinsics issues.
I think, the flaw is fundamentally that the compiler doesn't know there is more than one thread executing at a time in lockstep with those intrinsics. The returned value from the wave intrinsics should be considered volatile.
If WaveReadLaneFirst(...)
and WaveIsFirstLane()
were volatile, the problems would probably go away. I was still trying to wrap my head around the DXC code to try it when I went on vacation.
It would actually be the correct semantic because the compiler can't assume, like it seems to currently do, that the same call to the same wave intrinsic with the same parameter(s) will give the same result.
The problem with that approach is that the compiler won't be able to correctly identify dead code using wave intrinsics. I'd still prefer that issue to the current one where things get moved out of the loop or the loop gets completely removed.
I hope my posts helps in identifying the problem.
Description The output of a wave intrinsic depends on the current state of which threads are active. When a wave intrinsic occurs in a loop it needs to execute in the loop body since the state of active threads changes over time (threads can exit the loop at different times). The shader in this bug shows a case where dxc is incorrectly moving wave intrinsics and their dependent operations out of a loop.
In this shader the
InterlockedOr
operation is only done once outside the loop, but it should be done inside the loop for each unique value ofarrayIndex
.The bug only repros with lifetime markers enabled. If I compile with them off (either by default in the shader model or explicitly with a flag) the code looks correct. My guess is maybe jump-threading is interfering with the
dx.break
calls working as expected. Note that we also get correct code with partial lifetime markers-opt-enable partial-lifetime-markers
.Steps to Reproduce
Lifetime markers on
dxc /Tps_6_6 bug.hlsl
dxc /Tps_6_5 -enable-lifetime-markers bug.hlsl
Lifetime markers off
dxc /Tps_6_5 bug.hlsl
dxc /Tps_6_6 -disable-lifetime-markers bug.hlsl
Actual Behavior The CFG with lifetime markers on shows the atomic operation is moved out of the loop.
Environment
dxcompiler.dll: 1.8 - 1.7.0.4383 (ad3958c2d); dxil.dll: 1.6(101.6.2104.33)
Windows