amaiorano opened 1 year ago
I'll let the DXIL team comment on whether this is a bug in the code or the compiler. However, the issue seems to be that the coordinate in the RWByteAddressBuffer load is not being interpreted as an integer vector.
tint_symbol.Load((0).xx, tint_symbol_7) // Fails because the compiler thinks that (0).xx is the wrong type.
tint_symbol.Load((0u).xx, tint_symbol_7) // Passes
This should give you a workaround for now.
Thanks @s-perron, I just got back to this bug, and it definitely looks like a compiler bug. Similar to what you noted above, I also found that:
results.Store((4u * uint(i)), asuint(texture0.Load((0).xx, i).x)); // Fails
results.Store((4u * uint(i)), asuint(texture0.Load(int2(0, 0), i).x)); // Works
Why would int2(0, 0) be different from (0).xx? As @s-perron pointed out, it looks like the compiler isn't determining the correct type for (0).xx. Making it (0u).xx forces it to be an integral type. In fact, we can explicitly cast to int, and it compiles properly: int(0).xx.
Adding some more information, the failure seems to only happen when the result is assigned to a buffer. I simplified the example:
Texture2DMS<float4> texture0 : register(t0);
RWByteAddressBuffer results : register(u2);
[numthreads(1, 1, 1)]
void main() {
    results.Store(0u, texture0.Load((0).xx, 0).x); // Fails to compile
    const float r = texture0.Load((0).xx, 0).x; // Succeeds
    results.Store(0u, r); // Fails
}
Spent some more time debugging this in DXC, and here's what I've found:
Given this shader:
Texture2D<float4> texture0 : register(t0);
RWByteAddressBuffer results : register(u2);
[numthreads(1, 1, 1)]
void main() {
    results.Store(0, asuint(texture0.Load(0.xxx).x));
}
Compiling this with a debug (asserts enabled) build of DXC, we trip the following assertion:
Error: assert((i >= FTy->getNumParams() || FTy->getParamType(i) == Args[i]->getType()) && "Calling a function with a bad signature!")
File:
C:\src\external\DirectXShaderCompiler\lib\IR\Instructions.cpp(239)
This corresponds to the following code:
void CallInst::init(FunctionType *FTy, Value *Func, ArrayRef<Value *> Args,
                    const Twine &NameStr) {
  ...
#ifndef NDEBUG
  ...
  for (unsigned i = 0; i != Args.size(); ++i)
    assert((i >= FTy->getNumParams() ||
            FTy->getParamType(i) == Args[i]->getType()) &&
           "Calling a function with a bad signature!");
#endif
  ...
}
Debugging further, the assertion happens at index 3, which is the index of the first coordinate value passed in to the intrinsic. This corresponds to the first element of 0.xxx: the type of this value in Args[3] is a 64-bit integer, but the texture load function type expects a 32-bit integer for that argument.
Surprisingly, if we modify the code slightly to use an explicit int64 vector of constants, rather than a swizzle of a 64-bit constant, DXC is able to compile it just fine:
Texture2D<float4> texture0 : register(t0);
RWByteAddressBuffer results : register(u2);
[numthreads(1, 1, 1)]
void main() {
    // results.Store(0, asuint(texture0.Load(0.xxx).x));
    results.Store(0, asuint(texture0.Load(int64_t3(0, 0, 0)).x)); // This works
}
When debugging the same code, the type of Args[3] is now a 32-bit integer, which matches the target type of the function parameter (so the assertion doesn't trip). From this, it looks like int64_t3(0, 0, 0) is somehow getting converted to a vec3 of int32s at some point, whereas 0.xxx is not.
Hopefully this helps to figure out what the bug is.
I just found out that @ben-clayton actually reported this issue a while ago: https://github.com/microsoft/DirectXShaderCompiler/issues/5082
Also want to add that we've found the same issue occurs with function calls on RWByteAddressBuffer, not just on textures:
RWByteAddressBuffer s : register(u0);
[numthreads(1, 1, 1)]
void main() {
    s.Store(0, asuint((-1).x)); // fail
    s.Store(0, asuint(int((-1).x))); // pass
    s.Store2(0, asuint((-1).xx)); // fail
    s.Store2(0, asuint(int2((-1).xx))); // pass
    s.Store3(0, asuint((-1).xxx)); // fail
    s.Store3(0, asuint(int3((-1).xxx))); // pass
}
Also note that unlike for texture calls, for storage buffer calls, if the splatted value is 0, it compiles fine, so it needs to be non-zero (e.g. 1.xxx, etc.).
Revisiting this bug, as we hit yet another case of it. Here's another simple example of it:
RWByteAddressBuffer sb : register(u0);
[numthreads(1, 1, 1)]
void main() {
    sb.Store(0, asuint(int2(123, 123))); // Okay
    sb.Store(0, asuint((123).xx)); // Boom
}
The IR for each line:
sb.Store(0, asuint(int2(123, 123))); // Okay:
%0 = call <2 x i32> @"dx.hl.op.rn.<2 x i32> (i32, <2 x i32>)"(i32 114, <2 x i32> <i32 123, i32 123>), !dbg !3

sb.Store(0, asuint((123).xx)); // Boom:
%0 = call <2 x i32> @"dx.hl.op.rn.<2 x i32> (i32, <2 x i64>)"(i32 114, <2 x i64> <i64 123, i64 123>), !dbg !3
Basically, it looks like the problem is that swizzles of constant integrals become 64-bit values (scalars or vectors), but asuint(x) maps to a bitcast to a 32-bit value. Specifically, TranslateBuiltinIntrinsic gets the opcode from call i32 @"dx.hl.op.rn.i32 that maps to TranslateAsUint, and that function will then call TranslateBitcast (for the form that takes only 2 operands).
So the bug seems to be one of:

- asuint should handle 64-bit values and map these to a truncate rather than a bitcast.
- ...asuint.

I'm guessing 3 is the right thing to do?
A little more info: I think the problem really is down to the as casts. I can reproduce this for all three as casts with:
RWByteAddressBuffer sb : register(u0);
void f(float a) {
    sb.Store(0, a);
}
[numthreads(1, 1, 1)]
void main() {
    float b = asuint((123).x);
    f(b);
    float c = asint((123).x);
    f(c);
    float d = asfloat((123).x);
    f(d);
}
I added the f function to try and show that it's not related to the buffer store call, but I need that call to make sure this doesn't get optimized out first.
This is resolved in HLSL 202x: https://godbolt.org/z/vdnde81co
Note that this case doesn't repro on the latest DXC https://godbolt.org/z/TK15ajfaa.
I would not consider this resolved just because it works in HLSL 202x. This is a bug in the compiler for the non-202x path that should be fixed.
Moving this to dormant - this is a fundamental issue with the pre-HLSL202x language involving literals. Our fix for this is the language change that's in HLSL 202x.
If someone wants to attempt to fix this specific codegen / assert issue then we'd consider reviewing and accepting it.
Description
Compiling a compute shader that we generate from Dawn (Chromium WebGPU implementation) fails with "Invalid record".

Steps to Reproduce
Given the following compute shader:
When compiling with DXC:
Note that the warning is not important. In fact, commenting out line 11 still produces an error. However, after commenting out line 10, DXC compiles it correctly.

Expected Behavior
The shader should compile.

Environment