microsoft / DirectXShaderCompiler

This repo hosts the source for the DirectX Shader Compiler which is based on LLVM/Clang.
Other
3.12k stars 699 forks source link

`as` casts on integer constant swizzles result in invalid module bitcode (or assert in debug) #5389

Open amaiorano opened 1 year ago

amaiorano commented 1 year ago

Description Compiling a compute shader that we generate from Dawn (Chromium WebGPU implementation) fails with "Invalid record".

Steps to Reproduce

Given the following compute shader:

01: Texture2DMS<float4> tint_symbol : register(t0);
02: Texture2DMS<float4> tint_symbol_1 : register(t1);
03: 
04: RWByteAddressBuffer tint_symbol_5 : register(u2);
05: 
06: [numthreads(1, 1, 1)]
07: void tint_symbol_6() {
08:   {
09:     for(int tint_symbol_7 = 0; (tint_symbol_7 < 4); tint_symbol_7 = (tint_symbol_7 + 1)) {
10:       tint_symbol_5.Store((4u * uint(tint_symbol_7)), asuint(tint_symbol.Load((0).xx, tint_symbol_7).x));
11:       tint_symbol_5.Store((16u + (4u * uint(tint_symbol_7))), asuint(tint_symbol_1.Load(int3(0, 0, 0), tint_symbol_7).x));
12:     }
13:   }
14:   return;
15: }
16: 

When compiling with DXC:

C:\src\dawn>.\out\debug\dxc /T cs_6_0 /E tint_symbol_6 1976.hlsl
1976.hlsl:11:89: warning: implicit truncation of vector type [-Wconversion]
      tint_symbol_5.Store((16u + (4u * uint(tint_symbol_7))), asuint(tint_symbol_1.Load(int3(0, 0, 0), tint_symbol_7).x));
                                                                                        ^
error: validation errors
Invalid record
Validation failed.

Note that the warning is not important. In fact, commenting out line 11 still produces and error. However, commenting out line 10, DXC manages to compile correctly.

Actual Behavior The shader should compile.

Environment

s-perron commented 1 year ago

I'll let the DXIL team comment on whether this is a bug in the code or the compiler. However, the issues seems to be that the coordinate in the RWByteAddressBuffer load is not be interpreted as an integer vector.

tint_symbol.Load((0).xx, tint_symbol_7) // Fails because the compiler thinks that (0).xx is the wrong type.
tint_symbol.Load((0u).xx, tint_symbol_7) // Passes

This should give you a work around for now.

amaiorano commented 1 year ago

Thanks @s-perron, I just got back to this bug, and it definitely looks like a compiler bug. Similar to what you noted above, I also found that:

results.Store((4u * uint(i)), asuint(texture0.Load((0).xx, i).x)); // Fails
results.Store((4u * uint(i)), asuint(texture0.Load(int2(0, 0), i).x)); // Works

Why would int2(0,0) be different from (0).xx? As @s-perron pointed out, it looks like the compiler isn't determining the correct type for (0).xx. Making it (0u).xx forces it to be an integral type. In fact, we can explicitly cast to int, and it compiles properly: int(0).xx.

amaiorano commented 1 year ago

Adding some more information, the failure seems to only happen when the result is assigned to a buffer. I simplified the example:

Texture2DMS<float4> texture0 : register(t0);
RWByteAddressBuffer results : register(u2);

[numthreads(1, 1, 1)]
void main() {
  results.Store(0u, texture0.Load((0).xx, 0).x); // Fails to compile

  const float r = texture0.Load((0).xx, 0).x; // Succeeds
  results.Store(0u, r); // Fails
}
amaiorano commented 1 year ago

Spent some more time debugging this in DXC, and here's what I've found:

Given this shader:

Texture2D<float4> texture0 : register(t0);
RWByteAddressBuffer results : register(u2);
[numthreads(1, 1, 1)]
void main() {
    results.Store(0, asuint(texture0.Load(0.xxx).x));
}

Compiling this with a debug (asserts enabled) build of DXC, we trip the following assertion:

Error: assert((i >= FTy->getNumParams() || FTy->getParamType(i) == Args[i]->getType()) && "Calling a function with a bad signature!")
File:
C:\src\external\DirectXShaderCompiler\lib\IR\Instructions.cpp(239)

This corresponds to the following code:

void CallInst::init(FunctionType *FTy, Value *Func, ArrayRef<Value *> Args,
                    const Twine &NameStr) {
...
#ifndef NDEBUG
...
  for (unsigned i = 0; i != Args.size(); ++i)
    assert((i >= FTy->getNumParams() || 
            FTy->getParamType(i) == Args[i]->getType()) &&
           "Calling a function with a bad signature!");
#endif
...
}

Debugging further, the assertion happens at index 3, which is the index of the first coordinate value being passed in to the intrinsic. This corresponds to the first element of 0.xxx, and the type of this value is a 64-bit integer in Args[3], but the texture load function type for that arg is expected to be a 32-bit integer.

Surprisingly, if we modify the code slightly to use an explicit int64 vector of constants, rather than a swizzle of a 64-bit constant, DXC is able to compile it just fine:

Texture2D<float4> texture0 : register(t0);
RWByteAddressBuffer results : register(u2);
[numthreads(1, 1, 1)]
void main() {
    // results.Store(0, asuint(texture0.Load(0.xxx).x));
    results.Store(0, asuint(texture0.Load(int64_t3(0, 0, 0)).x)); // This works
}

When debugging the same code, the type of Arg[3] is now a 32-bit integer, which matches the target type of the function parameter (so the assertion doesn't trip). From this, it looks like int64_t3(0, 0, 0) is somehow getting converted to a vec3 of int32s at some point, whereas 0.xxx is not.

Hopefully this helps to figure out what the bug is.

amaiorano commented 1 year ago

I just found out that @ben-clayton actually reported this issue a while ago: https://github.com/microsoft/DirectXShaderCompiler/issues/5082

amaiorano commented 1 year ago

Also want to add that we've found the same issue occurs with function calls on RWByteAddressBuffer, not just on textures:

RWByteAddressBuffer s : register(u0);
[numthreads(1, 1, 1)]
void main() {
  s.Store(0, asuint((-1).x)); // fail
  s.Store(0, asuint(int((-1).x))); // pass

  s.Store2(0, asuint((-1).xx)); // fail
  s.Store2(0, asuint(int2((-1).xx))); // pass

  s.Store3(0, asuint((-1).xxx)); // fail
  s.Store3(0, asuint(int3((-1).xxx))); // pass
}

Also note that unlike for texture calls, for storage buffer calls, if the splatted value is 0, it compiles fine, so it needs to be non-zero (e.g. 1.xxx, etc).

amaiorano commented 3 months ago

Revisiting this bug, as we hit yet another case of it. Here's another simple example of it:

RWByteAddressBuffer sb : register(u0);
[numthreads(1, 1, 1)]
void main() {
  sb.Store(0, asuint(int2(123, 123))); // Okay
  sb.Store(0, asuint((123).xx)); // Boom
}

The IR for each line: sb.Store(0, asuint(int2(123, 123))); // Okay:

  %0 = call <2 x i32> @"dx.hl.op.rn.<2 x i32> (i32, <2 x i32>)"(i32 114, <2 x i32> <i32 123, i32 123>), !dbg !3

sb.Store(0, asuint((123).xx)); // Boom:

  %0 = call <2 x i32> @"dx.hl.op.rn.<2 x i32> (i32, <2 x i64>)"(i32 114, <2 x i64> <i64 123, i64 123>), !dbg !3

Basically, it looks like the problem is that swizzles of constant integrals become 64-bit values (scalars or vectors), but asuint(x) maps to a bitcast to a 32-bit value.

Specifically, TranslateBuiltinIntrinsic gets the opcode from call i32 @"dx.hl.op.rn.i32 that maps to TranslateAsUint, and that function will then call TranslateBitcast (for the form that takes only 2 operands).

So the bug seems to be one of:

  1. That the swizzles should produce 32-bit scalars/vectors.
  2. That asuint should handle 64-bit values and map these to a truncate rather than a bitcast.
  3. ParseAST should insert an implicit cast to intN on constant integral swizzle arguments to asuint.

I'm guessing 3 is the right thing to do?

amaiorano commented 3 months ago

A little more info, I think the problem really is down to the as casts. I can reproduce this for all three as casts with:

RWByteAddressBuffer sb : register(u0);

void f(float a) {
  sb.Store(0, a);
}

[numthreads(1, 1, 1)]
void main() {
  float b = asuint((123).x);
  f(b);
  float c = asint((123).x);
  f(c);
  float d = asfloat((123).x);
  f(d);
}

I added the f function to try and show that it's not related to the buffer store call, but I need that call to make sure this doesn't get optimized out first.

damyanp commented 2 months ago

This is resolved in HLSL 202x: https://godbolt.org/z/vdnde81co

Note that this case doesn't repro on the latest DXC https://godbolt.org/z/TK15ajfaa.

amaiorano commented 2 months ago

This is resolved in HLSL 202x: https://godbolt.org/z/vdnde81co

Note that this case doesn't repro on the latest DXC https://godbolt.org/z/TK15ajfaa.

I would not consider this resolved because it works in HLSL 202x. This is a bug in the compiler for the non-202x path that should be fixed.

damyanp commented 2 months ago

Moving this to dormant - this is a fundamental issue with the pre-HLSL202x language involving literals. Our fix for this is the language change that's in HLSL 202x.

If someone wants to attempt to fix this specific codegen / assert issue then we'd consider reviewing and accepting it.