microsoft / DirectXShaderCompiler

This repo hosts the source for the DirectX Shader Compiler which is based on LLVM/Clang.
Other
3.08k stars 689 forks source link

[SPIR-V] Don't lower switch statements to OpSwitch #6957

Open llvm-beanz opened 2 weeks ago

llvm-beanz commented 2 weeks ago

Description OpSwitch even as defined with SPV_KHR_maximal_reconvergence, doesn't require converging on switch cases that have fall through.

The only way to get SPIR-V to implement switch statements that converge correctly is with OpBranch instead.

Steps to Reproduce

Given the following HLSL:

RWBuffer<int> value;

[numthreads(4, 1, 1)]
void main(uint3 threadID : SV_DispatchThreadID) {
  uint sum = 0;
  switch (value[threadID.x]) {
    case 0:
      sum += WaveActiveSum(1);
    default:
      sum += WaveActiveSum(10);
      break;
  }
  value[threadID.x] = sum;
}

CE

If given the input [ 0, 0, 1, 2], the computed output should be [ 42, 42, 40, 40 ].

Actual Behavior

Even with the KHR maximal reconvergence extension the OpSwitch is not guaranteed to converge the tangles between case 0 and the default case.

However, if instead these were generated as a chain of OpBranch statements, the control flow would converge at each new OpBranch, which would result in the correct tangle grouping.

Environment

tex3d commented 1 week ago

HLSL is not supposed to support fall-through.

FXC produces errors if you try to fall through:

error X3533: non-empty case statements must have break or return error X3537: Fall-throughs in switch statements are not allowed.

Allowing it presents problems with keeping control flow structured, which was why it was disallowed originally in HLSL.

I'm of the opinion that we should disallow fall-through in DXC and HLSL on Clang. While we could consider this an HLSL 202x change, I consider it a bug that DXC allowed it in the first place, especially since using it could lead to undefined behavior or driver/runtime bugs.

llvm-beanz commented 1 week ago

I'm of the opinion that we should disallow fall-through in DXC and HLSL on Clang. While we could consider this an HLSL 202x change, I consider it a bug that DXC allowed it in the first place, especially since using it could lead to undefined behavior or driver/runtime bugs.

Given that we know users are relying on this feature working as implemented in DXC (they've filed bugs about it), and the fact that HLSL has no specification (yet). The language is defined by the behavior of the reference compiler, which today is DXC.

HLSL 2016+ all currently support switch fall through. Changing that, is a breaking language change. Which isn't to say we shouldn't do it, but it is orthogonal to this issue, because this issue is that the SPIR-V backend generates code with undefined behavior in a place where the behavior should be well-defined.

s-perron commented 8 hours ago

I agree with @llvm-beanz. Users have been using fallthroughs for so long that they will expect to be able to continue going forward.

After some discussion, my team will not be fixing this in DXC. However, we would be willing to accept an fix. My suggested fix is to write a pass in spirv-opt that will transform all OpSwitch into code that is well defined. Trying to generate the correct code in DXC would be too cumbersome.

The spirv-opt pass would have a couple tricky situations in order to avoid breaking structured control flow. Suppose you have something like:

switch(selector) {
  default:
    something();
  case 1:
  case 3:
    somethingElse();
    break;
  case 2:
    anotherThing();
    break;
}

The regular way of translating this would be:

if (selector == 1 || selector == 3) {
case1:
  somethingElse();
  goto merge;
} else if (selector == 2) {
  anotherThing();
  goto merge;
} else {
  // default
  something();
  goto case1;
}
merge:

This does not work for spir-v because the gotos violate structure control flow. Instead we would want something like:

// We want to keep a switch so that we can break to the end when we want. This is well defined since every thread has the same selector.
switch(0) {
  default: {
      // Determine which case block we want to start with. 
      // Note that the result is well defined because there is no cross wave communication.
      uint case_id;
      switch(selector) {
        default:
          case_id = 0
          break;
        case 1:
        case 3:
          case_id = 1;
          break;
        case 2:
          case_id = 2;
          break;
    }
    if (case_id == 0) {
      something();
      case_id = 1; // This case falls through so we now need to execute case 1.
    }
    if (case_id == 1) {
      somethingElse();
      break; // breaks can still be breaks;
    }
    if (case_id == 2) {
      anotherThing();
      break;
    }
  }
}

This example shows most of the tricky situation:

  1. The default case is not always the last case.
  2. Branches from breaks and fall-through need to be handled carefully to avoid breaking structured control flow.
  3. Multiple selector values need to branch to the same case, and those cases cannot diverge.

I would do a few things to make it simpler:

  1. Assume that ssa values are either defined globally or are not live across basic block boundaries. That way you don't have to worry about adding or changing OpPhi instructions, which can get messy.
  2. Only rewrite switches if the selector is not an immediate value, and it contains a group operation. This will avoid rewriting the switches created by the pass, which we know are fine.