iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/

WebGPU compilation error: multiple variables use the same resource binding #11054

Open ScottTodd opened 1 year ago

ScottTodd commented 1 year ago

What happened?

I'm trying to compile mobilebertsquad.tflite.mlir from our benchmark suites for WebGPU and see this error:

error: entry point 'd0' references multiple variables that use the same resource binding @group(0), @binding(1)

Here are the relevant .mlir and .spvasm files: https://gist.github.com/ScottTodd/6af59563826719340f659dd0554edd40

Steps to reproduce your issue

  1. Download mobilebertsquad.tflite.mlir from our benchmark infra. I downloaded files from Oct 28, so maybe around https://buildkite.com/iree/iree-benchmark-android/builds/4621
  2. Run iree-compile mobilebertsquad.tflite.mlir --iree-hal-target-backends=webgpu --iree-input-type=tosa
  3. Observe the error

What component(s) does this issue relate to?

Compiler

Version information

webgpu branch, https://github.com/iree-org/iree/commit/4116d5aa57452fc9b5f1695f62188cd054af4cf1

Additional context

I was able to compile this using an older version of IREE's compiler, so this might be a regression? With the output produced by that older version, I saw the runtime error below. (0x1p+128f is 2^128, just above the largest finite f32 value, so the WGSL reader rejects the literal; presumably it is an overflowed infinity constant.)

Tint WGSL reader failure: :44:211 error: value cannot be represented as 'f32' x__resource_var_0_1_.field0[(x_83 + select(x_131, (4294967295u - x_131), x_128))] = select(select(select(select(select(vec4<f32>(1.17549435e-38f, 1.17549435e-38f, 1.17549435e-38f, 1.17549435e-38f), vec4<f32>(0x1p+128f, 0x1p+128f, 0x1p+128f, 0x1p+128f), (x_99 > vec4<f32>(0.0f, 0.0f, 0.0f, 0.0f))), (fma(fma(vec4<f32>(0.0131435f, 0.0131435f, 0.0131435f, 0.0131435f), x_104, vec4<f32>(0.03668965f, 0.03668965f, 0.03668965f, 0.03668965f)), (x_105 * x_105), fma(fma(vec4<f32>(0.168738902f, 0.168738902f, 0.168738902f, 0.168738902f), x_104, vec4<f32>(0.499705136f, 0.499705136f, 0.499705136f, 0.499705136f)), x_105, fma(vec4<f32>(1.0f, 1.0f, 1.0f, 1.0f), x_104, vec4<f32>(1.0f, 1.0f, 1.0f, 1.0f)))) * bitcast<vec4<f32>>(((x_112 + vec4<u32>(127u, 127u, 127u, 127u)) << vec4<u32>(23u, 23u, 23u, 23u)))), ((bitcast<vec4<i32>>(x_112) <= bitcast<vec4<i32>>(vec4<u32>(127u, 127u, 127u, 127u))) & (bitcast<vec4<i32>>(x_112) >= bitcast<vec4<i32>>(vec4<u32>(4294967169u, 4294967169u, 4294967169u, 4294967169u))))), vec4<f32>(0x1p+128f, 0x1p+128f, 0x1p+128f, 0x1p+128f), (x_99 == vec4<f32>(0x1p+128f, 0x1p+128f, 0x1p+128f, 0x1p+128f))), vec4<f32>(0.0f, 0.0f, 0.0f, 0.0f), (x_99 == vec4<f32>(-0x1p+128f, -0x1p+128f, -0x1p+128f, -0x1p+128f))), x_99, vec4<bool>(false, false, false, false));

benvanik commented 1 year ago

This may be the pass @antiagainst enabled on everything but Apple the other day? (We may need to add an && !webgpu condition.)

antiagainst commented 1 year ago

Ha, okay, this is hitting the same aliased resources issue that we have been fighting when targeting Apple platforms via MoltenVK.

The SPIR-V spec allows multiple resources bound to the same (set, binding) slot, and real GPU drivers are all fine with that AFAICT. However, SPIRV-Cross does not support aliased resources either, so I wrote a UnifyAliasedResourcePass (code and test) that performs various type casts and re-indexing to eliminate those aliased resources. I recently found a correctness issue with that pass, though, so I limited it to Apple platforms only for now. I will spend some time later to figure that out and fix it so we can reuse the pass here.
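To illustrate the idea in WGSL terms (hypothetical names, not code generated from this model): the aliased SPIR-V would require two module-scope variables on the same binding, which WGSL rejects, so the pass instead keeps a single variable and rewrites the other view's accesses onto it. A minimal sketch:

// Rejected by WGSL/Tint: two views of one storage buffer on the same binding.
// @group(0) @binding(1) var<storage, read_write> buf_f32  : array<f32>;
// @group(0) @binding(1) var<storage, read_write> buf_vec4 : array<vec4<f32>>;

// After unification (sketch): keep a single view and route the other accesses
// through it, re-indexing and casting element types as needed.
@group(0) @binding(1) var<storage, read_write> buf_vec4 : array<vec4<f32>>;

fn load_scalar(i : u32) -> f32 {
  // Element i of the f32 view is component (i % 4) of vector (i / 4) in the
  // vec4 view.
  return buf_vec4[i / 4u][i % 4u];
}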

Even better, it would be nice if Tint could support such cases. @dneto0 FYI.

dneto0 commented 1 year ago

WGSL does not allow aliasing (group,binding). https://gpuweb.github.io/gpuweb/wgsl/#resource-interface

It does allow the same subresource (buffer or texture layer) to be bound to different (group,binding) points. But it's a "dynamic error" if at least one of them is writable. There's no way to synchronize those within the same shader, not even in Vulkan. (You would need a "domain" operation but that only occurs with API-side actions, and there is no way to invoke one inside the shader).
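In WGSL terms, that allowed case is simply two distinct bindings (hypothetical names below); the aliasing only arises if the application binds the same buffer subresource to both, and because one of the bindings here is writable, doing so would be the dynamic-error case described above:

// Legal WGSL: distinct (group, binding) pairs. Binding the same buffer
// subresource to both at the API level is the "dynamic error" case, because
// out_view is writable and nothing inside the shader can synchronize the two
// views.
@group(0) @binding(0) var<storage, read>       in_view  : array<f32>;
@group(0) @binding(1) var<storage, read_write> out_view : array<f32>;

@compute @workgroup_size(1)
fn main(@builtin(global_invocation_id) gid : vec3<u32>) {
  out_view[gid.x] = in_view[gid.x];
}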

antiagainst commented 1 year ago

#11302 enables the UnifyAliasedResource pass for WebGPU, so various models should compile now, I believe. As mentioned in the previous comment, there is still a bug in that pass that I need to figure out, though I believe it should only affect non-32-bit cases; 32-bit float cases are already exercised by various models running on Vulkan via MoltenVK.

ScottTodd commented 1 year ago

Confirmed, this fixes compiling mobilebertsquad.tflite.mlir 🎉

(there is still an issue with push constant emulation and Uniform storage, but this aliasing issue is fixed)

ScottTodd commented 1 year ago

This might have regressed at some point. With this test program: https://github.com/openxla/iree/blob/4efb89474821cbce40e40e5b5502d027a91688ec/experimental/web/sample_webgpu/multiple_results.mlir#L1-L8

I see these errors at https://github.com/openxla/iree/commit/1a0b3fd28eb72558662e191a24d9767a679c8c16:

// ./experimental/web/sample_webgpu/build_sample.sh
//
//  Compiling 'multiple_results' sample for WebGPU...
// Tint reported 1 error(s) for a SPIR-V program, see diagnostics:
// error: entry point 'd0' references multiple variables that use the same resource binding @group(0), @binding(2)
// note: first resource binding usage declared here
// iree/experimental/web/sample_webgpu/multiple_results.mlir:6:15: error: failed to compile SPIR-V to WGSL. Consider inspecting the shader program using -iree-hal-dump-executable-intermediates.
  %result_1 = math.absf %input_1 : tensor<f32>
              ^
// iree/experimental/web/sample_webgpu/multiple_results.mlir:1:1: note: called from
func.func @multiple_results(
^
// iree/experimental/web/sample_webgpu/multiple_results.mlir:6:15: note: see // current operation: 
"hal.executable.variant"() ({
  "hal.executable.export"() ({
  ^bb0(%arg0: !hal.device):
    %0 = "arith.constant"() <{value = 1 : index}> : () -> index
    "hal.return"(%0, %0, %0) : (index, index, index) -> ()
  }) {layout = #hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer, ReadOnly>, <2, storage_buffer>]>]>, ordinal = 0 : index, sym_name = "d0", translation_info = #iree_codegen.translation_info<SPIRVBaseLowering>, workgroup_size = [1 : index, 1 : index, 1 : index]} : () -> ()
  "builtin.module"() ({
    "spirv.module"() <{addressing_model = #spirv.addressing_model<Logical>, memory_model = #spirv.memory_model<GLSL450>, vce_triple = #spirv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]>}> ({
      "spirv.GlobalVariable"() <{binding = 0 : i32, descriptor_set = 0 : i32, sym_name = "__resource_var_0_0_", type = !spirv.ptr<!spirv.struct<(!spirv.array<1 x f32, stride=4> [0])>, StorageBuffer>}> : () -> ()
      "spirv.GlobalVariable"() <{binding = 1 : i32, descriptor_set = 0 : i32, sym_name = "__resource_var_0_1_", type = !spirv.ptr<!spirv.struct<(!spirv.array<1 x f32, stride=4> [0])>, StorageBuffer>}> : () -> ()
      "spirv.GlobalVariable"() <{binding = 2 : i32, descriptor_set = 0 : i32, sym_name = "__resource_var_0_2__0", type = !spirv.ptr<!spirv.struct<(!spirv.array<1 x f32, stride=4> [0])>, StorageBuffer>}> {aliased} : () -> ()
      "spirv.GlobalVariable"() <{binding = 2 : i32, descriptor_set = 0 : i32, sym_name = "__resource_var_0_2_", type = !spirv.ptr<!spirv.struct<(!spirv.rtarray<f32, stride=4> [0])>, StorageBuffer>}> {aliased} : () -> ()
      "spirv.func"() <{function_control = #spirv.function_control<None>, function_type = () -> (), sym_name = "d0"}> ({
        %0 = "spirv.Constant"() <{value = 16 : i32}> : () -> i32
        %1 = "spirv.Constant"() <{value = 0 : i32}> : () -> i32
        %2 = "spirv.mlir.addressof"() <{variable = @__resource_var_0_0_}> : () -> !spirv.ptr<!spirv.struct<(!spirv.array<1 x f32, stride=4> [0])>, StorageBuffer>
        %3 = "spirv.mlir.addressof"() <{variable = @__resource_var_0_1_}> : () -> !spirv.ptr<!spirv.struct<(!spirv.array<1 x f32, stride=4> [0])>, StorageBuffer>
        %4 = "spirv.mlir.addressof"() <{variable = @__resource_var_0_2__0}> : () -> !spirv.ptr<!spirv.struct<(!spirv.array<1 x f32, stride=4> [0])>, StorageBuffer>
        %5 = "spirv.mlir.addressof"() <{variable = @__resource_var_0_2_}> : () -> !spirv.ptr<!spirv.struct<(!spirv.rtarray<f32, stride=4> [0])>, StorageBuffer>
        %6 = "spirv.AccessChain"(%2, %1, %1) : (!spirv.ptr<!spirv.struct<(!spirv.array<1 x f32, stride=4> [0])>, StorageBuffer>, i32, i32) -> !spirv.ptr<f32, StorageBuffer>
        %7 = "spirv.Load"(%6) : (!spirv.ptr<f32, StorageBuffer>) -> f32
        %8 = "spirv.GL.FAbs"(%7) : (f32) -> f32
        %9 = "spirv.AccessChain"(%5, %1, %0) : (!spirv.ptr<!spirv.struct<(!spirv.rtarray<f32, stride=4> [0])>, StorageBuffer>, i32, i32) -> !spirv.ptr<f32, StorageBuffer>
        "spirv.Store"(%9, %8) : (!spirv.ptr<f32, StorageBuffer>, f32) -> ()
        %10 = "spirv.AccessChain"(%3, %1, %1) : (!spirv.ptr<!spirv.struct<(!spirv.array<1 x f32, stride=4> [0])>, StorageBuffer>, i32, i32) -> !spirv.ptr<f32, StorageBuffer>
        %11 = "spirv.Load"(%10) : (!spirv.ptr<f32, StorageBuffer>) -> f32
        %12 = "spirv.GL.FAbs"(%11) : (f32) -> f32
        %13 = "spirv.AccessChain"(%4, %1, %1) : (!spirv.ptr<!spirv.struct<(!spirv.array<1 x f32, stride=4> [0])>, StorageBuffer>, i32, i32) -> !spirv.ptr<f32, StorageBuffer>
        "spirv.Store"(%13, %12) : (!spirv.ptr<f32, StorageBuffer>, f32) -> ()
        "spirv.Return"() : () -> ()
      }) : () -> ()
      "spirv.EntryPoint"() <{execution_model = #spirv.execution_model<GLCompute>, fn = @d0, interface = []}> : () -> ()
      "spirv.ExecutionMode"() <{execution_mode = #spirv.execution_mode<LocalSize>, fn = @d0, values = [1 : i32, 1 : i32, 1 : i32]}> : () -> ()
    }) : () -> ()
  }) {spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]>, api=WebGPU, #spirv.resource_limits<>>} : () -> ()
  "hal.executable.variant_end"() : () -> ()
}) {sym_name = "webgpu_wgsl_fb", target = #hal.executable.target<"webgpu", "webgpu-wgsl-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]>, api=WebGPU, #spirv.resource_limits<>>}>} : () -> ()
  %result_1 = math.absf %input_1 : tensor<f32>

I've been wanting to add more test cases for various program structures (multiple inputs, multiple outputs, control flow, etc.)... wish I had done that before this regressed :upside_down_face:

ScottTodd commented 1 year ago

From running a bisect, it looks like that test program regressed with https://github.com/openxla/iree/commit/3459833ae9c3f0fefe33ff08df7b99359d0c26b4 (https://github.com/openxla/iree/pull/13711). I'll dig in further... maybe the workaround for this issue no longer applies to small/scalar computations after that change.

// multiple_results.mlir
func.func @multiple_results(
    %input_0 : tensor<f32>,
    %input_1 : tensor<f32>
) -> (tensor<f32>, tensor<f32>) {
  %result_0 = math.absf %input_0 : tensor<f32>
  %result_1 = math.absf %input_1 : tensor<f32>
  return %result_0, %result_1 : tensor<f32>, tensor<f32>
}
../iree-build/tools/iree-compile \
    multiple_results.mlir \
    --iree-hal-target-backends=webgpu \
    --iree-codegen-gpu-native-math-precision=true \
    --iree-stream-resource-alias-mutable-bindings=true \
    -o /dev/null