Open ScottTodd opened 1 year ago
This may be the pass @antiagainst enabled on all but apple the other day? (may need to add a && !webgpu
)
Ha, okay, this is hitting the same aliased resources issue that we have been fighting when targeting Apple platforms via MoltenVK.
SPIR-V spec allows multiple resources bound to the same (set, binding) slot; real GPU drivers are all fine with that AFAICT. However SPIRV-Cross does not support aliased resources too. So I wrote a UnifyAliasedResourcePass
(code and test) to try to do various type casting and re-indexing to eliminate those aliased resources. However as recently I found there is a correctness issue with that pass so I limited it to only Apple platforms for now. I will spend some time later to figure that out and fix it so we can reuse here.
Better, it would be nice if Tint can support such cases. @dneto0 as FYI.
WGSL does not allow aliasing (group,binding). https://gpuweb.github.io/gpuweb/wgsl/#resource-interface
It does allow the same subresource (buffer or texture layer) to be bound to different (group,binding) points. But it's a "dynamic error" if at least one of them is writable. There's no way to synchronize those within the same shader, not even in Vulkan. (You would need a "domain" operation but that only occurs with API-side actions, and there is no way to invoke one inside the shader).
Confirmed, this fixes compiling mobilebertsquad.tflite.mlir
🎉
(there is still an issue with push constant emulation and Uniform storage, but this aliasing issue is fixed)
This might have regressed at some point. With this test program: https://github.com/openxla/iree/blob/4efb89474821cbce40e40e5b5502d027a91688ec/experimental/web/sample_webgpu/multiple_results.mlir#L1-L8
I see these errors at https://github.com/openxla/iree/commit/1a0b3fd28eb72558662e191a24d9767a679c8c16:
// ./experimental/web/sample_webgpu/build_sample.sh
//
// Compiling 'multiple_results' sample for WebGPU...
// Tint reported 1 error(s) for a SPIR-V program, see diagnostics:
// error: entry point 'd0' references multiple variables that use the same resource binding @group(0), @binding(2)
// note: first resource binding usage declared here
// iree/experimental/web/sample_webgpu/multiple_results.mlir:6:15: error: failed to compile SPIR-V to WGSL. Consider inspecting the shader program using -iree-hal-dump-executable-intermediates.
%result_1 = math.absf %input_1 : tensor<f32>
^
// iree/experimental/web/sample_webgpu/multiple_results.mlir:1:1: note: called from
func.func @multiple_results(
^
// iree/experimental/web/sample_webgpu/multiple_results.mlir:6:15: note: see // current operation:
"hal.executable.variant"() ({
"hal.executable.export"() ({
^bb0(%arg0: !hal.device):
%0 = "arith.constant"() <{value = 1 : index}> : () -> index
"hal.return"(%0, %0, %0) : (index, index, index) -> ()
}) {layout = #hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer, ReadOnly>, <2, storage_buffer>]>]>, ordinal = 0 : index, sym_name = "d0", translation_info = #iree_codegen.translation_info<SPIRVBaseLowering>, workgroup_size = [1 : index, 1 : index, 1 : index]} : () -> ()
"builtin.module"() ({
"spirv.module"() <{addressing_model = #spirv.addressing_model<Logical>, memory_model = #spirv.memory_model<GLSL450>, vce_triple = #spirv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]>}> ({
"spirv.GlobalVariable"() <{binding = 0 : i32, descriptor_set = 0 : i32, sym_name = "__resource_var_0_0_", type = !spirv.ptr<!spirv.struct<(!spirv.array<1 x f32, stride=4> [0])>, StorageBuffer>}> : () -> ()
"spirv.GlobalVariable"() <{binding = 1 : i32, descriptor_set = 0 : i32, sym_name = "__resource_var_0_1_", type = !spirv.ptr<!spirv.struct<(!spirv.array<1 x f32, stride=4> [0])>, StorageBuffer>}> : () -> ()
"spirv.GlobalVariable"() <{binding = 2 : i32, descriptor_set = 0 : i32, sym_name = "__resource_var_0_2__0", type = !spirv.ptr<!spirv.struct<(!spirv.array<1 x f32, stride=4> [0])>, StorageBuffer>}> {aliased} : () -> ()
"spirv.GlobalVariable"() <{binding = 2 : i32, descriptor_set = 0 : i32, sym_name = "__resource_var_0_2_", type = !spirv.ptr<!spirv.struct<(!spirv.rtarray<f32, stride=4> [0])>, StorageBuffer>}> {aliased} : () -> ()
"spirv.func"() <{function_control = #spirv.function_control<None>, function_type = () -> (), sym_name = "d0"}> ({
%0 = "spirv.Constant"() <{value = 16 : i32}> : () -> i32
%1 = "spirv.Constant"() <{value = 0 : i32}> : () -> i32
%2 = "spirv.mlir.addressof"() <{variable = @__resource_var_0_0_}> : () -> !spirv.ptr<!spirv.struct<(!spirv.array<1 x f32, stride=4> [0])>, StorageBuffer>
%3 = "spirv.mlir.addressof"() <{variable = @__resource_var_0_1_}> : () -> !spirv.ptr<!spirv.struct<(!spirv.array<1 x f32, stride=4> [0])>, StorageBuffer>
%4 = "spirv.mlir.addressof"() <{variable = @__resource_var_0_2__0}> : () -> !spirv.ptr<!spirv.struct<(!spirv.array<1 x f32, stride=4> [0])>, StorageBuffer>
%5 = "spirv.mlir.addressof"() <{variable = @__resource_var_0_2_}> : () -> !spirv.ptr<!spirv.struct<(!spirv.rtarray<f32, stride=4> [0])>, StorageBuffer>
%6 = "spirv.AccessChain"(%2, %1, %1) : (!spirv.ptr<!spirv.struct<(!spirv.array<1 x f32, stride=4> [0])>, StorageBuffer>, i32, i32) -> !spirv.ptr<f32, StorageBuffer>
%7 = "spirv.Load"(%6) : (!spirv.ptr<f32, StorageBuffer>) -> f32
%8 = "spirv.GL.FAbs"(%7) : (f32) -> f32
%9 = "spirv.AccessChain"(%5, %1, %0) : (!spirv.ptr<!spirv.struct<(!spirv.rtarray<f32, stride=4> [0])>, StorageBuffer>, i32, i32) -> !spirv.ptr<f32, StorageBuffer>
"spirv.Store"(%9, %8) : (!spirv.ptr<f32, StorageBuffer>, f32) -> ()
%10 = "spirv.AccessChain"(%3, %1, %1) : (!spirv.ptr<!spirv.struct<(!spirv.array<1 x f32, stride=4> [0])>, StorageBuffer>, i32, i32) -> !spirv.ptr<f32, StorageBuffer>
%11 = "spirv.Load"(%10) : (!spirv.ptr<f32, StorageBuffer>) -> f32
%12 = "spirv.GL.FAbs"(%11) : (f32) -> f32
%13 = "spirv.AccessChain"(%4, %1, %1) : (!spirv.ptr<!spirv.struct<(!spirv.array<1 x f32, stride=4> [0])>, StorageBuffer>, i32, i32) -> !spirv.ptr<f32, StorageBuffer>
"spirv.Store"(%13, %12) : (!spirv.ptr<f32, StorageBuffer>, f32) -> ()
"spirv.Return"() : () -> ()
}) : () -> ()
"spirv.EntryPoint"() <{execution_model = #spirv.execution_model<GLCompute>, fn = @d0, interface = []}> : () -> ()
"spirv.ExecutionMode"() <{execution_mode = #spirv.execution_mode<LocalSize>, fn = @d0, values = [1 : i32, 1 : i32, 1 : i32]}> : () -> ()
}) : () -> ()
}) {spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]>, api=WebGPU, #spirv.resource_limits<>>} : () -> ()
"hal.executable.variant_end"() : () -> ()
}) {sym_name = "webgpu_wgsl_fb", target = #hal.executable.target<"webgpu", "webgpu-wgsl-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]>, api=WebGPU, #spirv.resource_limits<>>}>} : () -> ()
%result_1 = math.absf %input_1 : tensor<f32>
I've been wanting to add more test cases for various program structures (multiple inputs, multiple outputs, control flow, etc.)... wish I did that before this regressed :upside_down_face:
From running a bisect, it looks like that test program regressed with https://github.com/openxla/iree/commit/3459833ae9c3f0fefe33ff08df7b99359d0c26b4 (https://github.com/openxla/iree/pull/13711). I'll dig in further... maybe the workaround for this issue doesn't apply for small/scalar computations after that change.
// multiple_results.mlir
func.func @multiple_results(
%input_0 : tensor<f32>,
%input_1 : tensor<f32>
) -> (tensor<f32>, tensor<f32>) {
%result_0 = math.absf %input_0 : tensor<f32>
%result_1 = math.absf %input_1 : tensor<f32>
return %result_0, %result_1 : tensor<f32>, tensor<f32>
}
../iree-build/tools/iree-compile \
multiple_results.mlir \
--iree-hal-target-backends=webgpu \
--iree-codegen-gpu-native-math-precision=true \
--iree-stream-resource-alias-mutable-bindings=true \
-o /dev/null
What happened?
I'm trying to compile
mobilebertsquad.tflite.mlir
from our benchmark suites for WebGPU and see this error:Here are the relevant .mlir and .spvasm files: https://gist.github.com/ScottTodd/6af59563826719340f659dd0554edd40
Steps to reproduce your issue
mobilebertsquad.tflite.mlir
from our benchmark infra. I downloaded files from Oct 28, so maybe around https://buildkite.com/iree/iree-benchmark-android/builds/4621iree-compile mobilebertsquad.tflite.mlir --iree-hal-target-backends=webgpu --iree-input-type=tosa
What component(s) does this issue relate to?
Compiler
Version information
webgpu
branch, https://github.com/iree-org/iree/commit/4116d5aa57452fc9b5f1695f62188cd054af4cf1Additional context
I was able to compile this using an older version of IREE's compiler, so this might be a regression? With the version that I compiled, I saw this runtime error:
Tint WGSL reader failure: :44:211 error: value cannot be represented as 'f32' x__resource_var_0_1_.field0[(x_83 + select(x_131, (4294967295u - x_131), x_128))] = select(select(select(select(select(vec4<f32>(1.17549435e-38f, 1.17549435e-38f, 1.17549435e-38f, 1.17549435e-38f), vec4<f32>(0x1p+128f, 0x1p+128f, 0x1p+128f, 0x1p+128f), (x_99 > vec4<f32>(0.0f, 0.0f, 0.0f, 0.0f))), (fma(fma(vec4<f32>(0.0131435f, 0.0131435f, 0.0131435f, 0.0131435f), x_104, vec4<f32>(0.03668965f, 0.03668965f, 0.03668965f, 0.03668965f)), (x_105 * x_105), fma(fma(vec4<f32>(0.168738902f, 0.168738902f, 0.168738902f, 0.168738902f), x_104, vec4<f32>(0.499705136f, 0.499705136f, 0.499705136f, 0.499705136f)), x_105, fma(vec4<f32>(1.0f, 1.0f, 1.0f, 1.0f), x_104, vec4<f32>(1.0f, 1.0f, 1.0f, 1.0f)))) * bitcast<vec4<f32>>(((x_112 + vec4<u32>(127u, 127u, 127u, 127u)) << vec4<u32>(23u, 23u, 23u, 23u)))), ((bitcast<vec4<i32>>(x_112) <= bitcast<vec4<i32>>(vec4<u32>(127u, 127u, 127u, 127u))) & (bitcast<vec4<i32>>(x_112) >= bitcast<vec4<i32>>(vec4<u32>(4294967169u, 4294967169u, 4294967169u, 4294967169u))))), vec4<f32>(0x1p+128f, 0x1p+128f, 0x1p+128f, 0x1p+128f), (x_99 == vec4<f32>(0x1p+128f, 0x1p+128f, 0x1p+128f, 0x1p+128f))), vec4<f32>(0.0f, 0.0f, 0.0f, 0.0f), (x_99 == vec4<f32>(-0x1p+128f, -0x1p+128f, -0x1p+128f, -0x1p+128f))), x_99, vec4<bool>(false, false, false, false));