iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

(Vulkan/SPIR-V) torch.aten.convolution lowering error on SD2.1 VAE #16381

Open monorimet opened 5 months ago

monorimet commented 5 months ago

What happened?

Running `iree-compile` on the VAE torch IR for the SPIR-V backend results in the following error:

<unknown>:0: error: failed to legalize operation 'arith.constant'
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:277:10: error: Failures have been detected while processing an MLIR pass pipeline
    %7 = torch.aten.convolution %0, %1, %2, %3, %4, %5, %false, %6, %int1_6 : !torch.vtensor<[1,4,64,64],f16>, !torch.vtensor<[4,4,1,1],f16>, !torch.vtensor<[4],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,4,64,64],f16>
         ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:277:10: note: Pipeline failed while executing [`TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_1, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_11, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVLowerExecutableTarget` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_19, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVInitialVectorLowering` on 'func.func' operation: @main_dispatch_11_conv_2d_nchw_fchw_1x512x64x64x512x3x3_f16, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_24, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_25, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_26, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVLowerExecutableTarget` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_27, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_28, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_29, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 
'hal.executable' operation: @main_dispatch_30, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVLowerExecutableTarget` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVLowerExecutableTarget` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVLowerExecutableTarget` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVLowerExecutableTarget` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_31, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVLowerExecutableTarget` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_32, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVInitialVectorLowering` on 'func.func' operation: @main_dispatch_29_batch_matmul_1x4096x4096x512_f16, `SPIRVLowerExecutableTarget` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVInitialVectorLowering` on 'func.func' operation: @main_dispatch_19_conv_2d_nchw_fchw_1x512x64x64x512x3x3_f16, `ConvertToSPIRV` on 'builtin.module' operation, `GenericVectorization` on 'func.func' operation: @main_dispatch_31_batch_matmul_1x4096x512x4096_f16, `OptimizeTensorInsertExtractSlices` on 'func.func' operation: @main_dispatch_30_softmax_4096x4096xf16, `SPIRVTile` on 'func.func' operation: @main_dispatch_32_generic_512x512_f16xf32, `Canonicalizer` on 'builtin.module' operation, `PolynomialApproximationPass` on 'func.func' operation: @main_dispatch_25_matmul_transpose_b_4096x512x512_f16, `PolynomialApproximationPass` on 'func.func' operation: @main_dispatch_24_generic_32x16x4096_f16xf32xf32xf16xf16xf16, `OptimizeVectorTransfer` on 'func.func' operation: @main_dispatch_27_matmul_transpose_b_4096x512x512_f16]: reproducer generated at 
`./shark_tmp/core-reproducer.mlir`
    %7 = torch.aten.convolution %0, %1, %2, %3, %4, %5, %false, %6, %int1_6 : !torch.vtensor<[1,4,64,64],f16>, !torch.vtensor<[4,4,1,1],f16>, !torch.vtensor<[4],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,4,64,64],f16>
         ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:277:10: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.3, [Shader, GroupNonUniform], [SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, #spirv.resource_limits<max_compute_workgroup_size = [128, 128, 64], subgroup_size = 64, cooperative_matrix_properties_khr = []>>}>
    %7 = torch.aten.convolution %0, %1, %2, %3, %4, %5, %false, %6, %int1_6 : !torch.vtensor<[1,4,64,64],f16>, !torch.vtensor<[4,4,1,1],f16>, !torch.vtensor<[4],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,4,64,64],f16>
         ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:277:10: error: failed to translate executables
    %7 = torch.aten.convolution %0, %1, %2, %3, %4, %5, %false, %6, %int1_6 : !torch.vtensor<[1,4,64,64],f16>, !torch.vtensor<[4,4,1,1],f16>, !torch.vtensor<[4],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,4,64,64],f16>
         ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:382:11: error: failed to legalize operation 'vector.bitcast'
    %55 = torch.aten.convolution %48, %49, %50, %51, %52, %53, %false_42, %54, %int1_45 : !torch.vtensor<[1,512,64,64],f16>, !torch.vtensor<[512,512,3,3],f16>, !torch.vtensor<[512],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,512,64,64],f16>
          ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:382:11: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.3, [Shader, GroupNonUniform], [SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, #spirv.resource_limits<max_compute_workgroup_size = [128, 128, 64], subgroup_size = 64, cooperative_matrix_properties_khr = []>>}>
    %55 = torch.aten.convolution %48, %49, %50, %51, %52, %53, %false_42, %54, %int1_45 : !torch.vtensor<[1,512,64,64],f16>, !torch.vtensor<[512,512,3,3],f16>, !torch.vtensor<[512],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,512,64,64],f16>
          ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:382:11: error: failed to translate executables
    %55 = torch.aten.convolution %48, %49, %50, %51, %52, %53, %false_42, %54, %int1_45 : !torch.vtensor<[1,512,64,64],f16>, !torch.vtensor<[512,512,3,3],f16>, !torch.vtensor<[512],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,512,64,64],f16>
          ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:470:11: error: failed to legalize operation 'vector.bitcast'
    %97 = torch.aten.convolution %90, %91, %92, %93, %94, %95, %false_84, %96, %int1_87 : !torch.vtensor<[1,512,64,64],f16>, !torch.vtensor<[512,512,3,3],f16>, !torch.vtensor<[512],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,512,64,64],f16>
          ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:472:11: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.3, [Shader, GroupNonUniform], [SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, #spirv.resource_limits<max_compute_workgroup_size = [128, 128, 64], subgroup_size = 64, cooperative_matrix_properties_khr = []>>}>
    %98 = torch.aten.add.Tensor %14, %97, %int1_88 : !torch.vtensor<[1,512,64,64],f16>, !torch.vtensor<[1,512,64,64],f16>, !torch.int -> !torch.vtensor<[1,512,64,64],f16>
          ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:472:11: error: failed to translate executables
    %98 = torch.aten.add.Tensor %14, %97, %int1_88 : !torch.vtensor<[1,512,64,64],f16>, !torch.vtensor<[1,512,64,64],f16>, !torch.int -> !torch.vtensor<[1,512,64,64],f16>
          ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:252:10: error: failed to legalize operation 'vector.bitcast'
    %1 = call @decode_inp(%0) : (!torch.vtensor<[1,4,64,64],f16>) -> !torch.vtensor<[1,3,512,512],f16>
         ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:665:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.3, [Shader, GroupNonUniform], [SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, #spirv.resource_limits<max_compute_workgroup_size = [128, 128, 64], subgroup_size = 64, cooperative_matrix_properties_khr = []>>}>
    %183 = torch.aten.bmm %178, %182 : !torch.vtensor<[1,4096,512],f16>, !torch.vtensor<[1,512,4096],f16> -> !torch.vtensor<[1,4096,4096],f16>
           ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:665:12: error: failed to translate executables
    %183 = torch.aten.bmm %178, %182 : !torch.vtensor<[1,4096,512],f16>, !torch.vtensor<[1,512,4096],f16> -> !torch.vtensor<[1,4096,4096],f16>
           ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
<unknown>:0: error: failed to legalize operation 'arith.constant'
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:674:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.3, [Shader, GroupNonUniform], [SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, #spirv.resource_limits<max_compute_workgroup_size = [128, 128, 64], subgroup_size = 64, cooperative_matrix_properties_khr = []>>}>
    %186 = torch.aten._softmax %185, %int-1_191, %false_192 : !torch.vtensor<[1,1,4096,4096],f16>, !torch.int, !torch.bool -> !torch.vtensor<[1,1,4096,4096],f16>
           ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:674:12: error: failed to translate executables
    %186 = torch.aten._softmax %185, %int-1_191, %false_192 : !torch.vtensor<[1,1,4096,4096],f16>, !torch.int, !torch.bool -> !torch.vtensor<[1,1,4096,4096],f16>
           ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:252:10: error: failed to legalize operation 'vector.bitcast'
    %1 = call @decode_inp(%0) : (!torch.vtensor<[1,4,64,64],f16>) -> !torch.vtensor<[1,3,512,512],f16>
         ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:735:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.3, [Shader, GroupNonUniform], [SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, #spirv.resource_limits<max_compute_workgroup_size = [128, 128, 64], subgroup_size = 64, cooperative_matrix_properties_khr = []>>}>
    %210 = torch.prims.convert_element_type %205, %int6_227 : !torch.vtensor<[4096,512],f16>, !torch.int -> !torch.vtensor<[4096,512],f32>
           ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:735:12: error: failed to translate executables
    %210 = torch.prims.convert_element_type %205, %int6_227 : !torch.vtensor<[4096,512],f16>, !torch.int -> !torch.vtensor<[4096,512],f32>
           ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {

This is believed to be an unhandled large-vector case in the convolution lowerings, but I don't have enough context here to corroborate that.
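For reference, the shapes in the diagnostics are self-consistent. Below is a small sketch of the conv2d output-shape arithmetic for the failing `torch.aten.convolution` ops (pure Python, illustration only; the padding values are assumptions, since they appear in the IR only as opaque `!torch.list<int>` operands):

```python
# Illustrative only: output-shape arithmetic for the convolutions that fail
# to lower in the log above. Shapes are taken from the diagnostics; this does
# not invoke IREE or torch.

def conv2d_out_shape(in_shape, w_shape, stride=1, pad=0, dilation=1):
    """NCHW input, FCHW weight -> NCHW output shape (standard conv formula)."""
    n, c, h, w = in_shape
    f, c_w, kh, kw = w_shape
    assert c == c_w, "input channels must match weight channels"
    oh = (h + 2 * pad - dilation * (kh - 1) - 1) // stride + 1
    ow = (w + 2 * pad - dilation * (kw - 1) - 1) // stride + 1
    return (n, f, oh, ow)

# The 1x1 f16 convolution at tempfile line 277:
print(conv2d_out_shape((1, 4, 64, 64), (4, 4, 1, 1)))               # (1, 4, 64, 64)
# The 3x3, 512-channel f16 convolutions at lines 382/470 (pad=1 assumed,
# which is what keeps the 64x64 spatial size):
print(conv2d_out_shape((1, 512, 64, 64), (512, 512, 3, 3), pad=1))  # (1, 512, 64, 64)
```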

Steps to reproduce your issue

  1. Download the vae_decode MLIR with stripped weights
  2. Install iree-compile ~20240212.799
  3. Run:
    iree-compile.exe ./vae_decode.torch.tempfile --iree-input-type=torch --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --mlir-pass-pipeline-crash-reproducer=./core-reproducer.mlir --iree-llvmcpu-target-cpu-features=host --iree-stream-resource-max-allocation-size=3221225472 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=true --verify=false --iree-flow-collapse-reduction-dims --iree-opt-const-expr-hoisting=False --iree-codegen-linalg-max-constant-fold-elements=9223372036854775807 -o vae.vmfb
  4. See the error
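Most of the SHARK-specific flags above are not needed to reproduce; a minimal sketch of the invocation (file name assumed to match the download in step 1):

```shell
# Minimal repro sketch: input type and Vulkan backend are the only
# flags essential to hit the SPIR-V lowering failure.
iree-compile vae_decode.torch.tempfile \
  --iree-input-type=torch \
  --iree-hal-target-backends=vulkan \
  --iree-vm-bytecode-module-output-format=flatbuffer-binary \
  -o vae.vmfb
```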

What component(s) does this issue relate to?

No response

Version information

20240212.799

Additional context

This was functional in mid-December; a known-good version to try would be from 2023/12/12. I can bisect further if needed.
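The bisection could be driven from the repo directly; a hypothetical sketch (the known-good date comes from the comment above, everything else is generic `git bisect`):

```shell
# Hypothetical bisect sketch between the known-good mid-December state
# and the failing ~20240212 build.
git clone https://github.com/iree-org/iree
cd iree
git bisect start
git bisect bad                                          # current HEAD fails
git bisect good "$(git rev-list -1 --before=2023-12-12 main)"
# At each step: rebuild iree-compile and re-run the repro command,
# then mark the commit with `git bisect good` or `git bisect bad`.
```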

antiagainst commented 5 months ago

`--iree-vulkan-target-triple` is missing, so this compiles against the most restrictive SwiftShader target. I don't think we would prioritize fixing that target for now. Could you add RDNA targets to try out?

monorimet commented 5 months ago

Ran into the same issue with the following two CLI inputs:

With SRT, using all flags normally used for this config in SHARK:

iree-compile.exe C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile --iree-input-type=torch --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --mlir-pass-pipeline-crash-reproducer=./shark_tmp/core-reproducer.mlir --iree-llvmcpu-target-cpu-features=host --iree-stream-resource-max-allocation-size=3221225472 --iree-vulkan-target-env='#vk.target_env<v1.3, r(120), [VK_KHR_16bit_storage, VK_KHR_8bit_storage, VK_KHR_shader_float16_int8, VK_KHR_spirv_1_4, VK_KHR_storage_buffer_storage_class, VK_KHR_variable_pointers, VK_EXT_subgroup_size_control, VK_KHR_cooperative_matrix], AMD:DiscreteGPU, #vk.caps< maxComputeSharedMemorySize = 65536, maxComputeWorkGroupInvocations = 1024, maxComputeWorkGroupSize = dense<[1024, 1024, 1024]>: vector<3xi32>, subgroupSize = 64, subgroupFeatures = 255: i32, minSubgroupSize = 32, maxSubgroupSize = 64, shaderFloat16 = unit, shaderFloat64 = unit, shaderInt8 = unit, shaderInt16 = unit, shaderInt64 = unit, storageBuffer16BitAccess = unit, storagePushConstant16 = unit, uniformAndStorageBuffer16BitAccess = unit, storageBuffer8BitAccess = unit, storagePushConstant8 = unit, uniformAndStorageBuffer8BitAccess = unit, variablePointers = unit, variablePointersStorageBuffer = unit, cooperativeMatrixPropertiesKHR = [#vk.coop_matrix_props<mSize = 16, nSize = 16, kSize = 16, aType = f16, bType = f16, cType = f16, resultType = f16, accSat = false, scope = #vk.scope<Subgroup>>], shaderIntegerDotProduct = unit >>' --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=true --verify=false --iree-flow-collapse-reduction-dims --iree-opt-const-expr-hoisting=False --iree-codegen-linalg-max-constant-fold-elements=9223372036854775807 
--iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-global-opt-detach-elementwise-from-named-ops,iree-global-opt-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-convert-conv2d-to-img2col,iree-preprocessing-pad-linalg-ops{pad-size=32},iree-linalg-ext-convert-conv2d-to-winograd))' --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-preprocessing-pad-linalg-ops{pad-size=16}))' --iree-vulkan-target-triple=rdna3-7900-windows-msvc

Minimized, with the target triple flag:

iree-compile.exe C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile --iree-input-type=torch --iree-vulkan-target-triple=rdna3-7900-windows --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan -o vae.vmfb

monorimet commented 5 months ago

I managed to prevent the failure on torch.aten.convolution by cleaning up some preprocessing flags:

--iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-global-opt-detach-elementwise-from-named-ops,iree-preprocessing-convert-conv2d-to-img2col,iree-global-opt-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-pad-linalg-ops{pad-size=32},iree-linalg-ext-convert-conv2d-to-winograd))'

but I'm still seeing issues with:

Diagnostics:
<unknown>:0: error: failed to legalize operation 'arith.constant'
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:2874:13: error: Failures have been detected while processing an MLIR pass pipeline
    %1210 = torch.prims.convert_element_type %1209, %int6_1284 : !torch.vtensor<[1,32,4,262144],f16>, !torch.int -> !torch.vtensor<[1,32,4,262144],f32>
            ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:2874:13: note: Pipeline failed while executing [`TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_220, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_229, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_237, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_245, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_253, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_261, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `FlattenMemRefSubspan` on 'builtin.module' operation, `FlattenMemRefSubspan` on 'builtin.module' operation, `FlattenMemRefSubspan` on 'builtin.module' operation, `FlattenMemRefSubspan` on 'builtin.module' operation, `ConvertToSPIRV` on 'builtin.module' operation, `ConvertToSPIRV` on 'builtin.module' operation]: reproducer generated at `./shark_tmp/core-reproducer.mlir`
    %1210 = torch.prims.convert_element_type %1209, %int6_1284 : !torch.vtensor<[1,32,4,262144],f16>, !torch.int -> !torch.vtensor<[1,32,4,262144],f32>
            ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:2874:13: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit, CooperativeMatrixKHR], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers, SPV_KHR_cooperative_matrix]>, api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 65536, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 1024], subgroup_size = 64, min_subgroup_size = 32, max_subgroup_size = 64, cooperative_matrix_properties_khr = [#spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f16, result_type = f16, acc_sat = false, scope = <Subgroup>>]>>}>
    %1210 = torch.prims.convert_element_type %1209, %int6_1284 : !torch.vtensor<[1,32,4,262144],f16>, !torch.int -> !torch.vtensor<[1,32,4,262144],f32>
            ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:2874:13: error: failed to translate executables
    %1210 = torch.prims.convert_element_type %1209, %int6_1284 : !torch.vtensor<[1,32,4,262144],f16>, !torch.int -> !torch.vtensor<[1,32,4,262144],f32>
            ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
<unknown>:0: error: failed to legalize operation 'arith.constant'
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:2987:13: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit, CooperativeMatrixKHR], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers, SPV_KHR_cooperative_matrix]>, api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 65536, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 1024], subgroup_size = 64, min_subgroup_size = 32, max_subgroup_size = 64, cooperative_matrix_properties_khr = [#spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f16, result_type = f16, acc_sat = false, scope = <Subgroup>>]>>}>
    %1262 = torch.prims.convert_element_type %1261, %int6_1340 : !torch.vtensor<[1,32,4,262144],f16>, !torch.int -> !torch.vtensor<[1,32,4,262144],f32>
            ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:2987:13: error: failed to translate executables
    %1262 = torch.prims.convert_element_type %1261, %int6_1340 : !torch.vtensor<[1,32,4,262144],f16>, !torch.int -> !torch.vtensor<[1,32,4,262144],f32>
            ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
<unknown>:0: error: failed to legalize operation 'arith.constant'
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:3075:13: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit, CooperativeMatrixKHR], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers, SPV_KHR_cooperative_matrix]>, api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 65536, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 1024], subgroup_size = 64, min_subgroup_size = 32, max_subgroup_size = 64, cooperative_matrix_properties_khr = [#spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f16, result_type = f16, acc_sat = false, scope = <Subgroup>>]>>}>
    %1304 = torch.prims.convert_element_type %1303, %int6_1383 : !torch.vtensor<[1,32,4,262144],f16>, !torch.int -> !torch.vtensor<[1,32,4,262144],f32>
            ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:3075:13: error: failed to translate executables
    %1304 = torch.prims.convert_element_type %1303, %int6_1383 : !torch.vtensor<[1,32,4,262144],f16>, !torch.int -> !torch.vtensor<[1,32,4,262144],f32>
monorimet commented 5 months ago

I narrowed this down to the attached dispatch; it can be reproduced with:

iree-compile.exe dispatch_189.mlir --iree-input-type=torch --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --mlir-pass-pipeline-crash-reproducer=./shark_tmp/core-reproducer.mlir --iree-llvmcpu-target-cpu-features=host --iree-stream-resource-max-allocation-size=3221225472 --iree-vulkan-target-env='#vk.target_env<v1.3, r(120), [VK_KHR_16bit_storage, VK_KHR_8bit_storage, VK_KHR_shader_float16_int8, VK_KHR_spirv_1_4, VK_KHR_storage_buffer_storage_class, VK_KHR_variable_pointers, VK_EXT_subgroup_size_control, VK_KHR_cooperative_matrix], AMD:DiscreteGPU, #vk.caps< maxComputeSharedMemorySize = 65536, maxComputeWorkGroupInvocations = 1024, maxComputeWorkGroupSize = dense<[1024, 1024, 1024]>: vector<3xi32>, subgroupSize = 64, subgroupFeatures = 255: i32, minSubgroupSize = 32, maxSubgroupSize = 64, shaderFloat16 = unit, shaderFloat64 = unit, shaderInt8 = unit, shaderInt16 = unit, shaderInt64 = unit, storageBuffer16BitAccess = unit, storagePushConstant16 = unit, uniformAndStorageBuffer16BitAccess = unit, storageBuffer8BitAccess = unit, storagePushConstant8 = unit, uniformAndStorageBuffer8BitAccess = unit, variablePointers = unit, variablePointersStorageBuffer = unit, cooperativeMatrixPropertiesKHR = [#vk.coop_matrix_props<mSize = 16, nSize = 16, kSize = 16, aType = f16, bType = f16, cType = f16, resultType = f16, accSat = false, scope = #vk.scope<Subgroup>>], shaderIntegerDotProduct = unit >>' --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=true --verify=false --iree-opt-const-expr-hoisting=False --iree-codegen-linalg-max-constant-fold-elements=9223372036854775807 --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-global-opt-detach-elementwise-from-named-ops,iree-preprocessing-convert-conv2d-to-img2col,iree-global-opt-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-pad-linalg-ops{pad-size=16}))' --iree-vulkan-target-triple=rdna3-7900-windows-msvc
hal.executable public @main_dispatch_189 {
  hal.executable.variant public @vulkan_spirv_fb target(<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit, CooperativeMatrixKHR], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers, SPV_KHR_cooperative_matrix]>, api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 65536, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 1024], subgroup_size = 64, min_subgroup_size = 32, max_subgroup_size = 64, cooperative_matrix_properties_khr = [#spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f16, result_type = f16, acc_sat = false, scope = <Subgroup>>]>>}>) {
    hal.executable.export public @main_dispatch_189_matmul_256x262144x2304_f16 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer, ReadOnly>, <2, storage_buffer>, <3, storage_buffer>]>]>) {
    ^bb0(%arg0: !hal.device):
      %x, %y, %z = flow.dispatch.workgroup_count_from_slice 
      hal.return %x, %y, %z : index, index, index
    }
    builtin.module {
      func.func @main_dispatch_189_matmul_256x262144x2304_f16() {
        %cst = arith.constant 0.000000e+00 : f16
        %c95657600 = arith.constant 95657600 : index
        %c96837248 = arith.constant 96837248 : index
        %c403703808 = arith.constant 403703808 : index
        %c0 = arith.constant 0 : index
        %c134217728 = arith.constant 134217728 : index
        %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c95657600) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x2304xf16>>
        %1 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) alignment(64) offset(%c403703808) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2304x262144xf16>>
        %2 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c96837248) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256xf16>>
        %3 = hal.interface.binding.subspan set(0) binding(2) type(storage_buffer) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x262144xf16>>
        %4 = hal.interface.binding.subspan set(0) binding(3) type(storage_buffer) alignment(64) offset(%c134217728) : !flow.dispatch.tensor<writeonly:tensor<256x262144xf32>>
        %5 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [256, 2304], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x2304xf16>> -> tensor<256x2304xf16>
        %6 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2304, 262144], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2304x262144xf16>> -> tensor<2304x262144xf16>
        %7 = flow.dispatch.tensor.load %2, offsets = [0], sizes = [256], strides = [1] : !flow.dispatch.tensor<readonly:tensor<256xf16>> -> tensor<256xf16>
        %8 = tensor.empty() : tensor<256x262144xf32>
        %9 = tensor.empty() : tensor<256x262144xf16>
        %10 = linalg.fill ins(%cst : f16) outs(%9 : tensor<256x262144xf16>) -> tensor<256x262144xf16>
        %11 = linalg.matmul ins(%5, %6 : tensor<256x2304xf16>, tensor<2304x262144xf16>) outs(%10 : tensor<256x262144xf16>) -> tensor<256x262144xf16>
        %12:2 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0)>, affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%11, %7 : tensor<256x262144xf16>, tensor<256xf16>) outs(%9, %8 : tensor<256x262144xf16>, tensor<256x262144xf32>) {
        ^bb0(%in: f16, %in_0: f16, %out: f16, %out_1: f32):
          %13 = arith.addf %in, %in_0 : f16
          %14 = arith.extf %13 : f16 to f32
          linalg.yield %13, %14 : f16, f32
        } -> (tensor<256x262144xf16>, tensor<256x262144xf32>)
        flow.dispatch.tensor.store %12#0, %3, offsets = [0, 0], sizes = [256, 262144], strides = [1, 1] : tensor<256x262144xf16> -> !flow.dispatch.tensor<writeonly:tensor<256x262144xf16>>
        flow.dispatch.tensor.store %12#1, %4, offsets = [0, 0], sizes = [256, 262144], strides = [1, 1] : tensor<256x262144xf32> -> !flow.dispatch.tensor<writeonly:tensor<256x262144xf32>>
        return
      }
    }
  }
  hal.executable.variant public @vulkan_spirv_fb_0 target(<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, PhysicalStorageBufferAddresses, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit, CooperativeMatrixKHR], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_integer_dot_product, SPV_KHR_storage_buffer_storage_class, SPV_KHR_physical_storage_buffer, SPV_KHR_variable_pointers, SPV_KHR_cooperative_matrix]>, api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 65536, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 1024], subgroup_size = 64, min_subgroup_size = 32, max_subgroup_size = 64, cooperative_matrix_properties_khr = [#spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = i8, b_type = i8, c_type = i32, result_type = i32, acc_sat = false, scope = <Subgroup>>, #spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f16, result_type = f16, acc_sat = false, scope = <Subgroup>>, #spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f32, result_type = f32, acc_sat = false, scope = <Subgroup>>]>>}>) {
    hal.executable.export public @main_dispatch_189_matmul_256x262144x2304_f16 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer, ReadOnly>, <2, storage_buffer>, <3, storage_buffer>]>]>) {
    ^bb0(%arg0: !hal.device):
      %x, %y, %z = flow.dispatch.workgroup_count_from_slice 
      hal.return %x, %y, %z : index, index, index
    }
    builtin.module {
      func.func @main_dispatch_189_matmul_256x262144x2304_f16() {
        %cst = arith.constant 0.000000e+00 : f16
        %c95657600 = arith.constant 95657600 : index
        %c96837248 = arith.constant 96837248 : index
        %c403703808 = arith.constant 403703808 : index
        %c0 = arith.constant 0 : index
        %c134217728 = arith.constant 134217728 : index
        %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c95657600) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x2304xf16>>
        %1 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) alignment(64) offset(%c403703808) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2304x262144xf16>>
        %2 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c96837248) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256xf16>>
        %3 = hal.interface.binding.subspan set(0) binding(2) type(storage_buffer) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x262144xf16>>
        %4 = hal.interface.binding.subspan set(0) binding(3) type(storage_buffer) alignment(64) offset(%c134217728) : !flow.dispatch.tensor<writeonly:tensor<256x262144xf32>>
        %5 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [256, 2304], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x2304xf16>> -> tensor<256x2304xf16>
        %6 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2304, 262144], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2304x262144xf16>> -> tensor<2304x262144xf16>
        %7 = flow.dispatch.tensor.load %2, offsets = [0], sizes = [256], strides = [1] : !flow.dispatch.tensor<readonly:tensor<256xf16>> -> tensor<256xf16>
        %8 = tensor.empty() : tensor<256x262144xf32>
        %9 = tensor.empty() : tensor<256x262144xf16>
        %10 = linalg.fill ins(%cst : f16) outs(%9 : tensor<256x262144xf16>) -> tensor<256x262144xf16>
        %11 = linalg.matmul ins(%5, %6 : tensor<256x2304xf16>, tensor<2304x262144xf16>) outs(%10 : tensor<256x262144xf16>) -> tensor<256x262144xf16>
        %12:2 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0)>, affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%11, %7 : tensor<256x262144xf16>, tensor<256xf16>) outs(%9, %8 : tensor<256x262144xf16>, tensor<256x262144xf32>) {
        ^bb0(%in: f16, %in_0: f16, %out: f16, %out_1: f32):
          %13 = arith.addf %in, %in_0 : f16
          %14 = arith.extf %13 : f16 to f32
          linalg.yield %13, %14 : f16, f32
        } -> (tensor<256x262144xf16>, tensor<256x262144xf32>)
        flow.dispatch.tensor.store %12#0, %3, offsets = [0, 0], sizes = [256, 262144], strides = [1, 1] : tensor<256x262144xf16> -> !flow.dispatch.tensor<writeonly:tensor<256x262144xf16>>
        flow.dispatch.tensor.store %12#1, %4, offsets = [0, 0], sizes = [256, 262144], strides = [1, 1] : tensor<256x262144xf32> -> !flow.dispatch.tensor<writeonly:tensor<256x262144xf32>>
        return
      }
    }
  }
}
monorimet commented 5 months ago

Update: I've been experimenting with this for a bit, and I think I was running into other issues above.

Here is a simpler approach with fewer CLI flags:

To get dispatches from VAE e2e mlir:

iree-compile.exe C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile --iree-vulkan-target-triple=rdna3-7900-windows --iree-input-type=torch --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --iree-util-zero-fill-elided-attrs --iree-opt-const-expr-hoisting=False --iree-codegen-linalg-max-constant-fold-elements=9223372036854775807 --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-global-opt-detach-elementwise-from-named-ops,iree-preprocessing-convert-conv2d-to-img2col,iree-global-opt-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-pad-linalg-ops{pad-size=32}))' --compile-to=hal --iree-hal-dump-executable-configurations-to=./vae_dispatches --mlir-print-debuginfo=false --mlir-print-op-on-diagnostic=false 2> out_e2e.txt

Result:
out_e2e.txt

To attempt to compile one of the dispatches named in the errors from the command above:

iree-compile.exe C:\V\SHARK\vae_dispatches\configured_compiled_vae_main_dispatch_196.mlir --iree-input-type=torch --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --iree-opt-const-expr-hoisting=False --iree-codegen-linalg-max-constant-fold-elements=9223372036854775807 --iree-vulkan-target-triple=rdna3-7900-windows-msvc --compile-from=executable-configurations 2> out.txt

out.txt

antiagainst commented 5 months ago

We have an arith.constant dense_resource<torch_tensor_4_torch.float16> : tensor<4xf16> op in the IR before converting to SPIR-V. Support for dense resources is not yet implemented in the SPIR-V conversion.

@MaheshRavishankar I haven't followed the development on dense resources per se--is such an op expected to appear in the kernel? If so, shouldn't the resource blob it references also appear in the dispatch? It is currently missing.
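
For reference, the failing pattern reduces to something like the following. This is a hypothetical minimal form based on the op named above, not the actual dispatch IR; the function name and the standalone use of the resource handle are illustrative:

```mlir
// Illustrative sketch: a dense_resource-backed constant inside a function
// targeted at SPIR-V. ConvertToSPIRV has no pattern for materializing a
// resource-backed elements attribute, so this arith.constant fails to
// legalize. Note the resource blob referenced by the handle is also not
// present in the dispatch, so even with a lowering pattern there would be
// no data to materialize.
func.func @repro() -> tensor<4xf16> {
  %cst = arith.constant dense_resource<torch_tensor_4_torch.float16> : tensor<4xf16>
  return %cst : tensor<4xf16>
}
```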