(Vulkan/SPIR-V) torch.aten.convolution lowering error on SD2.1 VAE

iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

Apache License 2.0

2.5k stars 557 forks source link

What happened?

Running iree-compile on VAE torch IR for SPIR-V backend results in the following error:

<unknown>:0: error: failed to legalize operation 'arith.constant'
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:277:10: error: Failures have been detected while processing an MLIR pass pipeline
    %7 = torch.aten.convolution %0, %1, %2, %3, %4, %5, %false, %6, %int1_6 : !torch.vtensor<[1,4,64,64],f16>, !torch.vtensor<[4,4,1,1],f16>, !torch.vtensor<[4],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,4,64,64],f16>
         ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:277:10: note: Pipeline failed while executing [`TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_1, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_11, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVLowerExecutableTarget` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_19, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVInitialVectorLowering` on 'func.func' operation: @main_dispatch_11_conv_2d_nchw_fchw_1x512x64x64x512x3x3_f16, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_24, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_25, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_26, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVLowerExecutableTarget` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_27, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_28, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_29, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_30, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVLowerExecutableTarget` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVLowerExecutableTarget` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVLowerExecutableTarget` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVLowerExecutableTarget` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_31, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVLowerExecutableTarget` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_32, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVInitialVectorLowering` on 'func.func' operation: @main_dispatch_29_batch_matmul_1x4096x4096x512_f16, `SPIRVLowerExecutableTarget` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVInitialVectorLowering` on 'func.func' operation: @main_dispatch_19_conv_2d_nchw_fchw_1x512x64x64x512x3x3_f16, `ConvertToSPIRV` on 'builtin.module' operation, `GenericVectorization` on 'func.func' operation: @main_dispatch_31_batch_matmul_1x4096x512x4096_f16, `OptimizeTensorInsertExtractSlices` on 'func.func' operation: @main_dispatch_30_softmax_4096x4096xf16, `SPIRVTile` on 'func.func' operation: @main_dispatch_32_generic_512x512_f16xf32, `Canonicalizer` on 'builtin.module' operation, `PolynomialApproximationPass` on 'func.func' operation: @main_dispatch_25_matmul_transpose_b_4096x512x512_f16, `PolynomialApproximationPass` on 'func.func' operation: @main_dispatch_24_generic_32x16x4096_f16xf32xf32xf16xf16xf16, `OptimizeVectorTransfer` on 'func.func' operation: @main_dispatch_27_matmul_transpose_b_4096x512x512_f16]: reproducer generated at `./shark_tmp/core-reproducer.mlir`
    %7 = torch.aten.convolution %0, %1, %2, %3, %4, %5, %false, %6, %int1_6 : !torch.vtensor<[1,4,64,64],f16>, !torch.vtensor<[4,4,1,1],f16>, !torch.vtensor<[4],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,4,64,64],f16>
         ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:277:10: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.3, [Shader, GroupNonUniform], [SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, #spirv.resource_limits<max_compute_workgroup_size = [128, 128, 64], subgroup_size = 64, cooperative_matrix_properties_khr = []>>}>
    %7 = torch.aten.convolution %0, %1, %2, %3, %4, %5, %false, %6, %int1_6 : !torch.vtensor<[1,4,64,64],f16>, !torch.vtensor<[4,4,1,1],f16>, !torch.vtensor<[4],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,4,64,64],f16>
         ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:277:10: error: failed to translate executables
    %7 = torch.aten.convolution %0, %1, %2, %3, %4, %5, %false, %6, %int1_6 : !torch.vtensor<[1,4,64,64],f16>, !torch.vtensor<[4,4,1,1],f16>, !torch.vtensor<[4],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,4,64,64],f16>
         ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:382:11: error: failed to legalize operation 'vector.bitcast'
    %55 = torch.aten.convolution %48, %49, %50, %51, %52, %53, %false_42, %54, %int1_45 : !torch.vtensor<[1,512,64,64],f16>, !torch.vtensor<[512,512,3,3],f16>, !torch.vtensor<[512],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,512,64,64],f16>
          ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:382:11: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.3, [Shader, GroupNonUniform], [SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, #spirv.resource_limits<max_compute_workgroup_size = [128, 128, 64], subgroup_size = 64, cooperative_matrix_properties_khr = []>>}>
    %55 = torch.aten.convolution %48, %49, %50, %51, %52, %53, %false_42, %54, %int1_45 : !torch.vtensor<[1,512,64,64],f16>, !torch.vtensor<[512,512,3,3],f16>, !torch.vtensor<[512],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,512,64,64],f16>
          ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:382:11: error: failed to translate executables
    %55 = torch.aten.convolution %48, %49, %50, %51, %52, %53, %false_42, %54, %int1_45 : !torch.vtensor<[1,512,64,64],f16>, !torch.vtensor<[512,512,3,3],f16>, !torch.vtensor<[512],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,512,64,64],f16>
          ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:470:11: error: failed to legalize operation 'vector.bitcast'
    %97 = torch.aten.convolution %90, %91, %92, %93, %94, %95, %false_84, %96, %int1_87 : !torch.vtensor<[1,512,64,64],f16>, !torch.vtensor<[512,512,3,3],f16>, !torch.vtensor<[512],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,512,64,64],f16>
          ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:472:11: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.3, [Shader, GroupNonUniform], [SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, #spirv.resource_limits<max_compute_workgroup_size = [128, 128, 64], subgroup_size = 64, cooperative_matrix_properties_khr = []>>}>
    %98 = torch.aten.add.Tensor %14, %97, %int1_88 : !torch.vtensor<[1,512,64,64],f16>, !torch.vtensor<[1,512,64,64],f16>, !torch.int -> !torch.vtensor<[1,512,64,64],f16>
          ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:472:11: error: failed to translate executables
    %98 = torch.aten.add.Tensor %14, %97, %int1_88 : !torch.vtensor<[1,512,64,64],f16>, !torch.vtensor<[1,512,64,64],f16>, !torch.int -> !torch.vtensor<[1,512,64,64],f16>
          ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:252:10: error: failed to legalize operation 'vector.bitcast'
    %1 = call @decode_inp(%0) : (!torch.vtensor<[1,4,64,64],f16>) -> !torch.vtensor<[1,3,512,512],f16>
         ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:665:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.3, [Shader, GroupNonUniform], [SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, #spirv.resource_limits<max_compute_workgroup_size = [128, 128, 64], subgroup_size = 64, cooperative_matrix_properties_khr = []>>}>
    %183 = torch.aten.bmm %178, %182 : !torch.vtensor<[1,4096,512],f16>, !torch.vtensor<[1,512,4096],f16> -> !torch.vtensor<[1,4096,4096],f16>
           ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:665:12: error: failed to translate executables
    %183 = torch.aten.bmm %178, %182 : !torch.vtensor<[1,4096,512],f16>, !torch.vtensor<[1,512,4096],f16> -> !torch.vtensor<[1,4096,4096],f16>
           ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
<unknown>:0: error: failed to legalize operation 'arith.constant'
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:674:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.3, [Shader, GroupNonUniform], [SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, #spirv.resource_limits<max_compute_workgroup_size = [128, 128, 64], subgroup_size = 64, cooperative_matrix_properties_khr = []>>}>
    %186 = torch.aten._softmax %185, %int-1_191, %false_192 : !torch.vtensor<[1,1,4096,4096],f16>, !torch.int, !torch.bool -> !torch.vtensor<[1,1,4096,4096],f16>
           ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:674:12: error: failed to translate executables
    %186 = torch.aten._softmax %185, %int-1_191, %false_192 : !torch.vtensor<[1,1,4096,4096],f16>, !torch.int, !torch.bool -> !torch.vtensor<[1,1,4096,4096],f16>
           ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:252:10: error: failed to legalize operation 'vector.bitcast'
    %1 = call @decode_inp(%0) : (!torch.vtensor<[1,4,64,64],f16>) -> !torch.vtensor<[1,3,512,512],f16>
         ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:735:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.3, [Shader, GroupNonUniform], [SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, #spirv.resource_limits<max_compute_workgroup_size = [128, 128, 64], subgroup_size = 64, cooperative_matrix_properties_khr = []>>}>
    %210 = torch.prims.convert_element_type %205, %int6_227 : !torch.vtensor<[4096,512],f16>, !torch.int -> !torch.vtensor<[4096,512],f32>
           ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {
  ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:735:12: error: failed to translate executables
    %210 = torch.prims.convert_element_type %205, %int6_227 : !torch.vtensor<[4096,512],f16>, !torch.int -> !torch.vtensor<[4096,512],f32>
           ^
C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile:250:3: note: called from
  func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} {

This is believed to be an unhandled large vector case in convolution lowerings, but I don't have enough context here to corroborate.

Steps to reproduce your issue

Download vae_decode mlir with stripped weights
Have iree-compile ~20240212.799

run:

iree-compile.exe ./vae_decode.torch.tempfile --iree-input-type=torch --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --mlir-pass-pipeline-crash-reproducer=./core-reproducer.mlir --iree-llvmcpu-target-cpu-features=host --iree-stream-resource-max-allocation-size=3221225472 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=true --verify=false --iree-flow-collapse-reduction-dims --iree-opt-const-expr-hoisting=False --iree-codegen-linalg-max-constant-fold-elements=9223372036854775807 -o vae.vmfb

See error

What component(s) does this issue relate to?

No response

Version information

20240212.799

Additional context

This was functional in mid-december, sure version to try would be from 2023/12/12 -- can bisect further if needed.

iree-compile.exe C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile --iree-input-type=torch --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --mlir-pass-pipeline-crash-reproducer=./shark_tmp/core-reproducer.mlir --iree-llvmcpu-target-cpu-features=host --iree-stream-resource-max-allocation-size=3221225472 --iree-vulkan-target-env='#vk.target_env<v1.3, r(120), [VK_KHR_16bit_storage, VK_KHR_8bit_storage, VK_KHR_shader_float16_int8, VK_KHR_spirv_1_4, VK_KHR_storage_buffer_storage_class, VK_KHR_variable_pointers, VK_EXT_subgroup_size_control, VK_KHR_cooperative_matrix], AMD:DiscreteGPU, #vk.caps< maxComputeSharedMemorySize = 65536, maxComputeWorkGroupInvocations = 1024, maxComputeWorkGroupSize = dense<[1024, 1024, 1024]>: vector<3xi32>, subgroupSize = 64, subgroupFeatures = 255: i32, minSubgroupSize = 32, maxSubgroupSize = 64, shaderFloat16 = unit, shaderFloat64 = unit, shaderInt8 = unit, shaderInt16 = unit, shaderInt64 = unit, storageBuffer16BitAccess = unit, storagePushConstant16 = unit, uniformAndStorageBuffer16BitAccess = unit, storageBuffer8BitAccess = unit, storagePushConstant8 = unit, uniformAndStorageBuffer8BitAccess = unit, variablePointers = unit, variablePointersStorageBuffer = unit, cooperativeMatrixPropertiesKHR = [#vk.coop_matrix_props<mSize = 16, nSize = 16, kSize = 16, aType = f16, bType = f16, cType = f16, resultType = f16, accSat = false, scope = #vk.scope<Subgroup>>], shaderIntegerDotProduct = unit >>' --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=true --verify=false --iree-flow-collapse-reduction-dims --iree-opt-const-expr-hoisting=False --iree-codegen-linalg-max-constant-fold-elements=9223372036854775807 --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-global-opt-detach-elementwise-from-named-ops,iree-global-opt-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-convert-conv2d-to-img2col,iree-preprocessing-pad-linalg-ops{pad-size=32},iree-linalg-ext-convert-conv2d-to-winograd))' --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-preprocessing-pad-linalg-ops{pad-size=16}))' --iree-vulkan-target-triple=rdna3-7900-windows-msvc

iree-compile.exe C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_decode.torch.tempfile --iree-input-type=torch --iree-vulkan-target-triple=rdna3-7900-windows --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan -o vae.vmfb

--iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-global-opt-detach-elementwise-from-named-ops,iree-preprocessing-convert-conv2d-to-img2col,iree-global-opt-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-pad-linalg-ops{pad-size=32},iree-linalg-ext-convert-conv2d-to-winograd))'

Diagnostics: <unknown>:0: error: failed to legalize operation 'arith.constant' C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:2874:13: error: Failures have been detected while processing an MLIR pass pipeline %1210 = torch.prims.convert_element_type %1209, %int6_1284 : !torch.vtensor<[1,32,4,262144],f16>, !torch.int -> !torch.vtensor<[1,32,4,262144],f32> ^ C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:250:3: note: called from func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} { ^ C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:2874:13: note: Pipeline failed while executing [`TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_220, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_229, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_237, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_245, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_253, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `TranslateExecutablesPass` on 'hal.executable' operation: @main_dispatch_261, `TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `FlattenMemRefSubspan` on 'builtin.module' operation, `FlattenMemRefSubspan` on 'builtin.module' operation, `FlattenMemRefSubspan` on 'builtin.module' operation, `FlattenMemRefSubspan` on 'builtin.module' operation, `ConvertToSPIRV` on 'builtin.module' operation, `ConvertToSPIRV` on 'builtin.module' operation]: reproducer generated at `./shark_tmp/core-reproducer.mlir` %1210 = torch.prims.convert_element_type %1209, %int6_1284 : !torch.vtensor<[1,32,4,262144],f16>, !torch.int -> !torch.vtensor<[1,32,4,262144],f32> ^ C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:2874:13: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit, CooperativeMatrixKHR], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers, SPV_KHR_cooperative_matrix]>, api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 65536, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 1024], subgroup_size = 64, min_subgroup_size = 32, max_subgroup_size = 64, cooperative_matrix_properties_khr = [#spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f16, result_type = f16, acc_sat = false, scope = <Subgroup>>]>>}> %1210 = torch.prims.convert_element_type %1209, %int6_1284 : !torch.vtensor<[1,32,4,262144],f16>, !torch.int -> !torch.vtensor<[1,32,4,262144],f32> ^ C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:250:3: note: called from func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} { ^ C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:2874:13: error: failed to translate executables %1210 = torch.prims.convert_element_type %1209, %int6_1284 : !torch.vtensor<[1,32,4,262144],f16>, !torch.int -> !torch.vtensor<[1,32,4,262144],f32> ^ C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:250:3: note: called from func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} { ^ <unknown>:0: error: failed to legalize operation 'arith.constant' C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:2987:13: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit, CooperativeMatrixKHR], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers, SPV_KHR_cooperative_matrix]>, api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 65536, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 1024], subgroup_size = 64, min_subgroup_size = 32, max_subgroup_size = 64, cooperative_matrix_properties_khr = [#spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f16, result_type = f16, acc_sat = false, scope = <Subgroup>>]>>}> %1262 = torch.prims.convert_element_type %1261, %int6_1340 : !torch.vtensor<[1,32,4,262144],f16>, !torch.int -> !torch.vtensor<[1,32,4,262144],f32> ^ C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:250:3: note: called from func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} { ^ C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:2987:13: error: failed to translate executables %1262 = torch.prims.convert_element_type %1261, %int6_1340 : !torch.vtensor<[1,32,4,262144],f16>, !torch.int -> !torch.vtensor<[1,32,4,262144],f32> ^ C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:250:3: note: called from func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} { ^ <unknown>:0: error: failed to legalize operation 'arith.constant' C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:3075:13: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit, CooperativeMatrixKHR], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers, SPV_KHR_cooperative_matrix]>, api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 65536, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 1024], subgroup_size = 64, min_subgroup_size = 32, max_subgroup_size = 64, cooperative_matrix_properties_khr = [#spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f16, result_type = f16, acc_sat = false, scope = <Subgroup>>]>>}> %1304 = torch.prims.convert_element_type %1303, %int6_1383 : !torch.vtensor<[1,32,4,262144],f16>, !torch.int -> !torch.vtensor<[1,32,4,262144],f32> ^ C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:250:3: note: called from func.func @main(%arg0: tensor<1x4x64x64xf16>) -> tensor<1x3x512x512xf16> attributes {torch.args_schema = "[1, {\22type\22: \22builtins.tuple\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: \22builtins.list\22, \22context\22: \22null\22, \22children_spec\22: [{\22type\22: null, \22context\22: null, \22children_spec\22: []}]}, {\22type\22: \22builtins.dict\22, \22context\22: \22[]\22, \22children_spec\22: []}]}]", torch.return_schema = "[1, {\22type\22: null, \22context\22: null, \22children_spec\22: []}]"} { ^ C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile:3075:13: error: failed to translate executables %1304 = torch.prims.convert_element_type %1303, %int6_1383 : !torch.vtensor<[1,32,4,262144],f16>, !torch.int -> !torch.vtensor<[1,32,4,262144],f32>

iree-compile.exe dispatch_189.mlir --iree-input-type=torch --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --mlir-pass-pipeline-crash-reproducer=./shark_tmp/core-reproducer.mlir --iree-llvmcpu-target-cpu-features=host --iree-stream-resource-max-allocation-size=3221225472 --iree-vulkan-target-env='#vk.target_env<v1.3, r(120), [VK_KHR_16bit_storage, VK_KHR_8bit_storage, VK_KHR_shader_float16_int8, VK_KHR_spirv_1_4, VK_KHR_storage_buffer_storage_class, VK_KHR_variable_pointers, VK_EXT_subgroup_size_control, VK_KHR_cooperative_matrix], AMD:DiscreteGPU, #vk.caps< maxComputeSharedMemorySize = 65536, maxComputeWorkGroupInvocations = 1024, maxComputeWorkGroupSize = dense<[1024, 1024, 1024]>: vector<3xi32>, subgroupSize = 64, subgroupFeatures = 255: i32, minSubgroupSize = 32, maxSubgroupSize = 64, shaderFloat16 = unit, shaderFloat64 = unit, shaderInt8 = unit, shaderInt16 = unit, shaderInt64 = unit, storageBuffer16BitAccess = unit, storagePushConstant16 = unit, uniformAndStorageBuffer16BitAccess = unit, storageBuffer8BitAccess = unit, storagePushConstant8 = unit, uniformAndStorageBuffer8BitAccess = unit, variablePointers = unit, variablePointersStorageBuffer = unit, cooperativeMatrixPropertiesKHR = [#vk.coop_matrix_props<mSize = 16, nSize = 16, kSize = 16, aType = f16, bType = f16, cType = f16, resultType = f16, accSat = false, scope = #vk.scope<Subgroup>>], shaderIntegerDotProduct = unit >>' --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=true --verify=false --iree-opt-const-expr-hoisting=False --iree-codegen-linalg-max-constant-fold-elements=9223372036854775807 --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-global-opt-detach-elementwise-from-named-ops,iree-preprocessing-convert-conv2d-to-img2col,iree-global-opt-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-pad-linalg-ops{pad-size=16}))' --iree-vulkan-target-triple=rdna3-7900-windows-msvc

hal.executable public @main_dispatch_189 { hal.executable.variant public @vulkan_spirv_fb target(<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit, CooperativeMatrixKHR], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers, SPV_KHR_cooperative_matrix]>, api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 65536, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 1024], subgroup_size = 64, min_subgroup_size = 32, max_subgroup_size = 64, cooperative_matrix_properties_khr = [#spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f16, result_type = f16, acc_sat = false, scope = <Subgroup>>]>>}>) { hal.executable.export public @main_dispatch_189_matmul_256x262144x2304_f16 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer, ReadOnly>, <2, storage_buffer>, <3, storage_buffer>]>]>) { ^bb0(%arg0: !hal.device): %x, %y, %z = flow.dispatch.workgroup_count_from_slice hal.return %x, %y, %z : index, index, index } builtin.module { func.func @main_dispatch_189_matmul_256x262144x2304_f16() { %cst = arith.constant 0.000000e+00 : f16 %c95657600 = arith.constant 95657600 : index %c96837248 = arith.constant 96837248 : index %c403703808 = arith.constant 403703808 : index %c0 = arith.constant 0 : index %c134217728 = arith.constant 134217728 : index %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c95657600) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x2304xf16>> %1 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) alignment(64) offset(%c403703808) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2304x262144xf16>> %2 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c96837248) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256xf16>> %3 = hal.interface.binding.subspan set(0) binding(2) type(storage_buffer) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x262144xf16>> %4 = hal.interface.binding.subspan set(0) binding(3) type(storage_buffer) alignment(64) offset(%c134217728) : !flow.dispatch.tensor<writeonly:tensor<256x262144xf32>> %5 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [256, 2304], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x2304xf16>> -> tensor<256x2304xf16> %6 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2304, 262144], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2304x262144xf16>> -> tensor<2304x262144xf16> %7 = flow.dispatch.tensor.load %2, offsets = [0], sizes = [256], strides = [1] : !flow.dispatch.tensor<readonly:tensor<256xf16>> -> tensor<256xf16> %8 = tensor.empty() : tensor<256x262144xf32> %9 = tensor.empty() : tensor<256x262144xf16> %10 = linalg.fill ins(%cst : f16) outs(%9 : tensor<256x262144xf16>) -> tensor<256x262144xf16> %11 = linalg.matmul ins(%5, %6 : tensor<256x2304xf16>, tensor<2304x262144xf16>) outs(%10 : tensor<256x262144xf16>) -> tensor<256x262144xf16> %12:2 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0)>, affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%11, %7 : tensor<256x262144xf16>, tensor<256xf16>) outs(%9, %8 : tensor<256x262144xf16>, tensor<256x262144xf32>) { ^bb0(%in: f16, %in_0: f16, %out: f16, %out_1: f32): %13 = arith.addf %in, %in_0 : f16 %14 = arith.extf %13 : f16 to f32 linalg.yield %13, %14 : f16, f32 } -> (tensor<256x262144xf16>, tensor<256x262144xf32>) flow.dispatch.tensor.store %12#0, %3, offsets = [0, 0], sizes = [256, 262144], strides = [1, 1] : tensor<256x262144xf16> -> !flow.dispatch.tensor<writeonly:tensor<256x262144xf16>> flow.dispatch.tensor.store %12#1, %4, offsets = [0, 0], sizes = [256, 262144], strides = [1, 1] : tensor<256x262144xf32> -> !flow.dispatch.tensor<writeonly:tensor<256x262144xf32>> return } } } hal.executable.variant public @vulkan_spirv_fb_0 target(<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, PhysicalStorageBufferAddresses, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit, CooperativeMatrixKHR], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_integer_dot_product, SPV_KHR_storage_buffer_storage_class, SPV_KHR_physical_storage_buffer, SPV_KHR_variable_pointers, SPV_KHR_cooperative_matrix]>, api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 65536, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 1024], subgroup_size = 64, min_subgroup_size = 32, max_subgroup_size = 64, cooperative_matrix_properties_khr = [#spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = i8, b_type = i8, c_type = i32, result_type = i32, acc_sat = false, scope = <Subgroup>>, #spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f16, result_type = f16, acc_sat = false, scope = <Subgroup>>, #spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f32, result_type = f32, acc_sat = false, scope = <Subgroup>>]>>}>) { hal.executable.export public @main_dispatch_189_matmul_256x262144x2304_f16 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer, ReadOnly>, <2, storage_buffer>, <3, storage_buffer>]>]>) { ^bb0(%arg0: !hal.device): %x, %y, %z = flow.dispatch.workgroup_count_from_slice hal.return %x, %y, %z : index, index, index } builtin.module { func.func @main_dispatch_189_matmul_256x262144x2304_f16() { %cst = arith.constant 0.000000e+00 : f16 %c95657600 = arith.constant 95657600 : index %c96837248 = arith.constant 96837248 : index %c403703808 = arith.constant 403703808 : index %c0 = arith.constant 0 : index %c134217728 = arith.constant 134217728 : index %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c95657600) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x2304xf16>> %1 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) alignment(64) offset(%c403703808) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2304x262144xf16>> %2 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c96837248) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256xf16>> %3 = hal.interface.binding.subspan set(0) binding(2) type(storage_buffer) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x262144xf16>> %4 = hal.interface.binding.subspan set(0) binding(3) type(storage_buffer) alignment(64) offset(%c134217728) : !flow.dispatch.tensor<writeonly:tensor<256x262144xf32>> %5 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [256, 2304], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x2304xf16>> -> tensor<256x2304xf16> %6 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2304, 262144], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2304x262144xf16>> -> tensor<2304x262144xf16> %7 = flow.dispatch.tensor.load %2, offsets = [0], sizes = [256], strides = [1] : !flow.dispatch.tensor<readonly:tensor<256xf16>> -> tensor<256xf16> %8 = tensor.empty() : tensor<256x262144xf32> %9 = tensor.empty() : tensor<256x262144xf16> %10 = linalg.fill ins(%cst : f16) outs(%9 : tensor<256x262144xf16>) -> tensor<256x262144xf16> %11 = linalg.matmul ins(%5, %6 : tensor<256x2304xf16>, tensor<2304x262144xf16>) outs(%10 : tensor<256x262144xf16>) -> tensor<256x262144xf16> %12:2 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0)>, affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%11, %7 : tensor<256x262144xf16>, tensor<256xf16>) outs(%9, %8 : tensor<256x262144xf16>, tensor<256x262144xf32>) { ^bb0(%in: f16, %in_0: f16, %out: f16, %out_1: f32): %13 = arith.addf %in, %in_0 : f16 %14 = arith.extf %13 : f16 to f32 linalg.yield %13, %14 : f16, f32 } -> (tensor<256x262144xf16>, tensor<256x262144xf32>) flow.dispatch.tensor.store %12#0, %3, offsets = [0, 0], sizes = [256, 262144], strides = [1, 1] : tensor<256x262144xf16> -> !flow.dispatch.tensor<writeonly:tensor<256x262144xf16>> flow.dispatch.tensor.store %12#1, %4, offsets = [0, 0], sizes = [256, 262144], strides = [1, 1] : tensor<256x262144xf32> -> !flow.dispatch.tensor<writeonly:tensor<256x262144xf32>> return } } } }

iree-compile.exe C:\V\SHARK\apps\shark_studio\web\shark_tmp\vae_encode.torch.tempfile --iree-vulkan-target-triple=rdna3-7900-windows --iree-input-type=torch --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --iree-util-zero-fill-elided-attrs --iree-opt-const-expr-hoisting=False --iree-codegen-linalg-max-constant-fold-elements=9223372036854775807 --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-global-opt-detach-elementwise-from-named-ops,iree-preprocessing-convert-conv2d-to-img2col,iree-global-opt-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-pad-linalg-ops{pad-size=32}))' --compile-to=hal --iree-hal-dump-executable-configurations-to=./vae_dispatches --mlir-print-debuginfo=false --mlir-print-op-on-diagnostic=false 2> out_e2e.txt

iree-compile.exe C:\V\SHARK\vae_dispatches\configured_compiled_vae_main_dispatch_196.mlir --iree-input-type=torch --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --iree-opt-const-expr-hoisting=False --iree-codegen-linalg-max-constant-fold-elements=9223372036854775807 --iree-vulkan-target-triple=rdna3-7900-windows-msvc --compile-from=executable-configurations 2> out.txt

iree-org / iree