nod-ai / SHARK

SHARK - High Performance Machine Learning Distribution
Apache License 2.0

unet vmfb compilation issues with FP16 vulkan in SHARK SD pipeline #1564

suryajasper opened this issue 1 year ago

suryajasper commented 1 year ago

When compiling the unet MLIR to a vmfb in the stable diffusion pipeline configured with fp16 on vulkan, the SHARK-generated iree-compile command fails due to an improper vulkan target environment setup (the use of --iree-vulkan-target-env).

Command produced by the SD pipeline:

iree-compile - \
      --iree-input-type=tm_tensor \
      --iree-vm-bytecode-module-output-format=flatbuffer-binary \
      --iree-hal-target-backends=vulkan \
      --iree-llvmcpu-embedded-linker-path=/home/nod/Documents/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-lld \
      --mlir-print-debuginfo \
      --mlir-print-op-on-diagnostic=false \
      --iree-llvmcpu-target-cpu-features=host \
      --iree-vulkan-target-env="#vk.target_env<v1.3, r(120), [VK_KHR_16bit_storage, VK_KHR_8bit_storage, VK_KHR_shader_float16_int8, VK_KHR_spirv_1_4, VK_KHR_storage_buffer_storage_class, VK_KHR_variable_pointers, VK_EXT_subgroup_size_control, VK_NV_cooperative_matrix], NVIDIA:DiscreteGPU, #vk.caps< maxComputeSharedMemorySize = 49152, maxComputeWorkGroupInvocations = 1536, maxComputeWorkGroupSize = dense<[1536, 1024, 64]>: vector<3xi32>, subgroupSize = 32, subgroupFeatures = 255: i32, minSubgroupSize = 32, maxSubgroupSize = 32, shaderFloat64 = unit, shaderInt8 = unit, shaderInt16 = unit, shaderInt64 = unit, storageBuffer16BitAccess = unit, storagePushConstant16 = unit, uniformAndStorageBuffer16BitAccess = unit, storageBuffer8BitAccess = unit, storagePushConstant8 = unit, uniformAndStorageBuffer8BitAccess = unit, variablePointers = unit, variablePointersStorageBuffer = unit, shaderIntegerDotProduct = unit >>" \
      --iree-stream-resource-index-bits=64 \
      --iree-stream-resource-index-bits=64 \
      --iree-spirv-index-bits=64 \
      --iree-vm-target-index-bits=64 \
      --iree-vm-bytecode-module-strip-source-map=true \
      --iree-util-zero-fill-elided-attrs \

Compiling with vulkan and FP16 results in either arith.const errors or spirv.op / memref.load errors (as shown below) in all dispatches, regardless of how the spirv index bits are configured.

<eval_with_key>.3:28:12: error: 'spirv.Store' op mismatch in result type and pointer type
<eval_with_key>.3:28:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers, SPV_NV_cooperative_matrix]>, api=Vulkan, NVIDIA:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 49152, max_compute_workgroup_invocations = 1536, max_compute_workgroup_size = [1536, 1024, 64], min_subgroup_size = 32, max_subgroup_size = 32, cooperative_matrix_properties_nv = []>>}>
<eval_with_key>.3:28:12: error: failed to serialize executables

Forgoing the SHARK-configured --iree-vulkan-target-env altogether and simply specifying the vulkan target triple instead resolves these errors and allows the vmfbs to be compiled successfully. This is a functional workaround, but the SD pipeline still consistently fails to compile unet for FP16 vulkan on its own because of how it sets up the vulkan target environment.
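
For reference, a minimal sketch of that workaround when invoking iree-compile by hand (not the exact command SHARK emits): keep the flags from the dump above but replace --iree-vulkan-target-env with a target triple matching your GPU. The triple value and file names below are placeholders.

    # Sketch only: <gpu-triple> is a placeholder, e.g. the rdna3-7900-linux value
    # that appears later in this thread for an RDNA3 card.
    iree-compile unet_fp16.mlir \
        --iree-input-type=tm_tensor \
        --iree-vm-bytecode-module-output-format=flatbuffer-binary \
        --iree-hal-target-backends=vulkan \
        --iree-vulkan-target-triple=<gpu-triple> \
        --iree-stream-resource-index-bits=64 \
        --iree-spirv-index-bits=64 \
        --iree-vm-target-index-bits=64 \
        -o unet_fp16.vmfb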

gjz010 commented 1 year ago

Running into a similar problem here. Running SHARK on a Radeon 7900 XTX (RDNA3) with the AMDVLK driver 2023.Q2.2 (LLPC), on NixOS unstable (FHS environment via the steam-run wrapper).

Dump of the compilation command (invoking IREE tools):

iree-compile - \
    --iree-input-type=tm_tensor \
    --iree-vm-bytecode-module-output-format=flatbuffer-binary \
    --iree-hal-target-backends=vulkan \
    --mlir-print-debuginfo \
    --mlir-print-op-on-diagnostic=false \
    --iree-llvmcpu-target-cpu-features=host \
    '--iree-vulkan-target-env=#vk.target_env<v1.3, r(120), [VK_KHR_16bit_storage, VK_KHR_8bit_storage, VK_KHR_shader_float16_int8, VK_KHR_spirv_1_4, VK_KHR_storage_buffer_storage_class, VK_KHR_variable_pointers, VK_EXT_subgroup_size_control, VK_NV_cooperative_matrix], AMD:DiscreteGPU, #vk.caps< maxComputeSharedMemorySize = 65536, maxComputeWorkGroupInvocations = 1024, maxComputeWorkGroupSize = dense<[1024, 1024, 1024]>: vector<3xi32>, subgroupSize = 64, subgroupFeatures = 255: i32, minSubgroupSize = 32, maxSubgroupSize = 64, shaderFloat16 = unit, shaderFloat64 = unit, shaderInt8 = unit, shaderInt16 = unit, shaderInt64 = unit, storageBuffer16BitAccess = unit, storagePushConstant16 = unit, uniformAndStorageBuffer16BitAccess = unit, storageBuffer8BitAccess = unit, storagePushConstant8 = unit, uniformAndStorageBuffer8BitAccess = unit, variablePointers = unit, variablePointersStorageBuffer = unit, cooperativeMatrixPropertiesNV = [#vk.coop_matrix_props<mSize = 16, nSize = 16, kSize = 16, aType = f16, bType = f16, cType = f16, resultType = f16, scope = #vk.scope<Subgroup>>], shaderIntegerDotProduct = unit >>' \
    --iree-stream-resource-index-bits=64 \
    --iree-vm-target-index-bits=64 \
    --iree-vm-bytecode-module-strip-source-map=true \
    --iree-util-zero-fill-elided-attrs \
    -iree-vulkan-target-triple=rdna3-7900-linux \
    '--iree-preprocessing-pass-pipeline=builtin.module(func.func(iree-flow-detach-elementwise-from-named-ops,iree-flow-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-convert-conv2d-to-img2col,iree-preprocessing-pad-linalg-ops{pad-size=32}))'

Compilation of the unet vmfb succeeds, but Python silently exits with error code 1 at this step without reporting any error:

https://github.com/nod-ai/SHARK/blob/97f7e79391c27474b3c2096d4c59ab6e664b27a5/shark/iree_utils/compile_utils.py#L324
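
One way to surface whatever the Python step is swallowing is to re-run the dumped command by hand and redirect stderr to a file (a sketch, assuming the generated unet MLIR has been saved locally; unet_fp16.mlir is a placeholder name):

    # Sketch: feed the saved MLIR to the dumped command over stdin and capture stderr.
    # [flags] stands for the full flag list from the dump above.
    iree-compile - [flags] < unet_fp16.mlir -o unet_fp16.vmfb 2> unet_compile_stderr.log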

Removing the mysterious cooperativeMatrixPropertiesNV from the iree-vulkan-target-env makes the SD pipeline usable:

https://github.com/nod-ai/SHARK/blob/97f7e79391c27474b3c2096d4c59ab6e664b27a5/shark/iree_utils/vulkan_target_env_utils.py#L118-L119

powderluv commented 1 year ago

You can use the rdna2 target triple to disable the WMMA pipeline. The Pro driver has it but the AMDVLK driver doesn't.
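
A sketch of what that could look like against the command dumped above (the rdna2-unknown-linux triple string is an assumption, not taken from this thread; the generated --iree-vulkan-target-env would also need its cooperative-matrix entry removed, or be dropped in favor of the triple alone):

    # Sketch only: swap the RDNA3 triple for an RDNA2 one so the cooperative-matrix
    # (WMMA) path is never selected. [flags] stands for the remaining flags from the
    # dump above, minus --iree-vulkan-target-env.
    iree-compile - [flags] \
        --iree-vulkan-target-triple=rdna2-unknown-linux \
        < unet_fp16.mlir -o unet_fp16.vmfb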