nod-ai / SHARK

SHARK - High Performance Machine Learning Distribution
Apache License 2.0

unet vmfb compilation issues with FP16 vulkan in SHARK SD pipeline #1564

suryajasper opened this issue 1 year ago

suryajasper commented 1 year ago

When compiling the unet MLIR to a vmfb in the stable diffusion pipeline configured with fp16 on vulkan, the SHARK-generated iree-compile command fails due to an improper vulkan target environment setup (the use of --iree-vulkan-target-env).

Command produced by the SD pipeline:

iree-compile - \
      --iree-input-type=tm_tensor \
      --iree-vm-bytecode-module-output-format=flatbuffer-binary \
      --iree-hal-target-backends=vulkan \
      --iree-llvmcpu-embedded-linker-path=/home/nod/Documents/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-lld \
      --mlir-print-debuginfo \
      --mlir-print-op-on-diagnostic=false \
      --iree-llvmcpu-target-cpu-features=host \
      --iree-vulkan-target-env="#vk.target_env<v1.3, r(120), [VK_KHR_16bit_storage, VK_KHR_8bit_storage, VK_KHR_shader_float16_int8, VK_KHR_spirv_1_4, VK_KHR_storage_buffer_storage_class, VK_KHR_variable_pointers, VK_EXT_subgroup_size_control, VK_NV_cooperative_matrix], NVIDIA:DiscreteGPU, #vk.caps< maxComputeSharedMemorySize = 49152, maxComputeWorkGroupInvocations = 1536, maxComputeWorkGroupSize = dense<[1536, 1024, 64]>: vector<3xi32>, subgroupSize = 32, subgroupFeatures = 255: i32, minSubgroupSize = 32, maxSubgroupSize = 32, shaderFloat64 = unit, shaderInt8 = unit, shaderInt16 = unit, shaderInt64 = unit, storageBuffer16BitAccess = unit, storagePushConstant16 = unit, uniformAndStorageBuffer16BitAccess = unit, storageBuffer8BitAccess = unit, storagePushConstant8 = unit, uniformAndStorageBuffer8BitAccess = unit, variablePointers = unit, variablePointersStorageBuffer = unit, shaderIntegerDotProduct = unit >>" \
      --iree-stream-resource-index-bits=64 \
      --iree-stream-resource-index-bits=64 \
      --iree-spirv-index-bits=64 \
      --iree-vm-target-index-bits=64 \
      --iree-vm-bytecode-module-strip-source-map=true \
      --iree-util-zero-fill-elided-attrs \

Compiling with vulkan and FP16 results in either arith.const errors or spirv.op / memref.load errors (as shown below) in all dispatches, regardless of how the spirv index bits are configured.

<eval_with_key>.3:28:12: error: 'spirv.Store' op mismatch in result type and pointer type
<eval_with_key>.3:28:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers, SPV_NV_cooperative_matrix]>, api=Vulkan, NVIDIA:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 49152, max_compute_workgroup_invocations = 1536, max_compute_workgroup_size = [1536, 1024, 64], min_subgroup_size = 32, max_subgroup_size = 32, cooperative_matrix_properties_nv = []>>}>
<eval_with_key>.3:28:12: error: failed to serialize executables

Forgoing the SHARK-configured --iree-vulkan-target-env altogether and simply specifying the vulkan target triple instead resolves these errors and allows the vmfbs to be compiled successfully. This is a functional workaround, but the SD pipeline still consistently fails to compile unet for FP16 vulkan on its own because of how it sets up the vulkan target environment.
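
For reference, a minimal sketch of that workaround when invoking iree-compile by hand (not the exact command SHARK emits): keep the flags from the dump above but replace --iree-vulkan-target-env with a target triple matching your GPU. The triple value and file names below are placeholders.

    # Sketch only: <gpu-triple> is a placeholder, e.g. the rdna3-7900-linux value
    # that appears later in this thread for an RDNA3 card.
    iree-compile unet_fp16.mlir \
        --iree-input-type=tm_tensor \
        --iree-vm-bytecode-module-output-format=flatbuffer-binary \
        --iree-hal-target-backends=vulkan \
        --iree-vulkan-target-triple=<gpu-triple> \
        --iree-stream-resource-index-bits=64 \
        --iree-spirv-index-bits=64 \
        --iree-vm-target-index-bits=64 \
        -o unet_fp16.vmfb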

gjz010 commented 1 year ago

Running into a similar problem here. Running SHARK on a Radeon 7900 XTX (RDNA3) with the AMDVLK driver 2023.Q2.2 (LLPC), on NixOS unstable (FHS environment via the steam-run wrapper).

Dump of the compilation command (invoking IREE tools):

iree-compile - \
    --iree-input-type=tm_tensor \
    --iree-vm-bytecode-module-output-format=flatbuffer-binary \
    --iree-hal-target-backends=vulkan \
    --mlir-print-debuginfo \
    --mlir-print-op-on-diagnostic=false \
    --iree-llvmcpu-target-cpu-features=host \
    '--iree-vulkan-target-env=#vk.target_env<v1.3, r(120), [VK_KHR_16bit_storage, VK_KHR_8bit_storage, VK_KHR_shader_float16_int8, VK_KHR_spirv_1_4, VK_KHR_storage_buffer_storage_class, VK_KHR_variable_pointers, VK_EXT_subgroup_size_control, VK_NV_cooperative_matrix], AMD:DiscreteGPU, #vk.caps< maxComputeSharedMemorySize = 65536, maxComputeWorkGroupInvocations = 1024, maxComputeWorkGroupSize = dense<[1024, 1024, 1024]>: vector<3xi32>, subgroupSize = 64, subgroupFeatures = 255: i32, minSubgroupSize = 32, maxSubgroupSize = 64, shaderFloat16 = unit, shaderFloat64 = unit, shaderInt8 = unit, shaderInt16 = unit, shaderInt64 = unit, storageBuffer16BitAccess = unit, storagePushConstant16 = unit, uniformAndStorageBuffer16BitAccess = unit, storageBuffer8BitAccess = unit, storagePushConstant8 = unit, uniformAndStorageBuffer8BitAccess = unit, variablePointers = unit, variablePointersStorageBuffer = unit, cooperativeMatrixPropertiesNV = [#vk.coop_matrix_props<mSize = 16, nSize = 16, kSize = 16, aType = f16, bType = f16, cType = f16, resultType = f16, scope = #vk.scope<Subgroup>>], shaderIntegerDotProduct = unit >>' \
    --iree-stream-resource-index-bits=64 \
    --iree-vm-target-index-bits=64 \
    --iree-vm-bytecode-module-strip-source-map=true \
    --iree-util-zero-fill-elided-attrs \
    -iree-vulkan-target-triple=rdna3-7900-linux \
    '--iree-preprocessing-pass-pipeline=builtin.module(func.func(iree-flow-detach-elementwise-from-named-ops,iree-flow-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-convert-conv2d-to-img2col,iree-preprocessing-pad-linalg-ops{pad-size=32}))'

Compilation of the unet vmfb succeeds, but Python silently exits with error code 1 at this step without reporting any error:

https://github.com/nod-ai/SHARK/blob/97f7e79391c27474b3c2096d4c59ab6e664b27a5/shark/iree_utils/compile_utils.py#L324
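
One way to surface whatever the Python step is swallowing is to re-run the dumped command by hand and redirect stderr to a file (a sketch, assuming the generated unet MLIR has been saved locally; unet_fp16.mlir is a placeholder name):

    # Sketch: feed the saved MLIR to the dumped command over stdin and capture stderr.
    # [flags] stands for the full flag list from the dump above.
    iree-compile - [flags] < unet_fp16.mlir -o unet_fp16.vmfb 2> unet_compile_stderr.log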

Removing the mysterious cooperativeMatrixPropertiesNV from the iree-vulkan-target-env makes the SD pipeline usable:

https://github.com/nod-ai/SHARK/blob/97f7e79391c27474b3c2096d4c59ab6e664b27a5/shark/iree_utils/vulkan_target_env_utils.py#L118-L119

powderluv commented 1 year ago

You can use the rdna2 target triple to disable the WMMA pipeline. The Pro driver has it but the AMDVLK driver doesn't.
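
A sketch of what that could look like against the command dumped above (the rdna2-unknown-linux triple string is an assumption, not taken from this thread; the generated --iree-vulkan-target-env would also need its cooperative-matrix entry removed, or be dropped in favor of the triple alone):

    # Sketch only: swap the RDNA3 triple for an RDNA2 one so the cooperative-matrix
    # (WMMA) path is never selected. [flags] stands for the remaining flags from the
    # dump above, minus --iree-vulkan-target-env.
    iree-compile - [flags] \
        --iree-vulkan-target-triple=rdna2-unknown-linux \
        < unet_fp16.mlir -o unet_fp16.vmfb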