Open suryajasper opened 1 year ago
Running into a similar problem here.
Running SHARK on a Radeon 7900XTX (RDNA3) with AMDVLK driver 2023.Q2.2 (LLPC), on NixOS unstable (FHS through the steam-run wrapper).
Dump of the compilation command (Invoked IREE Tools):
iree-compile - \
--iree-input-type=tm_tensor \
--iree-vm-bytecode-module-output-format=flatbuffer-binary \
--iree-hal-target-backends=vulkan \
--mlir-print-debuginfo \
--mlir-print-op-on-diagnostic=false \
--iree-llvmcpu-target-cpu-features=host \
'--iree-vulkan-target-env=#vk.target_env<v1.3, r(120), [VK_KHR_16bit_storage, VK_KHR_8bit_storage, VK_KHR_shader_float16_int8, VK_KHR_spirv_1_4, VK_KHR_storage_buffer_storage_class, VK_KHR_variable_pointers, VK_EXT_subgroup_size_control, VK_NV_cooperative_matrix], AMD:DiscreteGPU, #vk.caps< maxComputeSharedMemorySize = 65536, maxComputeWorkGroupInvocations = 1024, maxComputeWorkGroupSize = dense<[1024, 1024, 1024]>: vector<3xi32>, subgroupSize = 64, subgroupFeatures = 255: i32, minSubgroupSize = 32, maxSubgroupSize = 64, shaderFloat16 = unit, shaderFloat64 = unit, shaderInt8 = unit, shaderInt16 = unit, shaderInt64 = unit, storageBuffer16BitAccess = unit, storagePushConstant16 = unit, uniformAndStorageBuffer16BitAccess = unit, storageBuffer8BitAccess = unit, storagePushConstant8 = unit, uniformAndStorageBuffer8BitAccess = unit, variablePointers = unit, variablePointersStorageBuffer = unit, cooperativeMatrixPropertiesNV = [#vk.coop_matrix_props<mSize = 16, nSize = 16, kSize = 16, aType = f16, bType = f16, cType = f16, resultType = f16, scope = #vk.scope<Subgroup>>], shaderIntegerDotProduct = unit >>' \
--iree-stream-resource-index-bits=64 \
--iree-vm-target-index-bits=64 \
--iree-vm-bytecode-module-strip-source-map=true \
--iree-util-zero-fill-elided-attrs \
-iree-vulkan-target-triple=rdna3-7900-linux \
'--iree-preprocessing-pass-pipeline=builtin.module(func.func(iree-flow-detach-elementwise-from-named-ops,iree-flow-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-convert-conv2d-to-img2col,iree-preprocessing-pad-linalg-ops{pad-size=32}))'
Compilation of the unet vmfb succeeds, but Python then silently exits with error code 1 at this step without reporting any error:
Removing the mysterious cooperativeMatrixPropertiesNV from iree-vulkan-target-env makes the SD pipeline usable:
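For anyone who wants to apply the same edit programmatically, here is a minimal sketch of stripping that entry from the target-env string before it is passed to iree-compile. The helper name `strip_coop_matrix` is hypothetical and not part of SHARK or IREE, the sample string is an abbreviated version of the one in the dump above, and the regex assumes the entry is followed by at least one more capability (as it is there).

```python
import re

# Hypothetical helper, not part of SHARK or IREE.
# Matches "cooperativeMatrixPropertiesNV = [ ... ]" plus the comma that follows it.
# Assumes the bracketed value contains no nested "]" and is not the last entry.
_COOP_MATRIX_RE = re.compile(
    r"cooperativeMatrixPropertiesNV\s*=\s*\[[^\]]*\]\s*,?\s*"
)


def strip_coop_matrix(target_env: str) -> str:
    """Return target_env without the cooperativeMatrixPropertiesNV entry."""
    return _COOP_MATRIX_RE.sub("", target_env)


# Abbreviated target env, shortened from the dump above for illustration only.
env = (
    "#vk.target_env<v1.3, r(120), [VK_KHR_spirv_1_4], AMD:DiscreteGPU, #vk.caps< "
    "subgroupSize = 64, variablePointersStorageBuffer = unit, "
    "cooperativeMatrixPropertiesNV = [#vk.coop_matrix_props<mSize = 16, nSize = 16, "
    "kSize = 16, aType = f16, bType = f16, cType = f16, resultType = f16, "
    "scope = #vk.scope<Subgroup>>], shaderIntegerDotProduct = unit >>"
)

stripped = strip_coop_matrix(env)
print(stripped)
```

This only removes the capability entry, which is what was done here; it leaves VK_NV_cooperative_matrix in the extension list untouched.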
You can use the rdna2 target triple to disable the wmma pipeline. The pro driver has it but the amdvlk driver doesn't.
When compiling unet mlir to vmfb in the stable diffusion pipeline configured with fp16 using vulkan, the SHARK-generated iree-compile command fails due to improper vulkan environment setup (use of --iree-vulkan-target-env).
Produced by SD pipeline:
Compiling using vulkan with FP16 results in either arith.const errors or spirv.op / memref.load errors (as shown below) in all dispatches regardless of how the spirv index bits are configured.
Forgoing the SHARK-configured iree-vulkan-target-env altogether and simply specifying the vulkan target triple solves these issues and allows the vmfbs to be compiled successfully. This is a functional workaround, but the SD pipeline is consistently failing to compile unet for FP16 vulkan on its own because of the vk environment setup.
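A rough sketch of that workaround, building an iree-compile argument list that carries only the target triple and no --iree-vulkan-target-env. The function name `build_compile_args` and the file names `unet.mlir` / `unet.vmfb` are placeholders, not what SHARK actually emits, and the flag set is trimmed to the backend, input type, and triple from the command dumped earlier in this thread; whether the remaining flags are also needed is not verified here.

```python
from typing import List


def build_compile_args(
    input_path: str,
    output_path: str,
    target_triple: str = "rdna3-7900-linux",  # triple taken from the dump above
) -> List[str]:
    """Hypothetical helper: iree-compile arguments without a hand-built target env."""
    return [
        "iree-compile",
        input_path,
        "--iree-input-type=tm_tensor",
        "--iree-hal-target-backends=vulkan",
        # Let IREE derive the Vulkan capabilities from the triple alone.
        f"--iree-vulkan-target-triple={target_triple}",
        "-o",
        output_path,
    ]


# Placeholder file names, for illustration only.
args = build_compile_args("unet.mlir", "unet.vmfb")
print(" ".join(args))
```

The list can be handed to `subprocess.run(args, check=True)` on a machine where iree-compile is on the PATH.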