iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

[data-tiling] UNet compilation fails with data tiling enabled #15751

Open monorimet opened 12 months ago

monorimet commented 12 months ago

What happened?

Stable Diffusion UNet fails to compile as of https://github.com/nod-ai/SRT/releases/tag/candidate-20231130.600

Diagnostics:

<eval_with_key>.12 from /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.11/site-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped:197:10: error: 'iree_linalg_ext.set_encoding' op unhandled tensor operation

<eval_with_key>.12 from /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.11/site-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped:197:10: error: 'func.func' op failed to create tensor equivalance classes

<eval_with_key>.12 from /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.11/site-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped:197:10: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit, CooperativeMatrixKHR], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers, SPV_KHR_cooperative_matrix]>, api=Vulkan, NVIDIA:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 49152, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 1024], min_subgroup_size = 32, max_subgroup_size = 32, cooperative_matrix_properties_khr = [#spirv.coop_matrix_props_khr<m_size = 8, n_size = 8, k_size = 32, a_type = i8, b_type = i8, c_type = i32, result_type = i32, acc_sat = false, scope = <Subgroup>>, #spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f16, result_type = f16, acc_sat = false, scope = <Subgroup>>, #spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f32, result_type = f32, acc_sat = false, scope = <Subgroup>>]>>}>

<eval_with_key>.12 from /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.11/site-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped:197:10: error: 'iree_linalg_ext.set_encoding' op unhandled tensor operation

<eval_with_key>.12 from /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.11/site-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped:197:10: error: 'func.func' op failed to create tensor equivalance classes

<eval_with_key>.12 from /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.11/site-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped:197:10: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit, CooperativeMatrixKHR], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_integer_dot_product, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers, SPV_KHR_cooperative_matrix]>, api=Vulkan, NVIDIA:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 49152, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 64], min_subgroup_size = 32, max_subgroup_size = 32, cooperative_matrix_properties_khr = [#spirv.coop_matrix_props_khr<m_size = 8, n_size = 8, k_size = 32, a_type = i8, b_type = i8, c_type = i32, result_type = i32, acc_sat = false, scope = <Subgroup>>, #spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f16, result_type = f16, acc_sat = false, scope = <Subgroup>>, #spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f32, result_type = f32, acc_sat = false, scope = <Subgroup>>]>>}>

<eval_with_key>.12 from /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.11/site-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped:197:10: error: failed to serialize executables

<eval_with_key>.12 from /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.11/site-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped:197:10: error: 'iree_linalg_ext.set_encoding' op unhandled tensor operation

<eval_with_key>.12 from /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.11/site-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped:197:10: error: 'func.func' op failed to create tensor equivalance classes

<eval_with_key>.12 from /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.11/site-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped:197:10: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit, CooperativeMatrixKHR], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers, SPV_KHR_cooperative_matrix]>, api=Vulkan, NVIDIA:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 49152, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 1024], min_subgroup_size = 32, max_subgroup_size = 32, cooperative_matrix_properties_khr = [#spirv.coop_matrix_props_khr<m_size = 8, n_size = 8, k_size = 32, a_type = i8, b_type = i8, c_type = i32, result_type = i32, acc_sat = false, scope = <Subgroup>>, #spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f16, result_type = f16, acc_sat = false, scope = <Subgroup>>, #spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f32, result_type = f32, acc_sat = false, scope = <Subgroup>>]>>}>

<eval_with_key>.12 from /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.11/site-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped:197:10: error: 'iree_linalg_ext.set_encoding' op unhandled tensor operation

<eval_with_key>.12 from /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.11/site-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped:197:10: error: 'func.func' op failed to create tensor equivalance classes

<eval_with_key>.12 from /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.11/site-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped:197:10: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit, CooperativeMatrixKHR], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_integer_dot_product, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers, SPV_KHR_cooperative_matrix]>, api=Vulkan, NVIDIA:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 49152, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 64], min_subgroup_size = 32, max_subgroup_size = 32, cooperative_matrix_properties_khr = [#spirv.coop_matrix_props_khr<m_size = 8, n_size = 8, k_size = 32, a_type = i8, b_type = i8, c_type = i32, result_type = i32, acc_sat = false, scope = <Subgroup>>, #spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f16, result_type = f16, acc_sat = false, scope = <Subgroup>>, #spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f32, result_type = f32, acc_sat = false, scope = <Subgroup>>]>>}>

<eval_with_key>.12 from /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.11/site-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped:197:10: error: failed to serialize executables

<eval_with_key>.12 from /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.11/site-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped:197:10: error: 'builtin.module' op unhandled compilation of entry point functions with different translation info

<eval_with_key>.12 from /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.11/site-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped:197:10: error: failed to run configuration of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit, CooperativeMatrixKHR], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers, SPV_KHR_cooperative_matrix]>, api=Vulkan, NVIDIA:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 49152, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 1024], min_subgroup_size = 32, max_subgroup_size = 32, cooperative_matrix_properties_khr = [#spirv.coop_matrix_props_khr<m_size = 8, n_size = 8, k_size = 32, a_type = i8, b_type = i8, c_type = i32, result_type = i32, acc_sat = false, scope = <Subgroup>>, #spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f16, result_type = f16, acc_sat = false, scope = <Subgroup>>, #spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f32, result_type = f32, acc_sat = false, scope = <Subgroup>>]>>}>

<eval_with_key>.12 from /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.11/site-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped:197:10: error: failed to configure executables
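
The diagnostics above repeat a small number of distinct messages many times. As a rough sketch (not part of IREE or SHARK; the function name `summarize_errors` and the truncation length are assumptions made for illustration), a log like this can be collapsed into a count of unique error messages. Long messages are truncated, so the translation failures that differ only in their target attributes are counted together.

```python
import re
from collections import Counter

# Matches the trailing "error: <message>" part of a diagnostic line.
ERROR_RE = re.compile(r"error: (.*)$")


def summarize_errors(log_text, max_len=100):
    """Count distinct error messages, truncating long ones for readability."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match is None:
            continue  # not a diagnostic line
        message = match.group(1).strip()
        if len(message) > max_len:
            message = message[:max_len] + "..."
        counts[message] += 1
    return counts


if __name__ == "__main__":
    # Sample lines with the source location abbreviated for this example.
    sample = "\n".join([
        "wrapped:197:10: error: 'iree_linalg_ext.set_encoding' op unhandled tensor operation",
        "wrapped:197:10: error: failed to serialize executables",
        "wrapped:197:10: error: 'iree_linalg_ext.set_encoding' op unhandled tensor operation",
    ])
    for message, count in summarize_errors(sample).most_common():
        print(f"{count}x {message}")
```

Running this over the full log linked below should make it easier to see which error is the root cause and which ones are follow-on failures.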

Invoked with:

 iree-compile /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-compile ./shark_tmp/unet_1_64_512_512_fp16_stable-diffusion-v1-4_vulkan_torch_linalg.mlir --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvmcpu-target-cpu-features=host --iree-vulkan-target-env=#vk.target_env<v1.3, r(120), [VK_KHR_16bit_storage, VK_KHR_8bit_storage, VK_KHR_shader_float16_int8, VK_KHR_spirv_1_4, VK_KHR_storage_buffer_storage_class, VK_KHR_variable_pointers, VK_EXT_subgroup_size_control, VK_KHR_cooperative_matrix], NVIDIA:DiscreteGPU, #vk.caps< maxComputeSharedMemorySize = 49152, maxComputeWorkGroupInvocations = 1024, maxComputeWorkGroupSize = dense<[1024, 1024, 1024]>: vector<3xi32>, subgroupSize = 32, subgroupFeatures = 255: i32, minSubgroupSize = 32, maxSubgroupSize = 32, shaderFloat16 = unit, shaderFloat64 = unit, shaderInt8 = unit, shaderInt16 = unit, shaderInt64 = unit, storageBuffer16BitAccess = unit, storagePushConstant16 = unit, uniformAndStorageBuffer16BitAccess = unit, storageBuffer8BitAccess = unit, storagePushConstant8 = unit, uniformAndStorageBuffer8BitAccess = unit, variablePointers = unit, variablePointersStorageBuffer = unit, cooperativeMatrixPropertiesKHR = [#vk.coop_matrix_props<mSize = 8, nSize = 8, kSize = 32, aType = i8, bType = i8, cType = i32, resultType = i32, accSat = false, scope = #vk.scope<Subgroup>>, #vk.coop_matrix_props<mSize = 16, nSize = 16, kSize = 16, aType = f16, bType = f16, cType = f16, resultType = f16, accSat = false, scope = #vk.scope<Subgroup>>, #vk.coop_matrix_props<mSize = 16, nSize = 16, kSize = 16, aType = f16, bType = f16, cType = f32, resultType = f32, accSat = false, scope = #vk.scope<Subgroup>>], shaderIntegerDotProduct = unit >> --iree-stream-resource-max-allocation-size=4294967295 
--iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=true --verify=false -iree-vulkan-target-triple=ampere-a100-linux --iree-preprocessing-pass-pipeline=builtin.module(func.func(iree-global-opt-detach-elementwise-from-named-ops,iree-global-opt-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-convert-conv2d-to-img2col,iree-preprocessing-pad-linalg-ops{pad-size=32})) --iree-flow-collapse-reduction-dims

Full log available at: https://github.com/nod-ai/SHARK/actions/runs/7051976572/job/19196033642#step:11:1010

Compiles successfully with --iree-opt-data-tiling=False
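
For anyone scripting the workaround, here is a minimal sketch of forcing that flag onto an existing `iree-compile` argument list. The helper name `with_data_tiling_disabled` is hypothetical and this is not how SHARK implements its default; only the flag `--iree-opt-data-tiling=False` comes from this report.

```python
def with_data_tiling_disabled(args):
    """Return a copy of an iree-compile argument list with data tiling off."""
    flag = "--iree-opt-data-tiling"
    # Drop any existing setting of the flag so the result is unambiguous.
    kept = [a for a in args if a != flag and not a.startswith(flag + "=")]
    return kept + [flag + "=False"]


if __name__ == "__main__":
    # Illustrative arguments only; substitute the real command line.
    args = ["iree-compile", "core-input.mlir", "--iree-hal-target-backends=vulkan"]
    print(" ".join(with_data_tiling_disabled(args)))
```

The input list is left unmodified, so the same base arguments can be reused to compare builds with and without data tiling.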

Reproducers are located at: https://console.cloud.google.com/storage/browser/shark-public/ean/unet-data-tiling

I can narrow this down further when I have some more cycles to spare, but for now I have data tiling disabled by default for SD sub-model compilation in SHARK. Let me know if there is any specific follow-up I can provide; otherwise I'll minimize the reproducers when I get a chance.

Steps to reproduce your issue

  1. Download reproducers
  2. Run contents of core-command-line.txt on latest IREE version or https://github.com/nod-ai/SRT/releases/tag/candidate-20231130.600

What component(s) does this issue relate to?

Compiler

Version information

No response

Additional context

No response

hanhanW commented 12 months ago

Thanks for filing the issue, I will take a look at it!

hanhanW commented 12 months ago

Hey @monorimet, I checked core-command-line.txt and found that it uses host features. Can you provide a way to reproduce it without --iree-llvmcpu-target-cpu-features=host?

I can compile the core-input.mlir with the following command:

build/tools/iree-compile \
  --output-format=vm-bytecode \
  --iree-hal-target-backends=llvm-cpu \
  --iree-llvmcpu-target-cpu=cascadelake \
  --iree-llvmcpu-target-triple=x86_64-unknown-linux-gnu \
  ~/core-input.mlir -o /tmp/a.vmfb

hanhanW commented 12 months ago

Copying the error message out of the log so people don't have to click into it. The error is 'iree_linalg_ext.set_encoding' op unhandled tensor operation.