iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

16-bit VAE model fails with Bitcast width mismatch on generic Vulkan #11592

Closed: stellaraccident closed this issue 2 weeks ago

stellaraccident commented 1 year ago

When compiling without an explicit Vulkan target triple, the VAE model used by shark.sd fails to compile with:

<unknown>:0: error: 'spirv.Bitcast' op mismatch in result type bitwidth 128 and operand type bitwidth 64
core-input.mlir:4461:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.3, [Shader, GroupNonUniform], [SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, #spirv.resource_limits<max_compute_workgroup_size = [128, 128, 64], subgroup_size = 64, cooperative_matrix_properties_nv = []>>}>
    %820 = linalg.conv_2d_nchw_fchw {dilations = dense<1> : vector<2xi64>, strides = dense<1> : vector<2xi64>} ins(%padded_368, %cst_142 : tensor<1x128x514x514xf16>, tensor<3x128x3x3xf16>) outs(%819 : tensor<1x3x512x512xf16>) -> tensor<1x3x512x512xf16>

Repro:

D:\sd\SHARK\shark.venv\lib\site-packages\iree\compiler\tools\..\_mlir_libs\iree-compile.exe --iree-input-type=none --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --iree-llvm-embedded-linker-path=D:\sd\SHARK\shark.venv\lib\site-packages\iree\compiler\tools\..\_mlir_libs\iree-lld.exe --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvm-target-cpu-features=host --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs --iree-flow-enable-padding-linalg-ops --iree-flow-linalg-ops-padding-size=32 --iree-flow-enable-conv-img2col-transform 20221218_vae_spirv_bug.mlir
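For reference, a self-contained f16 convolution in the same shape family as the failing op looks roughly like this (a hypothetical minimal repro sketch written by hand, not extracted from the actual model):

    func.func @f16_conv(%input: tensor<1x128x514x514xf16>, %filter: tensor<3x128x3x3xf16>) -> tensor<1x3x512x512xf16> {
      %cst = arith.constant 0.0 : f16
      %empty = tensor.empty() : tensor<1x3x512x512xf16>
      %fill = linalg.fill ins(%cst : f16) outs(%empty : tensor<1x3x512x512xf16>) -> tensor<1x3x512x512xf16>
      // Same op and element type as the failing line in core-input.mlir.
      %conv = linalg.conv_2d_nchw_fchw {dilations = dense<1> : vector<2xi64>, strides = dense<1> : vector<2xi64>}
          ins(%input, %filter : tensor<1x128x514x514xf16>, tensor<3x128x3x3xf16>)
          outs(%fill : tensor<1x3x512x512xf16>) -> tensor<1x3x512x512xf16>
      return %conv : tensor<1x3x512x512xf16>
    }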
stellaraccident commented 1 year ago

Note that with a triple of turing-unknown-windows (or similar), it compiles successfully. I suspect an incorrect f16 assumption somewhere.
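Concretely, the workaround is to add a Vulkan target triple to the repro command, along the lines of (abbreviated; flag spelling as of the IREE build used in the repro above):

    iree-compile.exe ... --iree-hal-target-backends=vulkan --iree-vulkan-target-triple=turing-unknown-windows 20221218_vae_spirv_bug.mlir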

antiagainst commented 1 year ago

If not given an explicit target environment or triple, we use the unknown-unknown-unknown one, which has the absolute minimal set of widely applicable features to make sure the generated code is runnable everywhere. That means being very conservative. Float16, unfortunately, is not something we can comfortably rely on under such circumstances:

[screenshot, 2022-12-20: Float16 feature availability statistics]

Without it, CodeGen tries to emulate float16 with float32, which causes issues when bitcasting from vector<2xf32> to vector<4xf16>: emulation widens the f16 side to vector<4xf32>, producing the 64-bit vs. 128-bit mismatch reported above. I created https://reviews.llvm.org/D140437 to make such cases fail with a better error.
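To illustrate the mismatch (hypothetical IR for illustration, not actual compiler output):

    %0 = spirv.Bitcast %v : vector<2xf32> to vector<4xf16>   // legal: 64 bits to 64 bits
    // after f16 -> f32 emulation widens the result element type:
    %0 = spirv.Bitcast %v : vector<2xf32> to vector<4xf32>   // invalid: 64-bit operand, 128-bit result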

antiagainst commented 1 year ago

We can, though, use the 16-bit storage capabilities, which allow storing float16 values (just not computing with them):

[screenshot, 2022-12-20: 16-bit storage extension availability statistics]

Created #11625 to enable the above.
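The pattern this enables looks roughly like the following (an illustrative sketch at the arith/memref level, not actual IREE output):

    func.func @f16_storage_f32_compute(%buf: memref<4xf16>, %i: index) {
      %h = memref.load %buf[%i] : memref<4xf16>   // f16 load: needs only 16-bit storage support
      %f = arith.extf %h : f16 to f32             // widen before computing
      %sq = arith.mulf %f, %f : f32               // all arithmetic stays in f32
      %t = arith.truncf %sq : f32 to f16          // narrow back for storage
      memref.store %t, %buf[%i] : memref<4xf16>
      return
    }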

antiagainst commented 2 weeks ago

Closing as obsolete -- no actions planned now.