Open njsharpe opened 1 year ago
Thank you. Will get this looked at today.
(I also note that turbine isn't preserving source location properly and that can be fixed)
Thanks for the heads up. I'm looking into this right now and trying to reproduce. If there's a pattern to which UNet/VAE/LoRA combinations I should look at, let me know.
@monorimet The specific combination used in this example was as follows:
Model: Lily Mix v2
VAE: kl-f8-animev2
LoRA: Secret Sauce - 610
Each of these links is a civitai download link directly to the versions used. That being said, many other combinations of Model, VAE and LoRA have caused this issue.
Thanks, I'll find out what's confusing the pipeline.
Something changes in the unet model when we use LoRAs. It seems to work fine without the tunings, but with tunings I reproduce your error.
My hypothesis is that the bad combo is RDNA3 tunings + SD 1.x UNet + LoRA weights, since the custom VAE shouldn't affect UNet compilation. I'll take your word that only certain models/LoRAs do this and try to figure out exactly what's happening.
We can operate under the assumption that the VAE does not affect anything; I use a VAE in every generation, so it isn't the differentiating factor. The tuned vs. untuned theory also holds up: using the same setup after a --clear_all, if I remove the LoRA and re-generate with your prompt, the error does not occur. I have also verified that, on first (untuned) generation, SHARK functions as expected.
EDIT: Removed dumb question. As a temporary workaround, I will use the --no_use_tuned flag to avoid the issue altogether.
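For reference, the workaround invocation looks roughly like the sketch below. The script name and the prompt argument are placeholders, not the exact CLI; the only flags I'm actually relying on from this thread are --clear_all and --no_use_tuned.

```bash
# Sketch of the temporary workaround -- the script name and prompt argument are
# placeholders. The relevant flags are --clear_all (start from a clean cache)
# and --no_use_tuned (skip the tuned configs that trigger the lowering failure).
python txt2img.py \
  --clear_all \
  --no_use_tuned \
  --prompt "..."
```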
Probably the tuned LoRA configuration; I don't see this without the LoRA or without tuning. We probably need some tuning-config changes. I'll narrow this down, disable tunings for the failure modes, and file a separate issue for a gentle rework of the tunings to better handle some of these custom model configs.
It looks like this happens without LoRAs now, too... I've got a minimal reproducer with the following IR:
https://storage.googleapis.com/shark-public/ean/unet_lc/module_forward_dispatch_5.mlir
```mlir
%7 = linalg.matmul_transpose_b {compilation_info = #iree_codegen.compilation_info<lowering_config = <tile_sizes = [[32, 16], [16, 16], [0, 0, 64], [16, 16, 16]]>, translation_info = <SPIRVCooperativeMatrixVectorize pipeline_depth = 1 store_stage = 0>, workgroup_size = [32, 2, 1], subgroup_size = 32>} ins(%3, %4 : tensor<32x320xf16>, tensor<1280x320xf16>) outs(%6 : tensor<32x1280xf16>) -> tensor<32x1280xf16> loc("aten::addmm"("<eval_with_key>.12 from C:\\V\\SHARK\\shark.venv\\Lib\\site-packages\\torch\\fx\\experimental\\proxy_tensor.py:506 in wrapped":28:12))
```
The error (Also on the CI now):
```
<eval_with_key>.12 from C:\actions-runner\w\SHARK\SHARK\shark.venv\Lib\site-packages\torch\fx\experimental\proxy_tensor.py:507 in wrapped:28:12: error: 'func.func' op failed to get lowering configuration
<eval_with_key>.12 from C:\actions-runner\w\SHARK\SHARK\shark.venv\Lib\site-packages\torch\fx\experimental\proxy_tensor.py:507 in wrapped:28:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit, CooperativeMatrixKHR], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_integer_dot_product, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers, SPV_KHR_cooperative_matrix]>, api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 65536, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 1024], subgroup_size = 64, min_subgroup_size = 32, max_subgroup_size = 64, cooperative_matrix_properties_khr = [#spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = i8, b_type = i8, c_type = i32, result_type = i32, acc_sat = false, scope = <Subgroup>>, #spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f16, result_type = f16, acc_sat = false, scope = <Subgroup>>, #spirv.coop_matrix_props_khr<m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f32, result_type = f32, acc_sat = false, scope = <Subgroup>>]>>}>
```
Running the reproducer CLI input gives dispatch numbers 5, 7, 22, [...], which are matmul ops:
https://storage.googleapis.com/shark-public/ean/unet_lc/module_forward_dispatch_7.mlir
https://storage.googleapis.com/shark-public/ean/unet_lc/module_forward_dispatch_22.mlir
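If anyone else wants to poke at these, the standalone dispatch files can be fed to iree-compile directly. A minimal sketch of what that looks like, assuming an RDNA3 Vulkan target; the target triple and exact flag spellings vary across IREE versions, so treat them as placeholders:

```bash
# Sketch: compile one extracted dispatch for the Vulkan/SPIR-V backend.
# The target triple is an assumption for an RDNA3 card; use whatever matches
# your device. If the .mlir is a hal.executable source dump, add
# --compile-mode=hal-executable.
iree-compile module_forward_dispatch_5.mlir \
  --iree-hal-target-backends=vulkan-spirv \
  --iree-vulkan-target-triple=rdna3-unknown-linux \
  -o module_forward_dispatch_5.vmfb
```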
Currently getting something that looks the same as this on anything tuned, though I think with a differing eval key .<number>. (Also getting the arith.maxf complaint on everything.)
A probably irrelevant thing I noticed: I also get this on untuned SD 2.1-base if I set the wrong --max_length, i.e. 77 rather than 64.
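Concretely, that's roughly this invocation (a sketch: the script name and model-id flag are from memory and may be off; the only parameter that actually matters here is --max_length):

```bash
# Sketch of the untuned SD 2.1-base repro -- script name and model-id flag are
# assumptions; the point is passing --max_length 77 where 64 is expected.
python txt2img.py \
  --hf_model_id "stabilityai/stable-diffusion-2-1-base" \
  --no_use_tuned \
  --max_length 77 \
  --prompt "..."
```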
@monorimet Just checking in: is this making progress?
With the most recent git pull (7963abb) and a clean environment, only some Model/VAE/LoRA combinations produce the compile issue below. This issue was not present in previous versions (specifically, 0361db4 was the last commit I had installed). I have tried various combinations but have failed to isolate a root cause. Any and all help is appreciated.
System Information