iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

'func.func' op exceeded stack allocation limit #14307

Open · rprasad2 opened this issue 1 year ago

rprasad2 commented 1 year ago

What happened?

I got this error while attempting to compile a tuned upscaler model.

Steps to reproduce your issue

This is the command I ran; attached are the broken dispatch and the error log.

```
iree-compile.exe "C:\Users\rahul\Documents\SHARK\shark_tmp\vae_1_64_128_128_fp16_stable-diffusion-x4-upscaler_vulkan\vae_1_64_128_128_fp16_stable-diffusion-x4-upscaler_vulkan_torch_linalg.mlir" --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --mlir-pass-pipeline-crash-reproducer=C:\Users\rahul\Documents\SHARK\stabilityai_stable-diffusion-2-1-base\core-reproducer.mlir --iree-llvmcpu-target-cpu-features=host --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs -iree-vulkan-target-triple=rdna3-7900-windows --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-flow-detach-elementwise-from-named-ops,iree-flow-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-convert-conv2d-to-img2col,iree-preprocessing-pad-linalg-ops{pad-size=32},iree-linalg-ext-convert-conv2d-to-winograd))' --mlir-disable-threading
```

error.log module_forward_dispatch_34.txt

What component(s) does this issue relate to?

Compiler

Version information

No response

Additional context

No response

ScottTodd commented 1 year ago

The attached error.log file appears to be empty. You can try recompiling with --iree-llvmcpu-stack-allocation-limit=<int>, but that error usually indicates that some part of codegen has gone off the optimized / happy path.

ScottTodd commented 1 year ago

The llvmcpu flag I mentioned might not apply here though, since you're using --iree-hal-target-backends=vulkan (why did you also specify --iree-llvmcpu-target-cpu-features=host?)
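To illustrate the flag mismatch being pointed out here: the `--iree-llvmcpu-*` flags configure the llvm-cpu backend and are effectively ignored when compiling for `--iree-hal-target-backends=vulkan`. A minimal sketch of filtering out flags that don't apply to the selected backend (the helper name and filtering logic are hypothetical illustrations, not part of iree-compile; the flag strings come from the command in this thread):

```python
# Hypothetical helper: drop flags whose prefix targets a backend other
# than the one selected via --iree-hal-target-backends. Illustrative only.

def filter_backend_flags(args, backend):
    """Keep only flags relevant to the chosen backend."""
    prefixes_to_drop = {
        "vulkan": ("--iree-llvmcpu-",),   # llvm-cpu flags are no-ops here
        "llvm-cpu": ("--iree-vulkan-",),  # and vice versa
    }
    dropped = prefixes_to_drop.get(backend, ())
    return [a for a in args if not a.startswith(dropped)]

args = [
    "--iree-hal-target-backends=vulkan",
    "--iree-llvmcpu-target-cpu-features=host",
    "--iree-vulkan-target-triple=rdna3-7900-windows",
]
# Drops the llvmcpu flag, keeping the two Vulkan-relevant ones.
print(filter_backend_flags(args, "vulkan"))
```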

rprasad2 commented 1 year ago

That flag was part of the compile script used for the original MLIR; I'll remove it and see what happens. Here is the information that should have been contained in the log file:

```
.8:124:14: error: 'func.func' op uses 74496 bytes of shared memory; exceeded the limit of 65536 bytes
.8:124:14: error: Failures have been detected while processing an MLIR pass pipeline
.8:124:14: note: Pipeline failed while executing [`mlir::iree_compiler::IREE::HAL::TranslateExecutablesPass` on 'hal.executable' operation: @forward_dispatch_34, `mlir::iree_compiler::IREE::HAL::TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `GPUCheckResourceUsage` on 'builtin.module' operation]: reproducer generated at `C:\Users\rahul\Documents\SHARK\stabilityai_stable-diffusion-2-1-base\core-reproducer.mlir`
.8:124:14: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce, api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits>, #spirv.coop_matrix_props>, #spirv.coop_matrix_props>]>>}>
.8:124:14: error: failed to serialize executables
```

Also, adding the flag `--iree-llvmcpu-stack-allocation-limit=` and removing `--iree-llvmcpu-target-cpu-features=host` didn't cause any changes.
ScottTodd commented 1 year ago

Ah, okay. Looks like the shared memory limit is included in the module being compiled: `#spirv.resource_limits<max_compute_shared_memory_size = 65536>`. That should be getting populated by `--iree-vulkan-target-triple=rdna3-7900-windows`. If that target device (AMD GPU) has more shared memory available, the value set during the Vulkan/SPIR-V compilation path could be updated somehow (64 KB is the default).
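As a quick sanity check on the numbers in the diagnostic above (both values come directly from the error in this thread; the arithmetic is just to show how far over the limit the dispatch is):

```python
# Figures from the error message: the dispatch requests 74496 bytes of
# workgroup shared memory, while the SPIR-V target env advertises
# max_compute_shared_memory_size = 65536 (the 64 KB default).
used = 74496
limit = 64 * 1024           # 65536 bytes

print(used - limit)         # bytes over the advertised limit -> 8960
print(used / 1024)          # shared memory the dispatch needs -> 72.75 KiB
```

So the dispatch overshoots the advertised 64 KB limit by roughly 8.75 KB; a device whose real limit is at least 72.75 KB could in principle fit it, which is why raising the advertised limit for the target device is being suggested.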

allieculp commented 11 months ago

@rprasad2 Any further updates here?