nod-ai / SHARK-Studio

SHARK Studio -- Web UI for SHARK+IREE High Performance Machine Learning Distribution
Apache License 2.0

Vae compile fails on Upscaler, shared memory exceeded #1514

Open one-lithe-rune opened 1 year ago

one-lithe-rune commented 1 year ago

I get the following error when running the Upscaler on my local branch, which was created off main at commit a187e05ae6ee535f6f69e4e50029de589148d254:

shark_tank local cache is located at C:\Users\skapusniak\.local/shark_tank/ . You may change this by setting the --local_tank_cache= flag
vulkan devices are available.
cuda devices are not available.
C:\develop\SHARK\shark.venv\Lib\site-packages\diffusers\models\cross_attention.py:30: FutureWarning: Importing from cross_attention is deprecated. Please import from diffusers.models.attention_processor instead.
  deprecate(
Running on local URL:  http://0.0.0.0:8080

To create a public link, set `share=True` in `launch()`.
Found device AMD Radeon VII. Using target triple rdna2-unknown-windows.
Tuned models are currently not supported for this setting.
No vmfb found. Compiling and saving to C:\develop\SHARK\euler_scale_model_input_1_128_128_vulkan_fp16.vmfb
Configuring for device:vulkan://00000000-0c00-0000-0000-000000000000
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Saved vmfb in C:\develop\SHARK\euler_scale_model_input_1_128_128_vulkan_fp16.vmfb.
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_VERBOSE does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_DEBUG does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
No vmfb found. Compiling and saving to C:\develop\SHARK\euler_step_1_128_128_vulkan_fp16.vmfb
Configuring for device:vulkan://00000000-0c00-0000-0000-000000000000
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Saved vmfb in C:\develop\SHARK\euler_step_1_128_128_vulkan_fp16.vmfb.
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_VERBOSE does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_DEBUG does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
use_tuned? sharkify: False
_1_77_128_128_fp16_stable-diffusion-x4-upscaler
No vmfb found. Compiling and saving to C:\develop\SHARK\clip_1_77_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb
Configuring for device:vulkan://00000000-0c00-0000-0000-000000000000
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Saved vmfb in C:\develop\SHARK\clip_1_77_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb.
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_VERBOSE does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_DEBUG does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
No vmfb found. Compiling and saving to C:\develop\SHARK\unet_1_77_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb
Configuring for device:vulkan://00000000-0c00-0000-0000-000000000000
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Saved vmfb in C:\develop\SHARK\unet_1_77_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb.
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_VERBOSE does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_DEBUG does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
32it [00:24,  1.31it/s]
No vmfb found. Compiling and saving to C:\develop\SHARK\vae_1_77_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb
Configuring for device:vulkan://00000000-0c00-0000-0000-000000000000
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Traceback (most recent call last):
  File "C:\develop\SHARK\shark.venv\Lib\site-packages\gradio\routes.py", line 414, in run_predict
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\develop\SHARK\shark.venv\Lib\site-packages\gradio\blocks.py", line 1323, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\develop\SHARK\shark.venv\Lib\site-packages\gradio\blocks.py", line 1067, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\develop\SHARK\shark.venv\Lib\site-packages\gradio\utils.py", line 339, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\develop\SHARK\shark.venv\Lib\site-packages\gradio\utils.py", line 332, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\develop\SHARK\shark.venv\Lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\develop\SHARK\shark.venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "C:\develop\SHARK\shark.venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\develop\SHARK\shark.venv\Lib\site-packages\gradio\utils.py", line 315, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "C:\develop\SHARK\apps\stable_diffusion\web\ui\upscaler_ui.py", line 190, in upscaler_inf
    upscaled_image = global_obj.get_sd_obj().generate_images(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\develop\SHARK\apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_upscaler.py", line 315, in generate_images
    self.load_vae()
  File "C:\develop\SHARK\apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_utils.py", line 122, in load_vae
    self.vae = self.sd_model.vae()
               ^^^^^^^^^^^^^^^^^^^
  File "C:\develop\SHARK\apps\stable_diffusion\src\models\model_wrappers.py", line 674, in vae
    sys.exit(e)
  File "C:\develop\SHARK\apps\stable_diffusion\src\models\model_wrappers.py", line 666, in vae
    compiled_vae, vae_mlir = self.get_vae()
                             ^^^^^^^^^^^^^^
  File "C:\develop\SHARK\apps\stable_diffusion\src\models\model_wrappers.py", line 286, in get_vae
    shark_vae, vae_mlir = compile_through_fx(
                          ^^^^^^^^^^^^^^^^^^^
  File "C:\develop\SHARK\apps\stable_diffusion\src\utils\utils.py", line 161, in compile_through_fx
    _compile_module(shark_module, extended_model_name, extra_args),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\develop\SHARK\apps\stable_diffusion\src\utils\utils.py", line 71, in _compile_module
    path = shark_module.save_module(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\develop\SHARK\shark\shark_inference.py", line 188, in save_module
    return export_iree_module_to_vmfb(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\develop\SHARK\shark\iree_utils\compile_utils.py", line 352, in export_iree_module_to_vmfb
    flatbuffer_blob = compile_module_to_flatbuffer(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\develop\SHARK\shark\iree_utils\compile_utils.py", line 280, in compile_module_to_flatbuffer
    flatbuffer_blob = ireec.compile_str(
                      ^^^^^^^^^^^^^^^^^^
  File "C:\develop\SHARK\shark.venv\Lib\site-packages\iree\compiler\tools\core.py", line 289, in compile_str
    result = invoke_immediate(cl, immediate_input=input_bytes)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\develop\SHARK\shark.venv\Lib\site-packages\iree\compiler\tools\binaries.py", line 196, in invoke_immediate
    raise CompilerToolError(process)
SystemExit: Error invoking IREE compiler tool iree-compile.exe
Diagnostics:
<eval_with_key>.15:124:14: error: 'func.func' op uses 74496 bytes of shared memory; exceeded the limit of 65536 bytes
<eval_with_key>.15:124:14: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 65536, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 1024], subgroup_size = 64, min_subgroup_size = 32, max_subgroup_size = 64, cooperative_matrix_properties_nv = []>>}>
<eval_with_key>.15:124:14: error: failed to serialize executables

Invoked with:
 iree-compile.exe C:\develop\SHARK\shark.venv\Lib\site-packages\iree\compiler\tools\..\_mlir_libs\iree-compile.exe - --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --iree-llvmcpu-embedded-linker-path=C:\develop\SHARK\shark.venv\Lib\site-packages\iree\compiler\tools\..\_mlir_libs\iree-lld.exe --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvmcpu-target-cpu-features=host --iree-vulkan-target-env=#vk.target_env<v1.3, r(120), [VK_KHR_16bit_storage, VK_KHR_8bit_storage, VK_KHR_shader_float16_int8, VK_KHR_spirv_1_4, VK_KHR_storage_buffer_storage_class, VK_KHR_variable_pointers, VK_EXT_subgroup_size_control], AMD:DiscreteGPU, #vk.caps< maxComputeSharedMemorySize = 65536, maxComputeWorkGroupInvocations = 1024, maxComputeWorkGroupSize = dense<[1024, 1024, 1024]>: vector<3xi32>, subgroupSize = 64, subgroupFeatures = 255: i32, minSubgroupSize = 32, maxSubgroupSize = 64, shaderFloat16 = unit, shaderFloat64 = unit, shaderInt8 = unit, shaderInt16 = unit, shaderInt64 = unit, storageBuffer16BitAccess = unit, storagePushConstant16 = unit, uniformAndStorageBuffer16BitAccess = unit, storageBuffer8BitAccess = unit, storagePushConstant8 = unit, uniformAndStorageBuffer8BitAccess = unit, variablePointers = unit, variablePointersStorageBuffer = unit, shaderIntegerDotProduct = unit >> --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs -iree-vulkan-target-triple=rdna2-unknown-windows --iree-preprocessing-pass-pipeline=builtin.module(func.func(iree-flow-detach-elementwise-from-named-ops,iree-flow-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-convert-conv2d-to-img2col,iree-preprocessing-pad-linalg-ops{pad-size=32},iree-linalg-ext-convert-conv2d-to-winograd))

Need more information? Set IREE_SAVE_TEMPS=/some/dir in your environment to save all artifacts and reproducers.
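
The hint in the last log line can be followed from Python before re-running the compile. This is only a minimal sketch: the helper name `env_with_saved_temps` and the target directory are hypothetical, and it assumes the resulting environment is then applied to the process that runs the compile (for example with `os.environ.update(env)`) or passed to a subprocess.

```python
import os
import tempfile


def env_with_saved_temps(temps_dir, base_env=None):
    """Return a copy of the environment with IREE_SAVE_TEMPS pointing at temps_dir."""
    env = dict(os.environ if base_env is None else base_env)
    # Create the directory up front so there is somewhere to write artifacts.
    os.makedirs(temps_dir, exist_ok=True)
    env["IREE_SAVE_TEMPS"] = str(temps_dir)
    return env


# Hypothetical location for the saved artifacts and reproducers.
temps_dir = os.path.join(tempfile.gettempdir(), "iree_temps")
env = env_with_saved_temps(temps_dir)
print(env["IREE_SAVE_TEMPS"])
```

Returning a copy rather than mutating `os.environ` keeps the setting scoped to the run being debugged.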

powderluv commented 1 year ago

It looks like we are adding an invalid tuning.
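
To quantify how far the rejected kernel is over budget, the shared memory diagnostic from the compiler output above can be parsed. The sketch below is illustrative only: the regular expression is written against the exact wording of that one message, and the helper name `shared_memory_overage` is hypothetical, not part of SHARK or IREE.

```python
import re

# Diagnostic line copied from the iree-compile output in this issue.
DIAGNOSTIC = (
    "<eval_with_key>.15:124:14: error: 'func.func' op uses 74496 bytes of "
    "shared memory; exceeded the limit of 65536 bytes"
)

# Assumes the message wording shown above; other IREE versions may differ.
_PATTERN = re.compile(
    r"uses (?P<used>\d+) bytes of shared memory; "
    r"exceeded the limit of (?P<limit>\d+) bytes"
)


def shared_memory_overage(message):
    """Return (used, limit, used - limit) in bytes, or None if no match."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    used = int(match.group("used"))
    limit = int(match.group("limit"))
    return used, limit, used - limit


used, limit, over = shared_memory_overage(DIAGNOSTIC)
print(f"used={used} limit={limit} over={over}")
```

For this log the kernel requests 8960 bytes more than the 65536-byte `maxComputeSharedMemorySize` reported for the device.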