nod-ai / SHARK-Studio

SHARK Studio -- Web UI for SHARK+IREE High Performance Machine Learning Distribution
Apache License 2.0

img to img works 1st run then dies out of memory vma_allocator.cc #1198

Open GoZippy opened 1 year ago

GoZippy commented 1 year ago

Runtime\c\runtime\src\iree\hal\drivers\vulkan\vma_allocator.cc:693: RESOURCE_EXHAUSTED; VK_ERROR_OUT_OF_DEVICE_MEMORY; vmaCreateBuffer; while invoking native function hal.device.queue.alloca

Found device AMD Radeon RX 6700 XT. Using target triple rdna2-unknown-windows.
Using tuned models for CompVis/stable-diffusion-v1-4/fp16/vulkan://00000000-0300-0000-0000-000000000000.
torch\jit\_check.py:172: UserWarning: The TorchScript type system doesn't support instance-level annotations on empty non-base types in __init__. Instead, either 1) use a type annotation in the class body, or 2) wrap the type in torch.jit.Attribute.
  warnings.warn("The TorchScript type system doesn't support "
loading existing vmfb from: E:\ImageAI\shark\euler_scale_model_input_1_512_512fp16.vmfb
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
loading existing vmfb from: E:\ImageAI\shark\euler_step_1_512_512fp16.vmfb
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
use_tuned? sharkify: True
_1_64_512_512_fp16_tuned_analogV2
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
Loaded vmfbs from cache and successfully fetched base model configuration.
43it [00:09, 4.31it/s]
100%|██████████| 1/1 [00:00<00:00, 1.42it/s]
Traceback (most recent call last):
  File "gradio\routes.py", line 393, in run_predict
  File "gradio\blocks.py", line 1059, in process_api
  File "gradio\blocks.py", line 882, in call_function
  File "anyio\to_thread.py", line 31, in run_sync
  File "anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
  File "anyio\_backends\_asyncio.py", line 867, in run
  File "gradio\utils.py", line 549, in async_iteration
  File "apps\stable_diffusion\scripts\img2img.py", line 225, in img2img_inf
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_img2img.py", line 141, in generate_images
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_img2img.py", line 81, in prepare_image_latents
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_img2img.py", line 93, in encode_image
  File "shark\shark_inference.py", line 138, in __call__
  File "shark\shark_runner.py", line 93, in run
  File "shark\iree_utils\compile_utils.py", line 382, in get_results
  File "iree\runtime\function.py", line 130, in __call__
  File "iree\runtime\function.py", line 154, in _invoke
RuntimeError: Error invoking function: D:\a\SHARK-Runtime\SHARK-Runtime\c\runtime\src\iree\hal\drivers\vulkan\vma_allocator.cc:693: RESOURCE_EXHAUSTED; VK_ERROR_OUT_OF_DEVICE_MEMORY; vmaCreateBuffer; while invoking native function hal.device.queue.alloca; while calling import;
[ 1]   native hal.device.queue.alloca:0 -
[ 0] bytecode module.forward:3260 [
      .20:7:18, .20:9:27, .20:10:15, .20:18:10,
powderluv commented 1 year ago

I think the VAE model is now using more VRAM.

Three options here (we can do all three):
1. @jinchen62: implement on-demand load/unload of unet / vae (see the sketch below)
2. @monorimet: try min-peak-memory scheduling
3. Find out what changed upstream to increase memory usage.
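A minimal sketch of what option 1 could look like, assuming each submodel (UNet, VAE, CLIP) is a separately compiled .vmfb that can be loaded and released independently. `load_submodel()` below is a placeholder, not the actual SHARK loader API:

```python
import gc

def load_submodel(vmfb_path, device="vulkan"):
    # Placeholder for whatever loader SHARK exposes for a compiled .vmfb.
    raise NotImplementedError("stand-in for the real vmfb loader")

class OnDemandSubmodel:
    """Keeps a compiled submodel resident on the device only while it is needed."""

    def __init__(self, vmfb_path, device="vulkan"):
        self.vmfb_path = vmfb_path
        self.device = device
        self._module = None

    def __call__(self, *inputs):
        # Load lazily on first use.
        if self._module is None:
            self._module = load_submodel(self.vmfb_path, self.device)
        return self._module(*inputs)

    def unload(self):
        # Drop the reference and collect so the runtime can release the
        # device buffers backing this submodel before the next one loads.
        self._module = None
        gc.collect()
```

With something like this, the pipeline could unload the UNet right after the denoising loop and only then load the VAE, so the two allocations never have to coexist in VRAM.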

GoZippy commented 1 year ago

Where is our version of /modules/processing.py? Wondering how it is currently set up here for scheduling the VAE sequentially to avoid spikes... not sure what else to be looking at - very new here.
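For reference, the img2img code that raised the error lives in apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_img2img.py (it shows up in the traceback above). If "sequential" here means running the VAE one sample at a time to cap the peak allocation, a rough sketch could look like the following, with `vae_encode` as a stand-in for whatever `encode_image()` calls, not the actual function name:

```python
import torch

def encode_latents_sequentially(vae_encode, images: torch.Tensor) -> torch.Tensor:
    # Encode one image at a time instead of the whole batch, trading a little
    # speed for a much lower peak allocation on the device.
    latents = []
    for i in range(images.shape[0]):
        # Slicing keeps the batch dimension, so the model still sees (1, C, H, W).
        latents.append(vae_encode(images[i : i + 1]))
    return torch.cat(latents, dim=0)
```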