I am facing this on an RTX 4000, which is an 8GB GPU just like the 6600 XT, and interestingly at the time of failure I see it using almost all the memory: 8015MiB / 8192MiB.
So this might be an OOM. Any ideas why it's happening now? It used to work in older builds.
@powderluv do you or anyone else know?
I also tried the low VRAM option but it didn't do anything.
Yes, I can confirm this issue goes away with smaller sizes like 384x384. I was using the default of 512x512, so it's an out-of-memory (OOM) error; we could work on handling this error better in the future. @SourStrips Closing the issue, but feel free to reopen if needed.
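(For reference, here is a minimal sketch of the kind of fallback handling mentioned above. It assumes a hypothetical `generate_images(width, height)` callable standing in for the real pipeline entry point, not SHARK's actual API: catch the runtime error and retry at a smaller resolution instead of crashing.)

```python
# Hypothetical sketch only: generate_images() stands in for whatever the
# pipeline's real image-generation call is; it is not SHARK's actual API.
def generate_with_fallback(generate_images, width=512, height=512, min_size=384):
    """Retry generation at progressively smaller resolutions on OOM-style errors."""
    w, h = width, height
    while True:
        try:
            return generate_images(width=w, height=h)
        except RuntimeError as e:
            msg = str(e)
            # IREE surfaces Vulkan exhaustion as RuntimeError; re-raise anything else.
            if "RESOURCE_EXHAUSTED" not in msg and "out of memory" not in msg.lower():
                raise
            if min(w, h) <= min_size:
                raise RuntimeError(
                    f"Generation still failed at {w}x{h}; try an even smaller size "
                    "or the low VRAM option."
                ) from e
            # Step both dimensions down (e.g. 512x512 -> 384x384) and try again.
            w, h = max(min_size, w - 128), max(min_size, h - 128)
            print(f"Ran out of GPU memory; retrying at {w}x{h}")
```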
@nirvedhmeshram is there a fix in the works? Before, I was rendering hundreds of images a day at 768x512 or 512x768, and it suddenly stopped working. Are you saying that the error is on my end, from my card?
Let's re-open and track the IR changes. If it worked in the past, we should be able to get back to that state at least.
@SourStrips can you check now? After https://github.com/nod-ai/SHARK/pull/1339 landed, it is working on my GPU.
Yes, it works now, thank you! I did have to remove some arguments to make it work reliably though. I'm going to slowly add them back in to see which one is causing the error again.
FYI, it seems that --device_allocator=caching is the main culprit now that the original error is fixed. Without this argument it works perfectly.
Thanks for pointing this out. Curious: based on looking at the code, the default behavior is to not specify any device allocator, so only advanced users trying command line arguments are likely to use this. Is that correct?
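(To illustrate the default behavior described above, here is a rough sketch with illustrative names only, not SHARK's actual argument handling: when the flag is left unset, nothing allocator-related gets forwarded to the runtime, so only users who explicitly pass --device_allocator=caching opt into the caching allocator.)

```python
import argparse

# Illustrative sketch only; this is not SHARK's actual argument parser.
# The point is the default: when --device_allocator is not passed, no
# allocator flag is forwarded to the runtime at all.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--device_allocator",
    default=None,
    help="Optional device allocator policy (e.g. 'caching') to forward to the runtime.",
)
args, _ = parser.parse_known_args()

runtime_flags = []
if args.device_allocator is not None:
    runtime_flags.append(f"--device_allocator={args.device_allocator}")

# [] by default; ['--device_allocator=caching'] only when the user opts in.
print(runtime_flags)
```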
Probably regular users as well; I have seen it mentioned in the Discord since it gives a generous boost to it/s. It was working before without doing anything extra except adding the argument. I wonder what happened.
I see. Would you mind filing a separate issue with the error message for that so we can track it? Unless it's giving the exact same error as this one with the flag, in which case we can reopen this issue.
I will create a new ticket with a description so it's clearer what is going on.
Well, SHARK just wants to make me look like a dummy. I tried to get the error to copy and paste for you, but now it decided to start working ¯\_(ツ)_/¯
I can believe the caching allocator has intermittent issues; we will have to think about how to make it reliable. Thanks for pointing it out anyway.
Recently started getting this error; regular txt2img works fine. I did update drivers recently as well, due to a Windows update mishap.
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
  0%|          | 0/1 [00:05<?, ?it/s]
Traceback (most recent call last):
  File "gradio\routes.py", line 401, in run_predict
  File "gradio\blocks.py", line 1302, in process_api
  File "gradio\blocks.py", line 1025, in call_function
  File "anyio\to_thread.py", line 31, in run_sync
  File "anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
  File "anyio\_backends\_asyncio.py", line 867, in run
  File "ui\img2img_ui.py", line 231, in img2img_inf
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_stencil.py", line 265, in generate_images
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_utils.py", line 172, in decode_latents
  File "shark\shark_inference.py", line 138, in __call__
  File "shark\shark_runner.py", line 93, in run
  File "shark\iree_utils\compile_utils.py", line 385, in get_results
  File "iree\runtime\function.py", line 130, in __call__
  File "iree\runtime\function.py", line 154, in _invoke
RuntimeError: Error invoking function: D:\a\SHARK-Runtime\SHARK-Runtime\c\runtime\src\iree\hal\drivers\vulkan\native_semaphore.cc:155: RESOURCE_EXHAUSTED; overflowed timeline semaphore max value;
while invoking native function hal.fence.await;
while calling import;
[ 1] native hal.fence.await:0 -
[ 0] bytecode module@0:32986 -
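(Note for anyone triaging: the failure above isn't the plain out-of-memory case from earlier in the thread but a RESOURCE_EXHAUSTED from the Vulkan timeline semaphore overflowing its max value. Below is a rough, heuristic sketch of how a call site could tell the two apart and give a clearer hint; the helper name is made up for illustration.)

```python
def describe_runtime_error(exc: RuntimeError) -> str:
    """Heuristic mapping of IREE/Vulkan RuntimeError text to a friendlier hint."""
    msg = str(exc)
    if "overflowed timeline semaphore" in msg:
        # Timeline semaphore values only grow; once the max is hit the session
        # cannot make progress, so restarting the app is likely the only recovery.
        return "Vulkan timeline semaphore overflow: restart the app and retry."
    if "RESOURCE_EXHAUSTED" in msg or "out of memory" in msg.lower():
        return "GPU ran out of memory: try a smaller resolution or the low VRAM option."
    return f"Unhandled runtime error: {msg}"
```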