nod-ai / SHARK-Studio

SHARK Studio -- Web UI for SHARK+IREE High Performance Machine Learning Distribution
Apache License 2.0

Radeon R7 360 and R9 370X: INVALID_ARGUMENT; FlatBuffer verification failed: buffer header too small #1333

Open iamhumanipromise opened 1 year ago

iamhumanipromise commented 1 year ago

Followed all installation instructions on Windows. Both GPUs have 4GB of VRAM. Fresh Download.

vulkan devices are available.
cuda devices are available.
diffusers\models\cross_attention.py:30: FutureWarning: Importing from cross_attention is deprecated. Please import from diffusers.models.attention_processor instead.
Running on local URL:  http://0.0.0.0:8080

To create a public link, set `share=True` in `launch()`.
Found device AMD Radeon R7 360 Series. Using target triple rdna2-unknown-windows.
Using tuned models for stabilityai/stable-diffusion-2-1/fp16/vulkan://00000000-0400-0000-0000-000000000000.
Downloading (…)cheduler_config.json: 100%|█████████████████████████████████████████████| 345/345 [00:00<00:00, 345kB/s]
huggingface_hub\file_download.py:133: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\ericj\.cache\huggingface\hub. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
loading existing vmfb from: C:\Users\ericj\Downloads\euler_scale_model_input_1_512_512fp16.vmfb
loading existing vmfb from: C:\Users\ericj\Downloads\euler_step_1_512_512fp16.vmfb
use_tuned? sharkify: True
_1_64_512_512_fp16_tuned_stable-diffusion-2-1-base
Downloading (…)tokenizer/vocab.json: 100%|█████████████████████████████████████████| 1.06M/1.06M [00:01<00:00, 723kB/s]
Downloading (…)tokenizer/merges.txt: 100%|███████████████████████████████████████████| 525k/525k [00:00<00:00, 753kB/s]
Downloading (…)cial_tokens_map.json: 100%|█████████████████████████████████████████████| 460/460 [00:00<00:00, 460kB/s]
Downloading (…)okenizer_config.json: 100%|█████████████████████████████████████████████| 824/824 [00:00<00:00, 826kB/s]
transformers\modeling_utils.py:429: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
Traceback (most recent call last):
  File "gradio\routes.py", line 401, in run_predict
  File "gradio\blocks.py", line 1302, in process_api
  File "gradio\blocks.py", line 1039, in call_function
  File "anyio\to_thread.py", line 31, in run_sync
  File "anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
  File "anyio\_backends\_asyncio.py", line 867, in run
  File "gradio\utils.py", line 491, in async_iteration
  File "ui\txt2img_ui.py", line 173, in txt2img_inf
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_txt2img.py", line 114, in generate_images
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_utils.py", line 376, in encode_prompts_weight
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_utils.py", line 86, in load_clip
  File "apps\stable_diffusion\src\models\model_wrappers.py", line 601, in clip
  File "apps\stable_diffusion\src\models\model_wrappers.py", line 594, in clip
  File "apps\stable_diffusion\src\models\model_wrappers.py", line 531, in get_clip
  File "apps\stable_diffusion\src\utils\utils.py", line 120, in compile_through_fx
  File "apps\stable_diffusion\src\utils\utils.py", line 47, in _load_vmfb
  File "shark\shark_inference.py", line 207, in load_module
  File "shark\iree_utils\compile_utils.py", line 333, in load_flatbuffer
  File "shark\iree_utils\compile_utils.py", line 305, in get_iree_module
SystemExit: Error creating vm module from FlatBuffer: D:\a\SHARK-Runtime\SHARK-Runtime\c\runtime\src\iree\vm\bytecode\verifier.c:29: INVALID_ARGUMENT; FlatBuffer verification failed: buffer header too small

Trying again...
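A "buffer header too small" failure at load time usually means the cached .vmfb on disk is truncated or corrupt rather than a device problem. A minimal sketch for spotting cached artifacts too small to plausibly hold a valid FlatBuffer, so they can be deleted and recompiled (`suspicious_vmfbs` is a hypothetical helper, not part of SHARK; the 1 KiB threshold is an assumption):

```python
from pathlib import Path

def suspicious_vmfbs(cache_dir, min_bytes=1024):
    """Return cached .vmfb files too small to plausibly hold a valid
    FlatBuffer -- likely truncated saves or interrupted downloads."""
    return sorted(p for p in Path(cache_dir).glob("*.vmfb")
                  if p.stat().st_size < min_bytes)

# e.g. scan the folder the log shows vmfbs being loaded from, then
# delete anything flagged so SHARK recompiles it on the next run:
# for p in suspicious_vmfbs(r"C:\Users\ericj\Downloads"):
#     p.unlink()
```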

Found device AMD Radeon R7 360 Series. Using target triple rdna2-unknown-windows.
Tuned models are currently not supported for this setting.
loading existing vmfb from: C:\Users\ericj\Downloads\euler_scale_model_input_1_384_384fp16.vmfb
loading existing vmfb from: C:\Users\ericj\Downloads\euler_step_1_384_384fp16.vmfb
use_tuned? sharkify: False
_1_64_384_384_fp16_stable-diffusion-2-1-base
No vmfb found. Compiling and saving to C:\Users\ericj\Downloads\clip_1_64_384_384_fp16_stable-diffusion-2-1-base_vulkan.vmfb
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Saved vmfb in C:\Users\ericj\Downloads\clip_1_64_384_384_fp16_stable-diffusion-2-1-base_vulkan.vmfb.
No vmfb found. Compiling and saving to C:\Users\ericj\Downloads\unet_1_64_384_384_fp16_stable-diffusion-2-1-base_vulkan.vmfb
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Saved vmfb in C:\Users\ericj\Downloads\unet_1_64_384_384_fp16_stable-diffusion-2-1-base_vulkan.vmfb.
Traceback (most recent call last):
  File "gradio\routes.py", line 401, in run_predict
  File "gradio\blocks.py", line 1302, in process_api
  File "gradio\blocks.py", line 1039, in call_function
  File "anyio\to_thread.py", line 31, in run_sync
  File "anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
  File "anyio\_backends\_asyncio.py", line 867, in run
  File "gradio\utils.py", line 491, in async_iteration
  File "ui\txt2img_ui.py", line 173, in txt2img_inf
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_txt2img.py", line 122, in generate_images
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_utils.py", line 203, in produce_img_latents
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_utils.py", line 103, in load_unet
  File "apps\stable_diffusion\src\models\model_wrappers.py", line 640, in unet
  File "apps\stable_diffusion\src\models\model_wrappers.py", line 611, in unet
  File "apps\stable_diffusion\src\models\model_wrappers.py", line 575, in compile_unet_variants
  File "apps\stable_diffusion\src\models\model_wrappers.py", line 450, in get_unet
  File "apps\stable_diffusion\src\utils\utils.py", line 157, in compile_through_fx
  File "apps\stable_diffusion\src\utils\utils.py", line 69, in _compile_module
  File "shark\shark_inference.py", line 207, in load_module
  File "shark\iree_utils\compile_utils.py", line 333, in load_flatbuffer
  File "shark\iree_utils\compile_utils.py", line 309, in get_iree_module
  File "iree\runtime\system_api.py", line 255, in add_vm_module
  File "iree\runtime\system_api.py", line 252, in add_vm_modules
SystemExit: Error registering modules: D:\a\SHARK-Runtime\SHARK-Runtime\c\runtime\src\iree\hal\drivers\vulkan\native_executable.cc:157: UNKNOWN; VkResult=4294967283; while invoking native function hal.executable.create; while calling import;
[ 1]   native hal.executable.create:0 -
[ 0] bytecode module@1:1404 -

Then I decided to try an older model, in case 2.1 just didn't work.

Found device AMD Radeon R7 360 Series. Using target triple rdna2-unknown-windows.
Tuned models are currently not supported for this setting.
Downloading (…)cheduler_config.json: 100%|█████████████████████████████████████████████| 313/313 [00:00<00:00, 313kB/s]
loading existing vmfb from: C:\Users\ericj\Downloads\euler_scale_model_input_1_384_384fp16.vmfb
loading existing vmfb from: C:\Users\ericj\Downloads\euler_step_1_384_384fp16.vmfb
use_tuned? sharkify: False
_1_64_384_384_fp16_stable-diffusion-v1-4
Downloading (…)tokenizer/vocab.json: 100%|████████████████████████████████████████| 1.06M/1.06M [00:00<00:00, 3.31MB/s]
Downloading (…)tokenizer/merges.txt: 100%|██████████████████████████████████████████| 525k/525k [00:00<00:00, 7.29MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████████████████████████████████████████████| 472/472 [00:00<00:00, 472kB/s]
Downloading (…)okenizer_config.json: 100%|█████████████████████████████████████████████| 806/806 [00:00<00:00, 808kB/s]
Downloading (…)_encoder/config.json: 100%|█████████████████████████████████████████████| 592/592 [00:00<00:00, 590kB/s]
Downloading model.safetensors: 100%|████████████████████████████████████████████████| 492M/492M [00:57<00:00, 8.63MB/s]
No vmfb found. Compiling and saving to C:\Users\ericj\Downloads\clip_1_64_384_384_fp16_stable-diffusion-v1-4_vulkan.vmfb
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Saved vmfb in C:\Users\ericj\Downloads\clip_1_64_384_384_fp16_stable-diffusion-v1-4_vulkan.vmfb.
Downloading (…)ain/unet/config.json: 100%|█████████████████████████████████████████████| 743/743 [00:00<00:00, 743kB/s]
Downloading (…)ch_model.safetensors: 100%|████████████████████████████████████████| 3.44G/3.44G [02:21<00:00, 24.3MB/s]
mat1 and mat2 shapes cannot be multiplied (128x1024 and 768x320)
Retrying with a different base model configuration
No vmfb found. Compiling and saving to C:\Users\ericj\Downloads\unet_1_64_384_384_fp16_stable-diffusion-v1-4_vulkan.vmfb
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Saved vmfb in C:\Users\ericj\Downloads\unet_1_64_384_384_fp16_stable-diffusion-v1-4_vulkan.vmfb.
Error registering modules: D:\a\SHARK-Runtime\SHARK-Runtime\c\runtime\src\iree\hal\drivers\vulkan\native_executable.cc:157: UNKNOWN; VkResult=4294967283; while invoking native function hal.executable.create; while calling import;
[ 1]   native hal.executable.create:0 -
[ 0] bytecode module@1:1404 -
Retrying with a different base model configuration
Error registering modules: D:\a\SHARK-Runtime\SHARK-Runtime\c\runtime\src\iree\hal\drivers\vulkan\native_executable.cc:157: UNKNOWN; VkResult=4294967283; while invoking native function hal.executable.create; while calling import;
[ 1]   native hal.executable.create:0 -
[ 0] bytecode module@1:1404 -
Retrying with a different base model configuration
Error registering modules: D:\a\SHARK-Runtime\SHARK-Runtime\c\runtime\src\iree\hal\drivers\vulkan\native_executable.cc:157: UNKNOWN; VkResult=4294967283; while invoking native function hal.executable.create; while calling import;
[ 1]   native hal.executable.create:0 -
[ 0] bytecode module@1:1404 -
Retrying with a different base model configuration
Error registering modules: D:\a\SHARK-Runtime\SHARK-Runtime\c\runtime\src\iree\hal\drivers\vulkan\native_executable.cc:157: UNKNOWN; VkResult=4294967283; while invoking native function hal.executable.create; while calling import;
[ 1]   native hal.executable.create:0 -
[ 0] bytecode module@1:1404 -
Retrying with a different base model configuration
Traceback (most recent call last):
  File "gradio\routes.py", line 401, in run_predict
  File "gradio\blocks.py", line 1302, in process_api
  File "gradio\blocks.py", line 1039, in call_function
  File "anyio\to_thread.py", line 31, in run_sync
  File "anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
  File "anyio\_backends\_asyncio.py", line 867, in run
  File "gradio\utils.py", line 491, in async_iteration
  File "ui\txt2img_ui.py", line 173, in txt2img_inf
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_txt2img.py", line 122, in generate_images
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_utils.py", line 203, in produce_img_latents
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_utils.py", line 103, in load_unet
  File "apps\stable_diffusion\src\models\model_wrappers.py", line 640, in unet
  File "apps\stable_diffusion\src\models\model_wrappers.py", line 635, in unet
  File "apps\stable_diffusion\src\models\model_wrappers.py", line 59, in check_compilation
SystemExit: Could not compile Unet. Please create an issue with the detailed log at https://github.com/nod-ai/SHARK/issues
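The `mat1 and mat2 shapes cannot be multiplied (128x1024 and 768x320)` line in the log above is a useful clue: Stable Diffusion 2.x text encoders produce 1024-dimensional embeddings, while 1.x UNet cross-attention weights expect 768, so probing the wrong base model configuration yields exactly this inner-dimension mismatch (the "Retrying with a different base model configuration" lines are SHARK cycling through configurations). A toy reproduction of the shape error (illustrative dimensions taken from the log, not SHARK's actual code path):

```python
import numpy as np

text_emb = np.zeros((128, 1024))  # SD 2.x CLIP output: hidden size 1024
attn_w   = np.zeros((768, 320))   # SD 1.x cross-attention weight: expects 768

try:
    text_emb @ attn_w             # inner dims 1024 vs 768 do not agree
except ValueError as err:
    print(err)                    # matmul shape-mismatch error
```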

I then ran this past Bing Chat (GPT) to help me figure out what to report to you. Here is its report:

First it advised:

The error code 4294967283 corresponds to VK_ERROR_DEVICE_LOST, which indicates that the logical or physical device has been lost. This could be caused by a hardware failure, driver bug, or system instability. The error occurred while invoking a native function called hal.executable.create, which is defined in the file native_executable.cc at line 157. The error also propagated to a bytecode module at line 1404. You may want to debug the program and check the state of the device and the modules before and after the error occurs.

Then I asked it to review line 157 and 1404, and provided it with the github link.

Thank you for sharing the github link. I have reviewed the code and found the following:

  • Line 157 of native_executable.cc is where the function iree_hal_vulkan_native_executable_create is defined. This function creates an executable that can run on one or more devices using Vulkan pipelines. It takes a list of pipeline caches and a list of shader modules as parameters.
  • Line 1404 of bytecode_module.cc is where the function iree_hal_vulkan_native_executable_create is called by the function iree_hal_vulkan_native_executable_cache_prepare_executable. This function prepares an executable for use by one or more devices. It takes a cache, a descriptor set layout, and a list of entry points as parameters.

It seems that the error occurs when the program tries to create an executable from a cache and a list of shader modules, but the device is lost or disconnected. You may want to check if the device is properly connected and configured, and if the cache and the shader modules are valid and compatible with the device.
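One caveat on the report above: Vulkan error codes are small negative numbers, and IREE prints the raw `VkResult` as an unsigned 32-bit value. Reinterpreting 4294967283 as a signed int32 gives -13, which `vulkan_core.h` defines as `VK_ERROR_UNKNOWN` (matching the `UNKNOWN` status IREE itself prints), not `VK_ERROR_DEVICE_LOST`, which is -4. A quick check:

```python
import ctypes

def vkresult_from_log(raw_unsigned):
    """Reinterpret the unsigned 32-bit VkResult printed in the log
    as the signed value vulkan_core.h actually defines."""
    return ctypes.c_int32(raw_unsigned).value

# Vulkan core error codes (from vulkan_core.h):
VK_ERROR_DEVICE_LOST = -4
VK_ERROR_UNKNOWN = -13

print(vkresult_from_log(4294967283))  # -13 -> VK_ERROR_UNKNOWN
```

So the log is self-consistent: the driver reported an unspecified failure while creating the shader executable, which is still consistent with the out-of-memory/device-crash theory discussed below, but the specific `VK_ERROR_DEVICE_LOST` identification does not match the printed code.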

RousseauRemi commented 1 year ago

Same problem here!

nirvedhmeshram commented 1 year ago

My hypothesis is that this is related to the GPU running out of memory and the device crashing. We made some changes in the latest release so that it doesn't use as much memory, so it's worth giving it a fresh try. If that doesn't work, please also check whether lowering the resolution to the minimum 384x384 helps.
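As a rough illustration of why the lower resolution helps (back-of-envelope only, not SHARK's actual memory accounting): Stable Diffusion runs its UNet in an 8x-downsampled latent space, so activation memory scales with latent area, and 384x384 needs only about 56% of what 512x512 does:

```python
# Rough sketch, assuming the standard SD 8x VAE downsampling and a
# 4-channel latent; real usage also includes weights and workspace.
def latent_elems(h, w, channels=4, vae_factor=8):
    return channels * (h // vae_factor) * (w // vae_factor)

ratio = latent_elems(384, 384) / latent_elems(512, 512)
print(f"{ratio:.2%}")  # 56.25%
```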

RousseauRemi commented 1 year ago

So an AMD RX 7900 XTX with 24 GB is not enough?

nirvedhmeshram commented 1 year ago

So an AMD RX 7900 XTX with 24 GB is not enough?

That should be more than enough; I think @iamhumanipromise has 4 GB.

@workpioupiou are you still seeing this issue on an Amd RX 7900 XTX with the latest build?

RousseauRemi commented 1 year ago

It seems to work with the latest update, thanks!

thgcnet commented 1 year ago

I may have made some progress on this by using alternative AMD drivers. Please refer to https://www.amd.com/en/support/kb/faq/pdh-install.

RousseauRemi commented 1 year ago

Sorry for the late response; I can't really tell. A week ago it was working well. Yesterday I had the error from this ticket https://github.com/nod-ai/SHARK/issues/1397, but on branches 700 and 712 and not with 714 (literally the opposite of what is mentioned in the duplicate ticket). I'm seeing a lot of malfunctions: when I run the software, it works for 2 or 3 prompts, then it displays an error every time and needs to be restarted (with some models the error persists after the restart).

Other errors may be my fault too. Some models come with an integrated VAE (from what I understand), but I don't know whether SHARK detects it and selects it by default, whether the model handles it directly, or whether it is skipped. LoRA has the same symptoms. I tried img2img once or twice last week and it only worked one time, but the other times I was using images not generated by SHARK (maybe that's the problem?). Upscaling works great with the X4 upscaler (before 714 there were a lot of models in the dropdown that did nothing with it, but that seems to have been corrected). Inpainting has worked, but I didn't test it much. Outpainting has never worked.

I'm not saying this to criticize the software, just to report that I'm hitting some bugs :) And I don't think I'll have a memory problem now: I upgraded my RAM from 32 to 96 GB.