nod-ai / SHARK

SHARK - High Performance Machine Learning Distribution
Apache License 2.0

Upscaling does not work on RDNA2 (and maybe others) #1230

Open cstueckrath opened 1 year ago

cstueckrath commented 1 year ago
No vmfb found. Compiling and saving to J:\sd\SHARK\apps\stable_diffusion\web\vae_1_64_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Saved vmfb in J:\sd\SHARK\apps\stable_diffusion\web\vae_1_64_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb.
Error registering modules: D:\a\SHARK-Runtime\SHARK-Runtime\c\runtime\src\iree\hal\drivers\vulkan\native_executable.cc:157: UNKNOWN; VkResult=4294967283; while invoking native function hal.executable.create; while calling import;
[ 1]   native hal.executable.create:0 -
[ 0] bytecode module.__init:2142 <eval_with_key>.21:124:14
Retrying with a different base model configuration
loading existing vmfb from: J:\sd\SHARK\apps\stable_diffusion\web\clip_1_64_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb
Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
Retrying with a different base model configuration
loading existing vmfb from: J:\sd\SHARK\apps\stable_diffusion\web\clip_1_64_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb
Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
Retrying with a different base model configuration
loading existing vmfb from: J:\sd\SHARK\apps\stable_diffusion\web\clip_1_64_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb
Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
Retrying with a different base model configuration
loading existing vmfb from: J:\sd\SHARK\apps\stable_diffusion\web\clip_1_64_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb
Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
Retrying with a different base model configuration
Traceback (most recent call last):
  File "gradio\routes.py", line 393, in run_predict
  File "gradio\blocks.py", line 1069, in process_api
  File "gradio\blocks.py", line 892, in call_function
  File "anyio\to_thread.py", line 31, in run_sync
  File "anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
  File "anyio\_backends\_asyncio.py", line 867, in run
  File "gradio\utils.py", line 549, in async_iteration
  File "apps\stable_diffusion\scripts\upscaler.py", line 124, in upscaler_inf
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_utils.py", line 368, in from_pretrained
  File "apps\stable_diffusion\src\models\model_wrappers.py", line 663, in __call__
SystemExit: Cannot compile the model. Please create an issue with the detailed log at https://github.com/nod-ai/SHARK/issues
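For reference, the `VkResult=4294967283` in the log above is a signed Vulkan result code printed as an unsigned 32-bit integer; reinterpreting it as signed gives -13, which the Vulkan headers define as `VK_ERROR_UNKNOWN`. A minimal sketch of the conversion:

```python
import ctypes

def decode_vkresult(u: int) -> int:
    """Reinterpret a VkResult that was logged as an unsigned 32-bit value."""
    return ctypes.c_int32(u).value

# The value from the log decodes to -13, i.e. VK_ERROR_UNKNOWN.
print(decode_vkresult(4294967283))
```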
monorimet commented 1 year ago

Reproduced on a 6900 XT:

(shark.venv) PS C:\V\SHARK\apps\stable_diffusion\web> python .\index.py
shark_tank local cache is located at C:\Users\ean\.local/shark_tank/ . You may change this by setting the --local_tank_cache= flag
C:\V\SHARK\shark.venv\Lib\site-packages\diffusers\models\cross_attention.py:30: FutureWarning: Importing from cross_attention is deprecated. Please import from diffusers.models.attention_processor instead.
  deprecate(
vulkan devices are available.
cuda devices are not available.
Running on local URL:  http://0.0.0.0:8080

To create a public link, set `share=True` in `launch()`.
Found device AMD Radeon RX 6900 XT. Using target triple rdna2-unknown-windows.
Tuned models are currently not supported for this setting.
No vmfb found. Compiling and saving to C:\V\SHARK\apps\stable_diffusion\web\euler_scale_model_input_1_128_128fp16.vmfb
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Saved vmfb in C:\V\SHARK\apps\stable_diffusion\web\euler_scale_model_input_1_128_128fp16.vmfb.
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
No vmfb found. Compiling and saving to C:\V\SHARK\apps\stable_diffusion\web\euler_step_1_128_128fp16.vmfb
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Saved vmfb in C:\V\SHARK\apps\stable_diffusion\web\euler_step_1_128_128fp16.vmfb.
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
use_tuned? sharkify: False
_1_64_128_128_fp16_stable-diffusion-x4-upscaler
Inferring base model configuration.
C:\V\SHARK\shark.venv\Lib\site-packages\transformers\modeling_utils.py:402: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
C:\V\SHARK\shark.venv\Lib\site-packages\torch\_utils.py:777: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
C:\V\SHARK\shark.venv\Lib\site-packages\torch\storage.py:955: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
C:\V\SHARK\shark.venv\Lib\site-packages\safetensors\torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
No vmfb found. Compiling and saving to C:\V\SHARK\apps\stable_diffusion\web\clip_1_64_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Saved vmfb in C:\V\SHARK\apps\stable_diffusion\web\clip_1_64_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb.
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
No vmfb found. Compiling and saving to C:\V\SHARK\apps\stable_diffusion\web\unet_1_64_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Saved vmfb in C:\V\SHARK\apps\stable_diffusion\web\unet_1_64_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb.
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
No vmfb found. Compiling and saving to C:\V\SHARK\apps\stable_diffusion\web\vae_1_64_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Saved vmfb in C:\V\SHARK\apps\stable_diffusion\web\vae_1_64_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb.
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
Error registering modules: D:\a\SHARK-Runtime\SHARK-Runtime\c\runtime\src\iree\hal\drivers\vulkan\native_executable.cc:157: UNKNOWN; VkResult=4294967283; while invoking native function hal.executable.create; while calling import;
[ 1]   native hal.executable.create:0 -
[ 0] bytecode module.__init:2142 <eval_with_key>.15:124:14
Retrying with a different base model configuration
loading existing vmfb from: C:\V\SHARK\apps\stable_diffusion\web\clip_1_64_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
Retrying with a different base model configuration
loading existing vmfb from: C:\V\SHARK\apps\stable_diffusion\web\clip_1_64_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
Retrying with a different base model configuration
loading existing vmfb from: C:\V\SHARK\apps\stable_diffusion\web\clip_1_64_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
Retrying with a different base model configuration
loading existing vmfb from: C:\V\SHARK\apps\stable_diffusion\web\clip_1_64_128_128_fp16_stable-diffusion-x4-upscaler_vulkan.vmfb
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
Retrying with a different base model configuration
Traceback (most recent call last):
  File "C:\V\SHARK\shark.venv\Lib\site-packages\gradio\routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\V\SHARK\shark.venv\Lib\site-packages\gradio\blocks.py", line 1069, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\V\SHARK\shark.venv\Lib\site-packages\gradio\blocks.py", line 892, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\V\SHARK\shark.venv\Lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\V\SHARK\shark.venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "C:\V\SHARK\shark.venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\V\SHARK\shark.venv\Lib\site-packages\gradio\utils.py", line 549, in async_iteration
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "C:\V\SHARK\apps\stable_diffusion\scripts\upscaler.py", line 124, in upscaler_inf
    UpscalerPipeline.from_pretrained(
  File "C:\V\SHARK\apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_utils.py", line 370, in from_pretrained
    clip, unet, vae = mlir_import()
                      ^^^^^^^^^^^^^
  File "C:\V\SHARK\apps\stable_diffusion\src\models\model_wrappers.py", line 684, in __call__
    sys.exit(
SystemExit: Cannot compile the model. Please create an issue with the detailed log at https://github.com/nod-ai/SHARK/issues
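As context for the repeated `Expected tensor for argument #1 'indices'` message in both logs: PyTorch's embedding op only accepts integer index tensors, so feeding it a float tensor raises exactly that error, and casting the indices with `.long()` avoids it. A minimal standalone sketch (not the SHARK code path; names here are illustrative):

```python
import torch

emb = torch.nn.Embedding(num_embeddings=8, embedding_dim=4)
idx = torch.tensor([1.0, 2.0, 3.0])  # float indices reproduce the error

try:
    emb(idx)  # raises: indices must be Long or Int, got torch.FloatTensor
except (RuntimeError, TypeError) as e:
    print("embedding rejected float indices:", e)

out = emb(idx.long())  # casting to an integer dtype fixes it
print(out.shape)  # torch.Size([3, 4])
```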
martin-haynes commented 1 year ago

I am also having trouble when invoking the HAL native executable, but in my case on Ubuntu Linux, using the repo's command-line test inference.

The *.vmfb compilation completes successfully and seemingly hands processing off to the 6900 XT, which 'sounds out' the 50-step iterations at about 7.5 it/s.

But once complete, the job fails just as above.

So far I've tried Vulkan SDK releases 1.3.243 and 1.2.36, but the unknown error and failure point are the same.

I'm using the latest AMD GPU driver set: amdgpu-pro-core/jammy,jammy,now 22.40-1538781.22.04

No vmfb found. Compiling and saving to /home/<user>/SHARK/vae_1_64_512_512_fp16_tuned_stable-diffusion-2-1-base_vulkan.vmfb
Using target triple -iree-vulkan-target-triple=rdna2-unknown-linux from command line args
Saved vmfb in /home/<user>/SHARK/vae_1_64_512_512_fp16_tuned_stable-diffusion-2-1-base_vulkan.vmfb.
Error registering modules: main_checkout/runtime/src/iree/hal/drivers/vulkan/native_executable.cc:153: UNKNOWN; VkResult=4294967283; while invoking native function hal.executable.create; while calling import; 
[ 1]   native hal.executable.create:0 -
[ 0] bytecode module@1:2718 -
sarekv commented 1 year ago

Not sure if it's related, but I'm having the same issue in effect. Using a 6800M (RDNA2). Text-to-image and image-to-image work fine, but every time I run the upscaler I get the following error:

Found device AMD Radeon RX 6800M (RADV NAVI22). Using target triple rdna2-unknown-linux.
Tuned models are currently not supported for this setting.
loading existing vmfb from: /home//SHARK/apps/stable_diffusion/web/euler_scale_model_input_1_128_128fp16.vmfb
loading existing vmfb from: /home//SHARK/apps/stable_diffusion/web/euler_step_1_128_128fp16.vmfb
use_tuned? sharkify: False
_1_64_128_128_fp16_stable-diffusion-x4-upscaler
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/home//SHARK/shark.venv/lib64/python3.11/site-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home//SHARK/shark.venv/lib64/python3.11/site-packages/gradio/blocks.py", line 1108, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home//SHARK/shark.venv/lib64/python3.11/site-packages/gradio/blocks.py", line 929, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home//SHARK/shark.venv/lib64/python3.11/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home//SHARK/shark.venv/lib64/python3.11/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/home//SHARK/shark.venv/lib64/python3.11/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home//SHARK/shark.venv/lib64/python3.11/site-packages/gradio/utils.py", line 490, in async_iteration
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/home//SHARK/apps/stable_diffusion/scripts/upscaler.py", line 165, in upscaler_inf
    upscaled_image = global_obj.get_sd_obj().generate_images(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home//SHARK/apps/stable_diffusion/src/pipelines/pipeline_shark_stable_diffusion_upscaler.py", line 294, in generate_images
    latents = self.produce_img_latents(
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home//SHARK/apps/stable_diffusion/src/pipelines/pipeline_shark_stable_diffusion_upscaler.py", line 181, in produce_img_latents
    noise_pred = self.unet(
                 ^^^^^^^^^^
  File "/home//SHARK/shark/shark_inference.py", line 138, in __call__
    return self.shark_runner.run(function_name, inputs, send_to_host)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home//SHARK/shark/shark_runner.py", line 93, in run
    return get_results(
           ^^^^^^^^^^^^
  File "/home//SHARK/shark/iree_utils/compile_utils.py", line 385, in get_results
    result = compiled_vm[function_name](*device_inputs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home//SHARK/shark.venv/lib64/python3.11/site-packages/iree/runtime/function.py", line 130, in __call__
    self._invoke(arg_list, ret_list)
  File "/home//SHARK/shark.venv/lib64/python3.11/site-packages/iree/runtime/function.py", line 154, in _invoke
    self._vm_context.invoke(self._vm_function, arg_list, ret_list)
ValueError: Error invoking function: c/runtime/src/iree/modules/hal/utils/buffer_diagnostics.c:185: INVALID_ARGUMENT; input 0 element type mismatch; expected f16 (21000010) but have f32 (21000020); while invoking native function hal.buffer_view.assert; while calling import;
[ 1]   native hal.buffer_view.assert:0 -
[ 0] bytecode module@0:8628 -
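The `expected f16 ... but have f32` failure above is a plain input-dtype mismatch: the module was compiled for fp16 inputs but is being handed fp32 buffers at invoke time. A minimal NumPy sketch of the mismatch and the cast that avoids it (array and function names here are illustrative, not the SHARK pipeline):

```python
import numpy as np

def to_module_dtype(arr: np.ndarray, dtype=np.float16) -> np.ndarray:
    """Cast an input buffer to the element type the compiled module expects."""
    return arr.astype(dtype) if arr.dtype != dtype else arr

# Hypothetical latents buffer produced in fp32; the compiled unet wants fp16.
latents = np.random.randn(1, 4, 128, 128).astype(np.float32)
fixed = to_module_dtype(latents)
print(fixed.dtype)  # float16
```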

bumbiyada commented 1 year ago

Looks like I have the same issue on an RX 6700 XT running on Win10.

GoZippy commented 1 year ago

Same here: 6700 XT, Win10.

tomphan commented 11 months ago

Mine is Win10 with an AMD Radeon R9 M375X, same error.

WARNING: [Loader Message] Code 0 : loaderAddLayerProperties: E:\VulkanSDK\1.3.261.1\Bin\VkLayer_api_dump.json invalid layer manifest file version 1.2.0.  May cause errors.
WARNING: [Loader Message] Code 0 : loaderAddLayerProperties: E:\VulkanSDK\1.3.261.1\Bin\VkLayer_gfxreconstruct.json invalid layer manifest file version 1.2.0.  May cause errors.
WARNING: [Loader Message] Code 0 : loaderAddLayerProperties: E:\VulkanSDK\1.3.261.1\Bin\VkLayer_khronos_synchronization2.json invalid layer manifest file version 1.2.0.  May cause errors.
WARNING: [Loader Message] Code 0 : loaderAddLayerProperties: E:\VulkanSDK\1.3.261.1\Bin\VkLayer_khronos_validation.json invalid layer manifest file version 1.2.0.  May cause errors.
WARNING: [Loader Message] Code 0 : loaderAddLayerProperties: E:\VulkanSDK\1.3.261.1\Bin\VkLayer_screenshot.json invalid layer manifest file version 1.2.0.  May cause errors.
WARNING: [Loader Message] Code 0 : loaderAddLayerProperties: E:\VulkanSDK\1.3.261.1\Bin\VkLayer_khronos_profiles.json invalid layer manifest file version 1.2.1.  May cause errors.
WARNING: [Loader Message] Code 0 : loaderAddLayerProperties: E:\VulkanSDK\1.3.261.1\Bin\VkLayer_khronos_shader_object.json invalid layer manifest file version 1.2.0.  May cause errors.
Loading flatbuffer at e:\SharkStudio\clip_1_64_512_512_fp16_tuned_stable-diffusion-2-1-base_vulkan.vmfb as a mmapped file
Traceback (most recent call last):
  File "gradio\routes.py", line 439, in run_predict
  File "gradio\blocks.py", line 1384, in process_api
  File "gradio\blocks.py", line 1103, in call_function
  File "gradio\utils.py", line 343, in async_iteration
  File "gradio\utils.py", line 336, in __anext__
  File "anyio\to_thread.py", line 33, in run_sync
  File "anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
  File "anyio\_backends\_asyncio.py", line 807, in run
  File "gradio\utils.py", line 319, in run_sync_iterator_async
  File "gradio\utils.py", line 688, in gen_wrapper
  File "ui\txt2img_ui.py", line 186, in txt2img_inf
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_txt2img.py", line 123, in generate_images
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_utils.py", line 432, in encode_prompts_weight
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_utils.py", line 96, in load_clip
  File "apps\stable_diffusion\src\models\model_wrappers.py", line 777, in clip
  File "apps\stable_diffusion\src\models\model_wrappers.py", line 770, in clip
  File "apps\stable_diffusion\src\models\model_wrappers.py", line 690, in get_clip
  File "apps\stable_diffusion\src\utils\utils.py", line 132, in compile_through_fx
  File "apps\stable_diffusion\src\utils\utils.py", line 55, in _load_vmfb
  File "shark\shark_inference.py", line 208, in load_module
  File "shark\iree_utils\compile_utils.py", line 438, in load_flatbuffer
  File "shark\iree_utils\compile_utils.py", line 382, in load_vmfb_using_mmap
  File "iree\runtime\system_api.py", line 271, in add_vm_module
  File "iree\runtime\system_api.py", line 268, in add_vm_modules
SystemExit: Error registering modules: D:\a\SHARK-Runtime\SHARK-Runtime\c\runtime\src\iree\hal\drivers\vulkan\native_executable.cc:160: UNKNOWN; VkResult=4294967283; while invoking native function hal.executable.create; while calling import;
[ 1]   native hal.executable.create:0 -
[ 0] bytecode module@1:862 -
powderluv commented 11 months ago

That usually means the M375X driver is not supported. However, we are close to getting ROCm enabled, which in theory may work better, but it is a very old architecture and 2 GB of VRAM likely won't be enough.