nod-ai / SHARK

SHARK - High Performance Machine Learning Distribution

Stuck at "Compiling Vulkan shaders" (or takes forever?) #2128

Closed Legendrik closed 2 months ago

Legendrik commented 2 months ago

Hi, I recently installed SHARK on my system (i7-4770K, 32 GB DDR3, WX 9100 (flashed MI25)), and the "Compiling Vulkan shaders" step either takes forever or gets stuck.

I tried the Dreamshaper model and the base Stable Diffusion model (1.4).

I'm using the latest stable release (nod.ai SHARK 20240126.1139).

While compiling, neither the CPU, GPU, nor RAM shows any utilization at all.

I will let my PC run through the night and see tomorrow whether the shaders have compiled.

I will update you tomorrow since I'm going to sleep, but to be honest I don't think this will go anywhere.

Also, there seem to be errors with specific files (see the tracebacks in the log below).

Here is the cmd log, in case it helps:

shark_tank local cache is located at C:\Users\Instinct\.local/shark_tank/ . You may change this by setting the --local_tank_cache= flag
transformers\utils\generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
transformers\utils\generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
Clearing .mlir temporary files from a prior run. This may take some time...
Clearing .mlir temporary files took 0.1918 seconds.
gradio temporary image cache located at E:\Shark\shark_tmp/gradio. You may change this by setting the GRADIO_TEMP_DIR environment variable.
Clearing gradio UI temporary image files from a prior run. This may take some time...
Clearing gradio UI temporary image files took 0.0000 seconds.
vulkan devices are available.
metal devices are not available.
cuda devices are not available.
rocm devices are available.
shark_tank local cache is located at C:\Users\Instinct\.local/shark_tank/ . You may change this by setting the --local_tank_cache= flag
transformers\utils\generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
transformers\utils\generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
local-sync devices are available.
shark_tank local cache is located at C:\Users\Instinct\.local/shark_tank/ . You may change this by setting the --local_tank_cache= flag
transformers\utils\generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
transformers\utils\generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
local-task devices are available.
shark_tank local cache is located at C:\Users\Instinct\.local/shark_tank/ . You may change this by setting the --local_tank_cache= flag
transformers\utils\generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
transformers\utils\generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
ui\txt2img_ui.py:373: UserWarning: Settings.json file not found or 'txt2img' key is missing. Using default values for fields.
diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
{'cpu': ['Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz => cpu-task'], 'cuda': [], 'vulkan': ['Radeon (TM) Pro WX 9100 => vulkan://0'], 'rocm': ['Radeon (TM) Pro WX 9100 => rocm']}
Running on local URL:  http://0.0.0.0:8080
shark_tank local cache is located at C:\Users\Instinct\.local/shark_tank/ . You may change this by setting the --local_tank_cache= flag
transformers\utils\generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
transformers\utils\generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.

To create a public link, set `share=True` in `launch()`.
Tuned models are currently not supported for this setting.
saving euler_scale_model_input_1_512_512_rocm_fp16_torch_linalg.mlir to .\shark_tmp
No vmfb found. Compiling and saving to E:\Shark\euler_scale_model_input_1_512_512_rocm_fp16.vmfb
Configuring for device:rocm
could not execute `iree-run-module --dump_devices=rocm`
Did not find ROCm architecture from `--iree-rocm-target-chip` flag
 or from `iree-run-module --dump_devices=rocm` command.
Using gfx1100 as ROCm arch for compilation.
Saved vmfb in E:\Shark\euler_scale_model_input_1_512_512_rocm_fp16.vmfb.
Loading module E:\Shark\euler_scale_model_input_1_512_512_rocm_fp16.vmfb...
Traceback (most recent call last):
  File "C:\Users\Instinct\AppData\Local\Temp\_MEI38282\gradio\queueing.py", line 489, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Instinct\AppData\Local\Temp\_MEI38282\gradio\route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Instinct\AppData\Local\Temp\_MEI38282\gradio\blocks.py", line 1561, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Instinct\AppData\Local\Temp\_MEI38282\gradio\blocks.py", line 1191, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Instinct\AppData\Local\Temp\_MEI38282\gradio\utils.py", line 519, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Instinct\AppData\Local\Temp\_MEI38282\gradio\utils.py", line 512, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "anyio\to_thread.py", line 56, in run_sync
  File "anyio\_backends\_asyncio.py", line 2134, in run_sync_in_worker_thread
  File "anyio\_backends\_asyncio.py", line 851, in run
  File "C:\Users\Instinct\AppData\Local\Temp\_MEI38282\gradio\utils.py", line 495, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "C:\Users\Instinct\AppData\Local\Temp\_MEI38282\gradio\utils.py", line 666, in gen_wrapper
    yield from f(*args, **kwargs)
  File "ui\txt2img_ui.py", line 194, in txt2img_inf
  File "apps\stable_diffusion\src\schedulers\sd_schedulers.py", line 141, in get_schedulers
  File "apps\stable_diffusion\src\schedulers\shark_eulerdiscrete.py", line 147, in compile
  File "apps\stable_diffusion\src\schedulers\shark_eulerdiscrete.py", line 123, in _import
  File "apps\stable_diffusion\src\utils\utils.py", line 187, in compile_through_fx
  File "apps\stable_diffusion\src\utils\utils.py", line 84, in _compile_module
  File "shark\shark_inference.py", line 232, in load_module
    params = load_flatbuffer(
             ^^^^^^^^^^^^^^^^
  File "shark\iree_utils\compile_utils.py", line 517, in load_flatbuffer
    vmfb, config, temp_file_to_unlink = load_vmfb_using_mmap(
                                        ^^^^^^^^^^^^^^^^^^^^^
  File "shark\iree_utils\compile_utils.py", line 448, in load_vmfb_using_mmap
    ctx.add_vm_module(mmaped_vmfb)
  File "iree\runtime\system_api.py", line 271, in add_vm_module
  File "iree\runtime\system_api.py", line 268, in add_vm_modules
RuntimeError: Error registering modules: C:\actions-runner\w\SRT\SRT\c\experimental\rocm\status_util.c:31: INTERNAL; rocm driver error 'hipErrorSharedObjectInitFailed' (303): shared object initialization failed; while invoking native function hal.executable.create; while calling import;
[ 1]   native hal.executable.create:0 -
[ 0] bytecode module@1:284 -
Found device Radeon (TM) Pro WX 9100. Using target triple rdna3-unknown-windows.
Tuned models are currently not supported for this setting.
saving euler_scale_model_input_1_512_512_vulkan_fp16_torch_linalg.mlir to .\shark_tmp
loading existing vmfb from: E:\Shark\euler_scale_model_input_1_512_512_vulkan_fp16.vmfb
Loading module E:\Shark\euler_scale_model_input_1_512_512_vulkan_fp16.vmfb...
        Compiling Vulkan shaders. This may take a few minutes.
saving euler_step_epsilon_1_512_512_vulkan_fp16_torch_linalg.mlir to .\shark_tmp
loading existing vmfb from: E:\Shark\euler_step_epsilon_1_512_512_vulkan_fp16.vmfb
Loading module E:\Shark\euler_step_epsilon_1_512_512_vulkan_fp16.vmfb...
        Compiling Vulkan shaders. This may take a few minutes.
saving euler_a_scale_model_input_1_512_512_vulkan_fp16_torch_linalg.mlir to .\shark_tmp
loading existing vmfb from: E:\Shark\euler_a_scale_model_input_1_512_512_vulkan_fp16.vmfb
Loading module E:\Shark\euler_a_scale_model_input_1_512_512_vulkan_fp16.vmfb...
        Compiling Vulkan shaders. This may take a few minutes.
saving euler_a_step_epsilon_1_512_512_vulkan_fp16_torch_linalg.mlir to .\shark_tmp
loading existing vmfb from: E:\Shark\euler_a_step_epsilon_1_512_512_vulkan_fp16.vmfb
Loading module E:\Shark\euler_a_step_epsilon_1_512_512_vulkan_fp16.vmfb...
        Compiling Vulkan shaders. This may take a few minutes.
use_tuned? sharkify: False
Diffusers' checkpoint will be identified here :  E:/Shark/models/diffusers/dreamshaper_8
Loading diffusers' pipeline from original stable diffusion checkpoint
diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
config.json: 100%|████████████████████████████████████████████████████████████████████████| 4.52k/4.52k [00:00<?, ?B/s]
huggingface_hub\file_download.py:149: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\Instinct\.cache\huggingface\hub\models--openai--clip-vit-large-patch14. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
vocab.json: 100%|███████████████████████████████████████████████████████████████████| 961k/961k [00:00<00:00, 2.38MB/s]
merges.txt: 100%|███████████████████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 1.82MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████| 389/389 [00:00<?, ?B/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████| 905/905 [00:00<?, ?B/s]
config.json: 100%|████████████████████████████████████████████████████████████████████████| 4.55k/4.55k [00:00<?, ?B/s]
huggingface_hub\file_download.py:149: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\Instinct\.cache\huggingface\hub\models--CompVis--stable-diffusion-safety-checker. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["bos_token_id"]` will be overriden.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["eos_token_id"]` will be overriden.
pytorch_model.bin: 100%|██████████████████████████████████████████████████████████| 1.22G/1.22G [00:28<00:00, 42.6MB/s]
preprocessor_config.json: 100%|████████████████████████████████████████████████████████| 342/342 [00:00<00:00, 343kB/s]
transformers\models\clip\feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
  warnings.warn(
Loading complete
self.favored_base_models: ['stabilityai/stable-diffusion-2-1', 'CompVis/stable-diffusion-v1-4']
allowed_base_model_ids: ['stabilityai/stable-diffusion-2-1', 'CompVis/stable-diffusion-v1-4']
mat1 and mat2 shapes cannot be multiplied (128x1024 and 768x320)
Retrying with a different base model configuration, as stabilityai/stable-diffusion-2-1 did not work
torch\fx\node.py:272: UserWarning: Trying to prepend a node to itself. This behavior has no effect on the graph.
  warnings.warn("Trying to prepend a node to itself. This behavior has no effect on the graph.")
saving unet_1_64_512_512_fp16_dreamshaper_8_vulkan_torch_linalg.mlir to .\shark_tmp
No vmfb found. Compiling and saving to E:\Shark\unet_1_64_512_512_fp16_dreamshaper_8_vulkan.vmfb
Configuring for device:vulkan://00000000-0300-0000-0000-000000000000
Using target triple -iree-vulkan-target-triple=rdna3-unknown-windows from command line args
Saved vmfb in E:\Shark\unet_1_64_512_512_fp16_dreamshaper_8_vulkan.vmfb.
Loading module E:\Shark\unet_1_64_512_512_fp16_dreamshaper_8_vulkan.vmfb...
        Compiling Vulkan shaders. This may take a few minutes.
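
A note on the log above: the ROCm architecture probe fails ("could not execute `iree-run-module --dump_devices=rocm`"), so SHARK falls back to gfx1100, but a WX 9100 / MI25-class card is Vega 10 (gfx900). A .vmfb compiled for the wrong arch is a plausible cause of the `hipErrorSharedObjectInitFailed` in the traceback, and the Vulkan path's `rdna3-unknown-windows` triple likewise doesn't match a Vega-class GPU. As a sketch only, one could recompile a saved .mlir with an explicit target chip using stock iree-compile; `--iree-rocm-target-chip` is the flag the log itself names, while the file paths and the gfx900 value are assumptions to verify for your card:

:: Hypothetical manual recompile with an explicit ROCm target (Windows cmd).
:: gfx900 is assumed for a Vega 10 / MI25-class GPU; confirm with your driver tools.
iree-compile shark_tmp\euler_scale_model_input_1_512_512_rocm_fp16_torch_linalg.mlir ^
    --iree-hal-target-backends=rocm ^
    --iree-rocm-target-chip=gfx900 ^
    -o E:\Shark\euler_scale_model_input_1_512_512_rocm_fp16.vmfb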
Legendrik commented 2 months ago

Hi, update: the PC was still stuck compiling the Vulkan shaders. I will try different versions and report back!

Legendrik commented 2 months ago

Update: I tried a few different versions, and "nod.ai SHARK 20231229.1091" works perfectly fine!
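
For anyone else hitting this: the older builds are on the project's releases page (https://github.com/nod-ai/SHARK/releases). A sketch of grabbing that known-good version follows; the tag and asset name here are assumptions, so check the releases page for the exact ones:

:: Hypothetical download of the working 20231229.1091 build; verify the real asset name first.
curl -LO https://github.com/nod-ai/SHARK/releases/download/20231229.1091/nodai_shark_studio_20231229_1091.exe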