RuntimeError: Error llama2_7b_int4_vulkan_rdna2_unknown.vmfb

Error occurs when using model "llama2_7b => meta-llama/Llama-2-7b-chat-hf" with device "AMD Radeon RX 6600M => vulkan://0"

`Configuring for device:vulkan

Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args

Saved vmfb in C:\Users\rodri.cache\llama2_7b_int4_vulkan_rdna2_unknown.vmfb.

Saved vic vmfb at C:\Users\rodri.cache\llama2_7b_int4_vulkan_rdna2_unknown.vmfb

WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.

Loading module C:\Users\rodri.cache\llama2_7b_int4_vulkan_rdna2_unknown.vmfb...

    Compiling Vulkan shaders. This may take a few minutes.

::: Detailed report (took longer than 2.5s):

+18.04518699645996ms: Mapping device id: 0

+26.04389190673828ms: ireert.get_driver()

+130.61070442199707ms: ireert.create_device()

+134.61041450500488ms: ireert.Config()

+150.99406242370605ms: mmap C:\Users\rodri.cache\llama2_7b_int4_vulkan_rdna2_unknown.vmfb

+154.99424934387207ms: ireert.SystemContext created

+2805.0849437713623ms: module initialized

apps\language_models\scripts\vicuna.py:441: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requiresgrad(True), rather than torch.tensor(sourceTensor).

::: Detailed report (took longer than 5.0s):

+2.000570297241211ms: Load to device: torch.Size([1, 1])

+3.9997100830078125ms: Load to device: [1, 32, 24, 128]

+3.9997100830078125ms: Invoke function: second_vicuna_forward

Traceback (most recent call last):

File "gradio\queueing.py", line 388, in call_prediction

File "gradio\route_utils.py", line 219, in call_process_api

File "gradio\blocks.py", line 1437, in process_api

File "gradio\blocks.py", line 1123, in call_function

File "gradio\utils.py", line 503, in async_iteration

File "gradio\utils.py", line 496, in anext

File "anyio\to_thread.py", line 33, in run_sync

File "anyio_backends_asyncio.py", line 877, in run_sync_in_worker_thread

File "anyio_backends_asyncio.py", line 807, in run

File "gradio\utils.py", line 479, in run_sync_iterator_async

File "gradio\utils.py", line 629, in gen_wrapper

File "ui\stablelm_ui.py", line 252, in chat

File "gradio\helpers.py", line 528, in next

File "apps\language_models\scripts\vicuna.py", line 1817, in generate

File "apps\language_models\scripts\vicuna.py", line 443, in generate_new_token

File "shark\shark_inference.py", line 150, in call

return self.shark_runner.run(function_name, inputs, send_to_host)

       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "shark\shark_runner.py", line 110, in run

return get_results(

       ^^^^^^^^^^^^

File "shark\iree_utils\compile_utils.py", line 557, in get_results

result = compiled_vm[function_name](*device_inputs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "iree\runtime\function.py", line 137, in call

File "iree\runtime\function.py", line 162, in _invoke

RuntimeError: Error invoking function: C:\actions-runner\w\SRT\SRT\c\runtime\src\iree\hal\drivers\vulkan\native_semaphore.cc:134: RESOURCE_EXHAUSTED; overflowed timeline semaphore max value; while invoking native function hal.fence.await; while calling import;

[ 1] native hal.fence.await:0 -

[ 0] bytecode module@1:102460 -