[Bug]: Failed to load ZLUDA

Checklist

[X] The issue exists after disabling all extensions
[X] The issue exists on a clean installation of webui
[ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
[X] The issue exists in the current version of the webui
[X] The issue has not been reported before recently
[ ] The issue has been reported before but has not been fixed yet

What happened?

After the last updates to the repositoy I'm unable to use ZLUDA with AMD Radeon 7800 XT (was working some days ago). When starting, the following is displayed:

Failed to load ZLUDA: list index out of range
Using CPU-only torch

Then, when trying to generate anything:

ZLUDA device failed to pass basic operation test: index=None, device_name=AMD Radeon RX 7800 XT [ZLUDA]

Tried both with my existing install and a new one. DirectML seems to work in a separate install.

Steps to reproduce the problem

1 - Go to txt2img tab 2 - Type any prompt and click on generate

What should have happened?

Geraration should have worked using ZLUDA.

What browsers do you use to access the UI ?

Google Chrome

Sysinfo

sysinfo-2024-09-24-13-42.json

Console logs

venv "D:\projects\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: v1.10.1-amd-9-g46397d07
Commit hash: 46397d078cff4547eb4bd87adc5c56283e2a8d20
Using ZLUDA in D:\projects\stable-diffusion-webui-amdgpu\.zluda
Failed to load ZLUDA: list index out of range
Using CPU-only torch
Skipping onnxruntime installation.
You are up to date with the most recent release.
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments: --use-zluda --update-check --skip-ort --disable-model-loading-ram-optimization --no-half --update-all-extensions --disable-nan-check --models-dir 'D:\projects\models'
ZLUDA device failed to pass basic operation test: index=None, device_name=AMD Radeon RX 7800 XT [ZLUDA]
CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Loading weights [68caded5ad] from D:\projects\models\Stable-diffusion\aZovyaPhotoreal_v2InpaintVAE.safetensors
Creating model from config: D:\projects\stable-diffusion-webui-amdgpu\configs\v1-inpainting-inference.yaml
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Startup time: 9.2s (prepare environment: 11.5s, initialize shared: 0.8s, other imports: 0.3s, list SD models: 0.2s, load scripts: 0.5s, create ui: 0.6s, gradio launch: 0.4s).
creating model quickly: OSError
Traceback (most recent call last):
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\utils\_http.py", line 406, in hf_raise_for_status
    response.raise_for_status()
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\requests\models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/None/resolve/main/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\utils\hub.py", line 402, in cached_file
    resolved_file = hf_hub_download(
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\utils\_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 1232, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 1339, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 1854, in _raise_on_head_call_error
    raise head_call_error
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 1746, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 1666, in get_hf_file_metadata
    r = _request_wrapper(
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 364, in _request_wrapper
    response = _request_wrapper(
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 388, in _request_wrapper
    hf_raise_for_status(response)
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\utils\_http.py", line 454, in hf_raise_for_status
    raise _format(RepositoryNotFoundError, message, response) from e
huggingface_hub.errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-66f2c19e-5f8817675bb7da144de57d88;221a2d43-d68f-4d6e-81e5-7585b5f02c0e)

Repository Not Found for url: https://huggingface.co/None/resolve/main/config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Python310\lib\threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "C:\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "D:\projects\stable-diffusion-webui-amdgpu\modules\initialize.py", line 149, in load_model
    shared.sd_model  # noqa: B018
  File "D:\projects\stable-diffusion-webui-amdgpu\modules\shared_items.py", line 190, in sd_model
    return modules.sd_models.model_data.get_sd_model()
  File "D:\projects\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 693, in get_sd_model
    load_model()
  File "D:\projects\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 831, in load_model
    sd_model = instantiate_from_config(sd_config.model, state_dict)
  File "D:\projects\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 775, in instantiate_from_config
    return constructor(**params)
  File "D:\projects\stable-diffusion-webui-amdgpu\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1650, in __init__
    super().__init__(concat_keys, *args, **kwargs)
  File "D:\projects\stable-diffusion-webui-amdgpu\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1515, in __init__
    super().__init__(*args, **kwargs)
  File "D:\projects\stable-diffusion-webui-amdgpu\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 563, in __init__
    self.instantiate_cond_stage(cond_stage_config)
  File "D:\projects\stable-diffusion-webui-amdgpu\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 630, in instantiate_cond_stage
    model = instantiate_from_config(config)
  File "D:\projects\stable-diffusion-webui-amdgpu\repositories\stable-diffusion-stability-ai\ldm\util.py", line 89, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "D:\projects\stable-diffusion-webui-amdgpu\repositories\stable-diffusion-stability-ai\ldm\modules\encoders\modules.py", line 104, in __init__
    self.transformer = CLIPTextModel.from_pretrained(version)
  File "D:\projects\stable-diffusion-webui-amdgpu\modules\sd_disable_initialization.py", line 68, in CLIPTextModel_from_pretrained
    res = self.CLIPTextModel_from_pretrained(None, *model_args, config=pretrained_model_name_or_path, state_dict={}, **kwargs)
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\modeling_utils.py", line 3247, in from_pretrained
    resolved_config_file = cached_file(
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\utils\hub.py", line 425, in cached_file
    raise EnvironmentError(
OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`

Failed to create model quickly; will retry using slow method.
Applying attention optimization: InvokeAI... done.
loading stable diffusion model: RuntimeError
Traceback (most recent call last):
  File "C:\Python310\lib\threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "C:\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "D:\projects\stable-diffusion-webui-amdgpu\modules\initialize.py", line 149, in load_model
    shared.sd_model  # noqa: B018
  File "D:\projects\stable-diffusion-webui-amdgpu\modules\shared_items.py", line 190, in sd_model
    return modules.sd_models.model_data.get_sd_model()
  File "D:\projects\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 693, in get_sd_model
    load_model()
  File "D:\projects\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 871, in load_model
    sd_hijack.model_hijack.embedding_db.load_textual_inversion_embeddings(force_reload=True)  # Reload embeddings after model load as they may or may not fit the model
  File "D:\projects\stable-diffusion-webui-amdgpu\modules\textual_inversion\textual_inversion.py", line 228, in load_textual_inversion_embeddings
    self.expected_shape = self.get_expected_shape()
  File "D:\projects\stable-diffusion-webui-amdgpu\modules\textual_inversion\textual_inversion.py", line 156, in get_expected_shape
    vec = shared.sd_model.cond_stage_model.encode_embedding_init_text(",", 1)
  File "D:\projects\stable-diffusion-webui-amdgpu\modules\sd_hijack_clip.py", line 365, in encode_embedding_init_text
    embedded = embedding_layer.token_embedding.wrapped(ids.to(embedding_layer.token_embedding.wrapped.weight.device)).squeeze(0)
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\sparse.py", line 163, in forward
    return F.embedding(
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\functional.py", line 2264, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Stable diffusion model failed to load
Exception in thread MemMon:
Traceback (most recent call last):
  File "C:\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "D:\projects\stable-diffusion-webui-amdgpu\modules\memmon.py", line 43, in run
    torch.cuda.reset_peak_memory_stats()
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 309, in reset_peak_memory_stats
    return torch._C._cuda_resetPeakMemoryStats(device)
RuntimeError: invalid argument to reset_peak_memory_stats
Using already loaded model aZovyaPhotoreal_v2InpaintVAE.safetensors [68caded5ad]: done in 0.0s
*** Error completing request
*** Arguments: ('task(l5ca323d0w28eju)', <gradio.routes.Request object at 0x000000006889C8B0>, 'kitten', '', [], 1, 1, 7, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Use same scheduler', '', '', [], 0, 20, 'DPM++ 2M', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
    Traceback (most recent call last):
      File "D:\projects\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 74, in f
        res = list(func(*args, **kwargs))
      File "D:\projects\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 53, in f
        res = func(*args, **kwargs)
      File "D:\projects\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 37, in f
        res = func(*args, **kwargs)
      File "D:\projects\stable-diffusion-webui-amdgpu\modules\txt2img.py", line 109, in txt2img
        processed = processing.process_images(p)
      File "D:\projects\stable-diffusion-webui-amdgpu\modules\processing.py", line 849, in process_images
        res = process_images_inner(p)
      File "D:\projects\stable-diffusion-webui-amdgpu\modules\processing.py", line 1007, in process_images_inner
        model_hijack.embedding_db.load_textual_inversion_embeddings()
      File "D:\projects\stable-diffusion-webui-amdgpu\modules\textual_inversion\textual_inversion.py", line 228, in load_textual_inversion_embeddings
        self.expected_shape = self.get_expected_shape()
      File "D:\projects\stable-diffusion-webui-amdgpu\modules\textual_inversion\textual_inversion.py", line 156, in get_expected_shape
        vec = shared.sd_model.cond_stage_model.encode_embedding_init_text(",", 1)
      File "D:\projects\stable-diffusion-webui-amdgpu\modules\sd_hijack_clip.py", line 365, in encode_embedding_init_text
        embedded = embedding_layer.token_embedding.wrapped(ids.to(embedding_layer.token_embedding.wrapped.weight.device)).squeeze(0)
      File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\sparse.py", line 163, in forward
        return F.embedding(
      File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\functional.py", line 2264, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    RuntimeError: CUDA error: invalid argument
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

---
Traceback (most recent call last):
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "D:\projects\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 104, in f
    mem_stats = {k: -(v//-(1024*1024)) for k, v in shared.mem_mon.stop().items()}
  File "D:\projects\stable-diffusion-webui-amdgpu\modules\memmon.py", line 99, in stop
    return self.read()
  File "D:\projects\stable-diffusion-webui-amdgpu\modules\memmon.py", line 81, in read
    torch_stats = torch.cuda.memory_stats(self.device)
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 258, in memory_stats
    stats = memory_stats_as_nested_dict(device=device)
  File "D:\projects\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 270, in memory_stats_as_nested_dict
    return torch._C._cuda_memoryStats(device)
RuntimeError: invalid argument to memory_allocated

Additional information

GPU driver was not updated recently, but is on the most recent version.
HIP SDK is on version 6.1.0.
ZLUDA is the most recent versoin from https://github.com/lshqqytiger/ZLUDA.

lshqqytiger / stable-diffusion-webui-amdgpu