lshqqytiger / stable-diffusion-webui-amdgpu

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: BSOD on Ryzen with directml if memory overflows 16 GB #557

Open FuhrerStein opened 2 weeks ago

FuhrerStein commented 2 weeks ago

What happened?

Generating images on an AMD Ryzen 3 5300G with DirectML causes a BSOD every time shared video memory usage exceeds the 16 GB limit.

Steps to reproduce the problem

Generate a batch of images with DirectML on a Ryzen iGPU.

What should have happened?

The system should not restart with a Blue Screen of Death.

What browsers do you use to access the UI ?

Other

Sysinfo

sysinfo-2024-11-02-08-32.json

Console logs

venv "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: v1.10.1-amd-15-gcf6c4e97
Commit hash: cf6c4e9765abe987e68a94006cf61672a076042c
C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments: --use-directml --lowvram --opt-sub-quad-attention --upcast-sampling --disable-nan-check --no-half-vae --precision autocast
ONNX: version=1.19.2 provider=DmlExecutionProvider, available=['DmlExecutionProvider', 'CPUExecutionProvider']
Loading weights [396c5c633f] from C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\models\Stable-diffusion\CrowPonyQuadpipe_ponyV2.safetensors
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Creating model from config: C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\repositories\generative-models\configs\inference\sd_xl_base.yaml
Startup time: 11.6s (prepare environment: 16.6s, initialize shared: 1.2s, load scripts: 0.7s, create ui: 0.5s, gradio launch: 0.6s).
creating model quickly: OSError
Traceback (most recent call last):
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\utils\_http.py", line 406, in hf_raise_for_status
    response.raise_for_status()
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\venv\lib\site-packages\requests\models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/None/resolve/main/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\utils\hub.py", line 403, in cached_file
    resolved_file = hf_hub_download(
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 862, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 969, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 1484, in _raise_on_head_call_error
    raise head_call_error
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 1376, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 1296, in get_hf_file_metadata
    r = _request_wrapper(
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 277, in _request_wrapper
    response = _request_wrapper(
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 301, in _request_wrapper
    hf_raise_for_status(response)
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\utils\_http.py", line 454, in hf_raise_for_status
    raise _format(RepositoryNotFoundError, message, response) from e
huggingface_hub.errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-6725e20b-5b69debb7fa82dae16049132;abbc8546-1b83-4572-9e56-6395cccf3eee)

Repository Not Found for url: https://huggingface.co/None/resolve/main/config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\fuhrer\AppData\Local\Programs\Python\Python310\lib\threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "C:\Users\fuhrer\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\fuhrer\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\modules\initialize.py", line 149, in load_model
    shared.sd_model  # noqa: B018
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\modules\shared_items.py", line 190, in sd_model
    return modules.sd_models.model_data.get_sd_model()
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 693, in get_sd_model
    load_model()
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 831, in load_model
    sd_model = instantiate_from_config(sd_config.model, state_dict)
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 775, in instantiate_from_config
    return constructor(**params)
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\models\diffusion.py", line 61, in __init__
    self.conditioner = instantiate_from_config(
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\util.py", line 175, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\modules\encoders\modules.py", line 88, in __init__
    embedder = instantiate_from_config(embconfig)
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\util.py", line 175, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\modules\encoders\modules.py", line 361, in __init__
    self.transformer = CLIPTextModel.from_pretrained(version)
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\modules\sd_disable_initialization.py", line 68, in CLIPTextModel_from_pretrained
    res = self.CLIPTextModel_from_pretrained(None, *model_args, config=pretrained_model_name_or_path, state_dict={}, **kwargs)
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\modeling_utils.py", line 3505, in from_pretrained
    resolved_config_file = cached_file(
  File "C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\utils\hub.py", line 426, in cached_file
    raise EnvironmentError(
OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`

Failed to create model quickly; will retry using slow method.
C:\Games\__All_SD_data\sd_DML\stable-diffusion-webui-amdgpu\modules\safe.py:156: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return unsafe_torch_load(filename, *args, **kwargs)
Applying attention optimization: sub-quadratic... done.
Model loaded in 12.2s (load weights from disk: 0.7s, create model: 7.3s, apply weights to model: 2.7s, apply half(): 0.2s, calculate empty prompt: 1.3s).
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [01:39<00:00,  9.93s/it]
Total: 100%|███████████████████████████████████████████████████████████████████████████| 10/10 [01:37<00:00,  9.73s/it]
Total: 100%|███████████████████████████████████████████████████████████████████████████| 10/10 [01:37<00:00,  8.61s/it]

Additional information

I run the webui on an AMD Ryzen 3 5300G, which has a Radeon Vega iGPU, under Windows 11 with the latest video driver.

In my case it only runs with the --use-directml option, without ONNX. I use --lowvram and other optimizations, but even then, depending on image size, memory consumption can exceed 12-14 GB (for image sizes around 1 megapixel).

There also seems to be a memory leak. Shared (aka dynamic) video memory is not freed after each generation, so the only way to avoid huge memory consumption is to restart the whole script. On top of that, I get a BSOD every time memory consumption exceeds the 16 GB limit.

So my only option is to restart the batch file whenever there is a chance that the next generation will overflow memory. I suppose this is a problem for every Ryzen user on Windows, not just me.

This can be considered both a bug report and a feature request. I'll describe possible solutions, from the ideal one to the simplest to implement. I have only implemented the simplest one.

  1. Fix the memory leak: free dynamic video memory after every generation step. I have no idea how to do that, but it would be the ideal solution.
  2. Free memory in some way after every generated image. Memory usage would still accumulate between layers, but once an image is generated, the script would free memory before generating the next one. Maybe that can be done with some partial reinitialization; I don't know.
  3. Free memory by restarting the whole script when memory usage gets too high, once all images in a task have finished generating. That does not prevent the BSOD as such, but it makes a stable workflow possible without manually restarting the script after every image.

For now, my solution is a poor version of variant 3. It restarts the script if memory usage exceeds a threshold given by the user. However, in doing so, it fails to send the last generated data to the GUI. The image gets saved and the memory gets freed, but the GUI shows confusing data. This method also fails to protect memory when the user generates in batches, because the check runs only after all generations have finished. Still, combined with the "Generate forever" GUI function, my method lets me run many hours of image generation without interruption. And given how slow the Ryzen GPU is, this is very much needed.

My solution adds 6 lines to call_queue.py: 2 at the beginning and 4 in wrap_gradio_call_no_job. Here they are:

from modules.restart import restart_program
MAX_MEMORY_LIMIT_MB = 10000  # Define your memory threshold in MB

            # Check if memory usage exceeds the threshold
            if sys_peak > MAX_MEMORY_LIMIT_MB:
                print("Memory threshold exceeded. Terminating process to prevent crash.")
                restart_program()

What I am asking for is a better solution that would:

  1. Update the web page before restarting the script.
  2. Check memory more frequently, not only at the very end.
  3. Ideally, free memory without restarting the whole script.
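
One possible shape for requests 1 and 2, as a minimal sketch rather than a tested patch: a background thread polls memory usage and only sets a flag, and the restart itself is deferred until the caller has delivered the result to the browser. MemoryWatchdog, read_usage_mb and restart_if_requested are hypothetical names that do not exist in the webui. The memory reader and the restart function are passed in as callables, so nothing here assumes a particular webui API.

```python
import threading
from typing import Callable


class MemoryWatchdog:
    """Polls memory usage in the background and records when a limit is exceeded.

    Hypothetical helper: it never restarts anything by itself, it only sets a flag.
    """

    def __init__(self, read_usage_mb: Callable[[], float], limit_mb: float,
                 poll_seconds: float = 1.0) -> None:
        self._read_usage_mb = read_usage_mb  # assumed to return current usage in MB
        self._limit_mb = limit_mb
        self._poll_seconds = poll_seconds
        self._exceeded = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()

    def _run(self) -> None:
        # Event.wait doubles as an interruptible sleep between polls.
        while not self._stop.wait(self._poll_seconds):
            if self._read_usage_mb() > self._limit_mb:
                self._exceeded.set()

    @property
    def restart_requested(self) -> bool:
        return self._exceeded.is_set()

    def restart_if_requested(self, restart: Callable[[], None]) -> bool:
        """Call `restart` only if the limit was exceeded; return whether it was called.

        Intended to run after the result has been sent to the browser.
        """
        if self._exceeded.is_set():
            restart()
            return True
        return False
```

A batch loop could also check restart_requested between images and stop early. Note that this only moves the restart to a safer moment: it cannot prevent a crash if a single generation step overflows memory on its own.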

lshqqytiger commented 2 weeks ago

  1. DirectML does not return its allocated memory space once it is allocated until the process is terminated. That means we cannot free it on our side. (DirectML manages the allocated memory internally; in fact, it reuses it.)
  2. Forge has a similar feature for setting a memory limit. It may help you.

FuhrerStein commented 2 weeks ago

  1. DirectML does not return its allocated memory space once it is allocated until the process is terminated.

In other words, the only way to free memory is to restart the whole process. Am I right? Could you suggest where to move my restart call so that it runs after the script sends the last generated image to the browser? I suppose it could be inserted somewhere at the end of the anyio queue, but I was not able to figure out how that works.

  2. Forge has a similar feature for setting a memory limit. It may help you.

Correct me if I'm wrong, but it does not seem to support DirectML as a torch backend, does it? I have a low-end Ryzen, so ZLUDA/ROCm under Windows seems to be out of the question for me. At least, I was only able to run stable-diffusion-webui-amdgpu with --use-directml.

lshqqytiger commented 2 weeks ago

In other words, the only way to free memory is to restart the whole process. Am I right?

Yes.

Could you suggest where to move my restart call so that it runs after the script sends the last generated image to the browser? I suppose it could be inserted somewhere at the end of the anyio queue, but I was not able to figure out how that works.

Strictly speaking, restarting the process is not possible. You can restart the webui manually from a terminal. Otherwise, there would have to be another (master) process that saves state and restarts Python (kill and spawn).
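
The master-process idea can be sketched in a few lines. This is an illustration under stated assumptions, not how the webui launcher actually works: supervise and RESTART_EXIT_CODE are hypothetical names, and the exit-code convention is invented for the example. Saving and restoring state, which is the harder part, is left out.

```python
import subprocess
import sys
from typing import Sequence

# Hypothetical convention: the child exits with this code to ask for a relaunch.
RESTART_EXIT_CODE = 3


def supervise(cmd: Sequence[str], max_restarts: int = 100) -> int:
    """Run `cmd`, relaunching it whenever it exits with RESTART_EXIT_CODE.

    Returns the number of times the child was launched.
    """
    launches = 0
    while True:
        launches += 1
        # Each launch is a fresh process, so whatever the previous one
        # allocated is gone once it has terminated.
        result = subprocess.run(list(cmd))
        if result.returncode != RESTART_EXIT_CODE or launches > max_restarts:
            return launches
```

For example, supervise([sys.executable, "launch.py"]) would keep relaunching a script that calls sys.exit(RESTART_EXIT_CODE) when it wants fresh memory; the script name here is only a placeholder. max_restarts bounds the loop so a child that always asks for a restart cannot spin forever.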

Correct me if I'm wrong, but it does not seem to support DirectML as a torch backend, does it? I have a low-end Ryzen, so ZLUDA/ROCm under Windows seems to be out of the question for me. At least, I was only able to run stable-diffusion-webui-amdgpu with --use-directml.

Forge supports DirectML. The flag is --directml there, whereas it is --use-directml in stable-diffusion-webui-amdgpu. There is stable-diffusion-webui-amdgpu-forge too, which is a merge of webui-amdgpu and webui-forge.

FuhrerStein commented 5 days ago

You can restart the webui manually from a terminal. Otherwise, there would have to be another (master) process that saves state and restarts Python (kill and spawn).

That process is the webui.bat file that starts the Python script. That's why my solution works: it lets me generate hundreds of images on an unattended PC, whereas without this modification I get a BSOD on the second or third image unless I restart. The modification I proposed in my first message does restart the script and free the memory, and it prevents the errors caused by memory overuse. But it has some issues, hence my question.

There is stable-diffusion-webui-amdgpu-forge

Didn't know it existed, thanks. After a lot of trial and error, I was able to run it on my system with every possible tweak to use less memory. Strangely, stable-diffusion-webui-amdgpu-forge does use more than 16 GB of shared memory without crashing, although I still hit crashes a few times. However, the main issue with your fork of Forge was that it uses far too much memory overall for my system. I peaked at 40 GB of system memory usage (half of which was shared) without even generating a 1-megapixel image. By contrast, my modified version of your stable-diffusion-webui-amdgpu lets me generate a 1.15-megapixel image without a crash or memory overflow.