venv "C:\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.9.3-amd-13-g517aaaff
Commit hash: 517aaaff2bb1a512057d88b0284193b9f23c0b47
Installing torch and torchvision
Requirement already satisfied: torch==2.0.0 in C:\stable-diffusion-webui-directml\venv\lib\site-packages (2.0.0)
Requirement already satisfied: torchvision==0.15.1 in C:\stable-diffusion-webui-directml\venv\lib\site-packages (0.15.1)
Requirement already satisfied: torch-directml in C:\stable-diffusion-webui-directml\venv\lib\site-packages (0.2.0.dev230426)
Requirement already satisfied: jinja2 in C:\stable-diffusion-webui-directml\venv\lib\site-packages (from torch==2.0.0) (3.1.4)
Requirement already satisfied: filelock in C:\stable-diffusion-webui-directml\venv\lib\site-packages (from torch==2.0.0) (3.14.0)
Requirement already satisfied: sympy in C:\stable-diffusion-webui-directml\venv\lib\site-packages (from torch==2.0.0) (1.12)
Requirement already satisfied: networkx in C:\stable-diffusion-webui-directml\venv\lib\site-packages (from torch==2.0.0) (3.3)
Requirement already satisfied: typing-extensions in C:\stable-diffusion-webui-directml\venv\lib\site-packages (from torch==2.0.0) (4.11.0)
Requirement already satisfied: numpy in C:\stable-diffusion-webui-directml\venv\lib\site-packages (from torchvision==0.15.1) (1.26.2)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in C:\stable-diffusion-webui-directml\venv\lib\site-packages (from torchvision==0.15.1) (9.5.0)
Requirement already satisfied: requests in C:\stable-diffusion-webui-directml\venv\lib\site-packages (from torchvision==0.15.1) (2.31.0)
Requirement already satisfied: MarkupSafe>=2.0 in C:\stable-diffusion-webui-directml\venv\lib\site-packages (from jinja2->torch==2.0.0) (2.1.5)
Requirement already satisfied: certifi>=2017.4.17 in C:\stable-diffusion-webui-directml\venv\lib\site-packages (from requests->torchvision==0.15.1) (2024.2.2)
Requirement already satisfied: charset-normalizer<4,>=2 in C:\stable-diffusion-webui-directml\venv\lib\site-packages (from requests->torchvision==0.15.1) (3.3.2)
Requirement already satisfied: urllib3<3,>=1.21.1 in C:\stable-diffusion-webui-directml\venv\lib\site-packages (from requests->torchvision==0.15.1) (2.2.1)
Requirement already satisfied: idna<4,>=2.5 in C:\stable-diffusion-webui-directml\venv\lib\site-packages (from requests->torchvision==0.15.1) (3.7)
Requirement already satisfied: mpmath>=0.19 in C:\stable-diffusion-webui-directml\venv\lib\site-packages (from sympy->torch==2.0.0) (1.3.0)
[notice] A new release of pip available: 22.2.1 -> 24.0
[notice] To update, run: C:\stable-diffusion-webui-directml\venv\Scripts\python.exe -m pip install --upgrade pip
You are up to date with the most recent release.
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
C:\stable-diffusion-webui-directml\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
rank_zero_deprecation(
Launching Web UI with arguments: --use-directml --update-all-extensions --opt-sub-quad-attention --opt-split-attention --no-half --upcast-sampling --update-check --reinstall-torch
ONNX: version=1.18.0 provider=DmlExecutionProvider, available=['DmlExecutionProvider', 'CPUExecutionProvider']
==============================================================================
You are running torch 2.0.0+cpu.
The program is tested to work with torch 2.1.2.
To reinstall the desired version, run with commandline flag --reinstall-torch.
Beware that this will cause a lot of large files to be downloaded, as well as
there are reports of issues with training tab on the latest version.
Use --skip-version-check commandline argument to disable this check.
==============================================================================
Loading weights [6ce0161689] from C:\stable-diffusion-webui-directml\models\Stable-diffusion\v1-5-pruned-emaonly.safetensors
Creating model from config: C:\stable-diffusion-webui-directml\configs\v1-inference.yaml
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
C:\stable-diffusion-webui-directml\venv\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Startup time: 10.3s (prepare environment: 12.1s, initialize shared: 1.2s, load scripts: 1.2s, create ui: 0.4s, gradio launch: 0.3s).
Applying attention optimization: Doggettx... done.
Model loaded in 3.2s (load weights from disk: 0.5s, create model: 0.3s, apply weights to model: 2.2s).
Training at rate of 0.005 until step 100000
Preparing dataset...
100%|██████████████████████████████████████████████████████████████████████████████████| 16/16 [00:02<00:00, 5.67it/s]
0%| | 0/100000 [00:00<?, ?it/s]*** Error training embedding
Traceback (most recent call last):
File "C:\stable-diffusion-webui-directml\modules\textual_inversion\textual_inversion.py", line 553, in train_embedding
scaler.scale(loss).backward()
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\_tensor.py", line 487, in backward
torch.autograd.backward(
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\autograd\__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action.
---
Applying attention optimization: Doggettx... done.
Checklist
What happened?
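When I launch any textual inversion (TI) training, it fails immediately after dataset preparation, at step 0 of 100000, with "RuntimeError: The GPU device instance has been suspended." The full traceback is in the console logs.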
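For triage, the key facts can be pulled out of the console log with a short script. This is only a minimal sketch using the Python standard library: `summarize_log` and the `LOG` excerpt are hypothetical names introduced here for illustration and are not part of the web UI. It reports the torch version the UI says it is running, the version it was tested with, the first web UI frame in the traceback, and the final error. It does not diagnose the cause.

```python
import re

# Excerpt copied from the console log in this report (Windows paths kept verbatim).
LOG = r"""You are running torch 2.0.0+cpu.
The program is tested to work with torch 2.1.2.
Traceback (most recent call last):
File "C:\stable-diffusion-webui-directml\modules\textual_inversion\textual_inversion.py", line 553, in train_embedding
scaler.scale(loss).backward()
RuntimeError: The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action.
"""


def summarize_log(log: str) -> dict:
    """Hypothetical helper: collect torch versions, first traceback frame, and error."""
    running = re.search(r"^You are running torch (\S+)\.$", log, re.M)
    tested = re.search(r"tested to work with torch (\S+)\.$", log, re.M)
    # re.search returns the first match, i.e. the outermost frame of the traceback.
    frame = re.search(r'^File "(.+)", line (\d+), in (\w+)$', log, re.M)
    error = re.search(r"^(\w+Error): (.+)$", log, re.M)
    return {
        "running_torch": running.group(1) if running else None,
        "tested_torch": tested.group(1) if tested else None,
        "function": frame.group(3) if frame else None,
        "line": int(frame.group(2)) if frame else None,
        "error_type": error.group(1) if error else None,
        "error_message": error.group(2) if error else None,
    }


summary = summarize_log(LOG)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Pasting the full log into `LOG` instead of the excerpt should work the same way, since each pattern is matched line by line.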
Steps to reproduce the problem
What should have happened?
Training should begin and proceed past step 0.
What browsers do you use to access the UI?
Mozilla Firefox
Sysinfo
sysinfo-2024-05-19-22-23.json
Console logs
Additional information
I see similar reports for other parts of the UI, such as https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu/issues/71#issue-1661243199. Some of those reports attribute the error to allocating too much VRAM, but I don't think that's the case here. I'm opening this separate ticket for training specifically.
EDIT: I just realized this may be important: normal image generation (i.e., txt2img and img2img) works fine. I only experience problems with training.