lshqqytiger / stable-diffusion-webui-amdgpu

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Latest update today seems to break image generation on Ryzen APUs? #112

Open risharde opened 1 year ago

risharde commented 1 year ago

Is there an existing issue for this?

What happened?

Updated my fork of the repo to the current commit: https://github.com/lshqqytiger/stable-diffusion-webui-directml/commit/36db52a5d22e3a89af9dddae765aaa7f068494bd

Selected DDIM; neither 10 nor 12 steps produced a usable image. At 12 steps I got random pixels, and at 10 steps I got a black image. This worked yesterday, before the commit :(

Steps to reproduce the problem

1. Update the repo (forked) to commit https://github.com/lshqqytiger/stable-diffusion-webui-directml/commit/36db52a5d22e3a89af9dddae765aaa7f068494bd
2. Select the DDIM sampler with 10 or 12 steps and generate.
3. At 12 steps the output is random pixels; at 10 steps it is a black image. Both worked before the commit.

What should have happened?

Should have produced a valid output image

Commit where the problem happens

https://github.com/lshqqytiger/stable-diffusion-webui-directml/commit/36db52a5d22e3a89af9dddae765aaa7f068494bd

What platforms do you use to access the UI ?

Windows

What browsers do you use to access the UI ?

Google Chrome

Command Line Arguments

--listen --lowvram --precision full --no-half --opt-sub-quad-attention --opt-split-attention-v1 --disable-nan-check

List of extensions

Didn't select any (so built-ins only)

Console logs

Running DDIM Sampling with 10 timesteps
DDIM Sampler:   0%| 0/10 [00:00<?, ?it/s]  tensor(0.0140, device='privateuseone:0')
DDIM Sampler:  10%| 1/10 [00:05<00:50,  5.56s/it]  tensor(0.0365, device='privateuseone:0')
DDIM Sampler:  20%| 2/10 [00:10<00:43,  5.40s/it]  tensor(0.0819, device='privateuseone:0')
DDIM Sampler:  30%| 3/10 [00:16<00:37,  5.33s/it]  tensor(0.1598, device='privateuseone:0')
DDIM Sampler:  40%| 4/10 [00:21<00:31,  5.28s/it]  tensor(0.2750, device='privateuseone:0')
DDIM Sampler:  50%| 5/10 [00:26<00:26,  5.28s/it]  tensor(0.4229, device='privateuseone:0')
DDIM Sampler:  60%| 6/10 [00:31<00:21,  5.27s/it]  tensor(0.5888, device='privateuseone:0')
DDIM Sampler:  70%| 7/10 [00:37<00:15,  5.27s/it]  tensor(0.7521, device='privateuseone:0')
DDIM Sampler:  80%| 8/10 [00:42<00:10,  5.25s/it]  tensor(0.8930, device='privateuseone:0')
DDIM Sampler:  90%| 9/10 [00:47<00:05,  5.25s/it]  tensor(0.9983, device='privateuseone:0')
DDIM Sampler: 100%| 10/10 [00:52<00:00,  5.28s/it]

Additional information

No additional issues
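
One thing that might be relevant: since --disable-nan-check is in my arguments, a black image could mean NaNs are being produced and passed through silently instead of raising the usual error. Below is a minimal smoke test of the DirectML device outside the webui; it's only a sketch, it assumes the torch-directml package is importable from the webui venv, and the tensor sizes are illustrative.

# Minimal DirectML smoke test (sketch): checks whether basic fp32 tensor math
# on the APU produces NaNs or all-zero results.
import torch
import torch_directml

dml = torch_directml.device()        # typically maps to privateuseone:0
x = torch.randn(64, 64).to(dml)      # fp32, matching --no-half / --precision full
y = torch.randn(64, 64).to(dml)

out = (x @ y).softmax(dim=-1)
print("any NaN :", torch.isnan(out).any().item())
print("all zero:", (out == 0).all().item())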

risharde commented 1 year ago

I am using an AMD Ryzen 5600G APU

Reverting that specific commit allows me to generate an image on DDIM with steps = 10.

Here is what the bootup output looks like on the revision that works for me:

venv "C:\ai\stable-diffusion-webui-directml\venv\Scripts\Python.exe" Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)] Commit hash: c36f0a57eb2464eaa790540584a968bed4b47ebf Installing requirements Launching Web UI with arguments: --listen --lowvram --precision full --no-half --opt-sub-quad-attention --opt-split-attention-v1 --disable-nan-check No module 'xformers'. Proceeding without it. Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled Loading weights [6ce0161689] from C:\ai\stable-diffusion-webui-directml\models\Stable-diffusion\v1-5-pruned-emaonly.safetensors Creating model from config: C:\ai\stable-diffusion-webui-directml\configs\v1-inference.yaml LatentDiffusion: Running in eps-prediction mode DiffusionWrapper has 859.52 M params. Applying sub-quadratic cross attention optimization. Textual inversion embeddings loaded(0): Model loaded in 6.3s (load weights from disk: 0.3s, create model: 0.4s, apply weights to model: 5.4s, load VAE: 0.2s). Running on local URL: http://0.0.0.0:7860

To create a public link, set share=True in launch(). Startup time: 18.0s (import torch: 2.0s, import gradio: 1.2s, import ldm: 0.5s, other imports: 1.2s, load scripts: 1.0s, load SD checkpoint: 6.9s, create ui: 0.8s, gradio launch: 4.3s).

DivanoDova commented 1 year ago

I have a 5600G with an RX 6600, and DDIM has never worked for me.

risharde commented 1 year ago

@Ptibouc77 I know very little about how all of this works, but we're not comparing apples to apples here: your setup has a dedicated GPU, so the issue you're hitting may be different from the one I'm facing, unfortunately. You might need to open a separate issue, since yours sounds like an incompatibility between your RX 6600 and the DDIM sampler (?)

It seems that in my case, because I'm relying solely on the 5600G's integrated graphics cores, the new update that adds 'torch-directml' to torch breaks all rendering on the 5600G.

If you're curious to see what your 5600G can do without the dedicated RX6600, I ended up creating a branch with the latest commit removed, and it works for me: I get a single 512x512 image in about 1.5 minutes (slow, but still MUCH faster than the CPU-only method, which took 15 minutes or more):

https://github.com/risharde/stable-diffusion-webui-directml/tree/master-without-36db52a
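
If anyone does test on the 5600G + RX6600 box, it may also be worth checking which adapter torch-directml actually selects, since device index 0 isn't necessarily the integrated GPU. A rough sketch, assuming torch-directml imports inside the webui venv:

# List the adapters DirectML sees and what device index 0 resolves to (sketch).
import torch_directml

count = torch_directml.device_count()
print("DirectML adapters:", count)
for i in range(count):
    print(f"  [{i}] {torch_directml.device_name(i)}")

# Index 0 is what I'd expect by default; the webui may pick a different index
# depending on its own settings.
print("index 0 resolves to:", torch_directml.device(0))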

risharde commented 1 year ago

@lshqqytiger tagging you since this seems critical for keeping Ryzen APUs working.

lshqqytiger commented 1 year ago

I can't reproduce it with my RX 5700 XT. Does this issue occur only on APUs, and/or only with specific command-line arguments?

risharde commented 1 year ago

@lshqqytiger I can't say for certain since I don't have a dedicated GPU to test it on, but I suspect this is APU-related.

risharde commented 1 year ago

@lshqqytiger Following up on the arguments question: it doesn't seem to be specific to the arguments; the only thing I did was update to the latest commit mentioned above. Hope this helps. I'd like to stay up to date with your repo if this can be fixed, but I'm not experienced with the AI code or the automation code to do it myself.

lshqqytiger commented 1 year ago

I didn't do much in commit 36db52a5d22e3a89af9dddae765aaa7f068494bd, and the changes don't affect any part of the generation process.

risharde commented 1 year ago

I'm awaiting a RAM upgrade for this specific system (I needed to take 8GB from it for another system temporarily). I did test the latest branch on this 5600G machine with 8GB of RAM, and it's dramatically slower (53 sec per iteration, as opposed to 5 sec per iteration before). This of course is not an apples-to-apples comparison.

So in the meantime, I'm going a step lower on the processor side to see IF the branch will perform on a laptop with 16GB RAM and a mobile Ryzen 4500U processor. I'll try to post results for both your branch and my branch with the commit removed.
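
To keep the comparison fair between the two machines and branches, I'll probably run the same prompt and step count on both, or even a tiny standalone timing loop on the DirectML device, something like this sketch (a toy workload, not the real UNet; it assumes torch-directml is installed):

# Toy per-iteration timing on the DirectML device (sketch). Running the same
# script on both branches/machines gives a rough apples-to-apples number.
import time
import torch
import torch.nn.functional as F
import torch_directml

dml = torch_directml.device()
x = torch.randn(1, 4, 64, 64).to(dml)   # latent-sized tensor for a 512x512 image
w = torch.randn(4, 4, 3, 3).to(dml)

F.conv2d(x, w, padding=1)               # warm-up

start = time.time()
for _ in range(10):
    x = F.conv2d(x, w, padding=1).tanh()  # chain ops so the final copy syncs all of them
_ = x.to("cpu")
print(f"~{(time.time() - start) / 10:.3f} s per toy iteration")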

Tdawg069 commented 1 year ago

@risharde I can confirm that the latest version of this code works on my Ryzen 5 3400G APU + 16GB RAM setup (no dGPU). The SD 1.5 model takes about 7 sec/iteration (prompt only, 512x512). The SDXL + Refiner model takes about 80 sec/iteration. 🙈

Are you still using the APU setup?