lshqqytiger / stable-diffusion-webui-amdgpu

Stable Diffusion web UI

[Bug]: Starts to generate incoherent images #109

Open popsoda27 opened 1 year ago

popsoda27 commented 1 year ago

Is there an existing issue for this?

What happened?

I had a good run generating proper images, around 30+ of them, but then it suddenly started producing incoherent images like the one below: just a mash of colours.

[Attached image: 00010-2622515961-(best quality, masterpiece, intricate details_1 4), 1 girl, (korean mixed, kpop idol_1 2),(ulzzang-6500_0 7), pale skin,looking]

Steps to reproduce the problem

  1. Enter a prompt taken from the examples shared on Civitai
  2. Configure the generation parameters: DPM++ SDE Karras, 30 steps, CFG 7, 512x768, batch count and batch size 1 (see the API sketch after these steps for a scripted equivalent)
  3. Click Generate and wait
  4. No errors are shown on the console and there are no bad references to LoRAs or embeddings; the image generates to 100%
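
Since the launch arguments include --api, the same generation can also be reproduced without the browser. This is only a minimal sketch, assuming the standard /sdapi/v1/txt2img endpoint and field names of the webui API; the prompt here is a placeholder for the Civitai example prompt:

```python
import base64
import requests

# Scripted version of the repro steps, assuming the webui is running locally with --api.
payload = {
    "prompt": "(best quality, masterpiece), 1girl, ...",  # placeholder for the Civitai example prompt
    "negative_prompt": "",
    "sampler_name": "DPM++ SDE Karras",
    "steps": 30,
    "cfg_scale": 7,
    "width": 512,
    "height": 768,
    "batch_size": 1,
    "n_iter": 1,
}
resp = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
resp.raise_for_status()
with open("repro.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```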

What should have happened?

A proper image, somewhat similar to the example.

Commit where the problem happens

3284ccc091f09146997fd93bb88a2ecd27ab3a1b

What platforms do you use to access the UI ?

Windows

What browsers do you use to access the UI ?

Google Chrome

Command Line Arguments

--no-half --no-half-vae --precision full --disable-nan-check --opt-sub-quad-attention --api --autolaunch

List of extensions

a1111-sd-webui-lycoris openpose-editor sd-webui-infinite-image-browsing sd-webui-model-converter sd_civitai_extension ultimate-upscale-for-automatic1111

Console logs

venv "E:\Downloads\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: 3284ccc091f09146997fd93bb88a2ecd27ab3a1b
Installing requirements

#######################################################################################################
Initializing Civitai Link
If submitting an issue on github, please provide the below text for debugging purposes:

Python revision: 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Civitai Link revision: d0e83e7b048f9a6472b4964fa530f8da754aba58
SD-WebUI revision: 3284ccc091f09146997fd93bb88a2ecd27ab3a1b

Checking Civitai Link requirements...
[!] python-socketio[client] version 5.7.2 NOT installed.

#######################################################################################################

Launching Web UI with arguments: --no-half --no-half-vae --precision full --disable-nan-check --opt-sub-quad-attention --api --autolaunch
No module 'xformers'. Proceeding without it.
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
Civitai: API loaded
Loading weights [fc2511737a] from E:\Downloads\stable-diffusion-webui-directml\models\Stable-diffusion\chilloutmix_NiPrunedFp32Fix.safetensors
Creating model from config: E:\Downloads\stable-diffusion-webui-directml\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading VAE weights specified in settings: E:\Downloads\stable-diffusion-webui-directml\models\VAE\vae-ft-mse-840000-ema-pruned.safetensors
Applying sub-quadratic cross attention optimization.
Textual inversion embeddings loaded(6): badhandv4, dangerhawk, easynegative, ng_deepnegative_v1_75t, realisticvision-negative-embedding, ulzzang-6500-v1.1
Model loaded in 6.9s (load weights from disk: 0.2s, create model: 0.5s, apply weights to model: 0.6s, load VAE: 0.2s, move model to device: 5.4s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Civitai: Check resources for missing preview images
Startup time: 17.7s (import torch: 2.0s, import gradio: 1.0s, import ldm: 0.5s, other imports: 1.0s, load scripts: 2.1s, load SD checkpoint: 7.3s, opts onchange: 0.5s, create ui: 1.3s, gradio launch: 1.8s, scripts app_started_callback: 0.1s).
Civitai: Found 2 resources missing preview images
Civitai: No preview images found on Civitai
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:50<00:00,  1.68s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 30/30 [00:44<00:00,  1.49s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 30/30 [00:44<00:00,  1.45s/it]

Additional information

python: 3.10.6  •  torch: 2.0.0+cpu  •  xformers: N/A  •  gradio: 3.28.1  •  commit: 3284ccc0  •  checkpoint: fc2511737a

AMD 6700XT Win 11 Pro

Dead2 commented 1 year ago

On my 7900XTX with 24GB vram, I get the same kinds of images after vram usage reaches ~17.5GB (so right around 75%).

That is also the point where generation slows way down: it goes from 4-5 seconds per 512x512, 20-step image (default settings) to about 80 seconds, and the output is just garbage. Once in a while there is a hint of an actual image, maybe because the first part of the process worked, then broke midway through, and gradually removed the image again.

Interestingly, this happens very quickly with DirectML.dll 1.10.1 (currently installed by default), on the first or second image after starting the program, but after manually swapping in DirectML.dll 1.11.0 it manages about 4-5 images before reaching the same problem. (Possibly 1.11.0 uses/leaks a bit less memory, thus just postponing the problem?)

Vram usage keeps increasing after 17.5GB, but at a much lower pace. Perhaps memory usage is unrelated to the real problem though, and just goes up more slowly due to everything slowing way down. But it is the only indicator I have found that reliably predicts the slowdown and incorrect output.

The GPU keeps running at an impressive ~3000MHz (though with a slightly lower indicated load), so the slowdown is not thermal or clock-speed related; not that that would explain the faulty output in any case. CPU usage seems to fall from ~3 cores to ~1.5-2 cores, so something is still working hard, somewhere.

My first thought was that this is some kind of memory-pressure issue triggering memory reclaim that possibly frees/overwrites the wrong areas, but I am not at all sure about that. There does seem to be a memory leak somewhere though, as I feel pretty sure it should not normally eat up 17.5GB after making just 1-2 images. I tried using a 16-bit 2GB model instead, and it didn't change anything.

I think the Radeon driver version might also affect this; I am on the current latest release, 23.4.3. The Windows version should not matter much since we install a DirectML.dll directly instead of relying on the older OS-provided one, but I am on Windows 10 21H1 (19043).

I am not sure what more I can look for or test.

@popsoda27 Does your problem follow a similar pattern of always starting to generate bad images after hitting a certain amount of GPU VRAM used? (Not normal RAM; I feel pretty sure you know the difference, just making sure.) It would also be interesting to know whether it slows down when generating the bad images compared to the earlier good ones.
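
One rough way to check the timing side without watching the console is to loop the same generation through the built-in API and print the per-image time. This is only a sketch, and it assumes the UI was launched with --api and that the standard /sdapi/v1/txt2img endpoint and field names apply:

```python
import time
import requests

# Rough repro loop: generate the same image repeatedly via the webui API and
# print how long each one takes, to see at which image the slowdown begins.
URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"
payload = {
    "prompt": "a simple test prompt",
    "steps": 20,
    "width": 512,
    "height": 512,
    "seed": 1,  # fixed seed so every iteration does the same work
}

for i in range(30):
    start = time.time()
    requests.post(URL, json=payload, timeout=600).raise_for_status()
    print(f"image {i + 1}: {time.time() - start:.1f}s")
```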

PennyFranklin commented 1 year ago

My 7900XTX also runs into this error; it uses too much VRAM from the very first picture generation.

lshqqytiger commented 1 year ago

It is quite strange and interesting. If this problem is caused by high VRAM occupation, limiting it might help. But unfortunately I can't test or reproduce it because I don't have such an expensive video card.

Dead2 commented 1 year ago

7900XTX with 24GB of VRAM, Windows 10, Radeon 23.4.3, DirectML 1.10.1
Sampler: Euler a
Prompt: house with impossible features, no negative prompt
Model: SD v1.5 downscaled to 16-bit, resulting in a lightweight 2GB model; compared to the full SD v1.5, in these tests this seems to make the slowdown appear faster.

Defaults

100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:07<00:00,  2.58it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:24<00:00,  1.20s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:40<00:00,  2.02s/it]

Slows down at the end of the first image, as it passed 17.5GB / 75% VRAM usage. No images corrupted.

Command line: --medvram

100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:06<00:00,  2.87it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.63it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.64it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.66it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.68it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.62it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.68it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.68it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.42it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:38<00:00,  1.90s/it]

The 10th picture passed 17.5GB / 75% VRAM usage. No images corrupted.

Command line: --medvram --always-batch-cond-uncond

100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:09<00:00,  2.15it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:35<00:00,  1.80s/it]

Slows down at picture 2, as it passed 17.3GB / 72% VRAM usage. No images corrupted.

Command line: --lowvram --always-batch-cond-uncond

100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:26<00:00,  1.31s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:25<00:00,  1.26s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:25<00:00,  1.26s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:26<00:00,  1.30s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:26<00:00,  1.30s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:25<00:00,  1.28s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:26<00:00,  1.31s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:26<00:00,  1.33s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:26<00:00,  1.33s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:26<00:00,  1.34s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:27<00:00,  1.35s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:56<00:00,  2.82s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:56<00:00,  2.85s/it]

Slows down at picture 12, as it passed 17.3GB / 72% VRAM usage. No images corrupted.

Command line: --medvram --no-half --no-half-vae --precision=full

These parameters SOMEHOW stop the memory usage from growing after it reaches 16.6GB / 69%, and the big slowdown does not happen (VRAM usage stayed right at 16.6GB from image ~10 until 60+). That seems to confirm that there is indeed some kind of problem once we get above ~70% VRAM usage. I am not sure why that problem sometimes causes broken images and sometimes not, but it does seem related to me.

100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:07<00:00,  2.54it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.71it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.76it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.68it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.77it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.75it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.71it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.73it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.74it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.47it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.49it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.48it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.50it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.49it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.46it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.47it/s]

See how it slowed down just a little after image 9? That is when it reached 16.6GB / 69% and stopped increasing further. I think this is how the memory limiter might be supposed to work, and in the other examples above it somehow does something wrong. Possibly it also sometimes frees the wrong data from memory, causing the broken images problem. Is this related to conversions to half precision somehow? My results seem to suggest it might be.
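
To make the half-precision angle concrete, here is a tiny, purely illustrative torch sketch (not the webui's actual code) of the memory-footprint difference behind --no-half / --precision full:

```python
import torch

# Illustration only: the same weights held in fp16 take half the memory of fp32.
layer = torch.nn.Linear(4096, 4096)
fp32_bytes = sum(p.numel() * p.element_size() for p in layer.parameters())
layer.half()  # convert the same weights to fp16 in place
fp16_bytes = sum(p.numel() * p.element_size() for p in layer.parameters())
print(f"fp32: {fp32_bytes / 1e6:.1f} MB, fp16: {fp16_bytes / 1e6:.1f} MB")  # exactly 2x apart

# The console log earlier in this thread reports a UNet with 859.52M parameters,
# so roughly 1.7GB in fp16 versus ~3.4GB in fp32, before activations, the VAE,
# and live previews are counted.
```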

Other tests

Tested DDIM: pretty much no change.
Tested an undocumented setting --use-cpu=vae that I saw suggested somewhere; this made the slowdown appear at 16.8GB / 70% VRAM usage.
Tested --no-half --no-half-vae --precision=full: this is the fastest so far, at 5.16it/s, but it still slows down badly at 17.2GB.

Summary

Did some version get upgraded since yesterday? Something seems to have changed: the slowdown is exactly the same as before, but now I can't reproduce the image corruption anymore. It is unfortunate that there is no log of what gets updated when you run the start script, so I don't know whether anything was updated or something else changed. Looking at the venv lib folders and various other folders, I don't see any timestamps newer than my previous test round. The machine was also not rebooted, and there were no Windows updates or driver updates since my last round of tests. Maybe some other interaction is to blame here? A browser tab with a video open or some such? (Insert moon-phase theories here.)

There does seem to be some kind of limiter around 69-75% VRAM usage where everything slows down; I don't know where it is or what it does, as I am not at all familiar with this code.

I recommend testing --medvram --no-half --no-half-vae --precision=full to see whether this works around the slowdown and possibly the image corruption for others as well.

Dead2 commented 1 year ago

I did some quick follow-up tests with DirectML.dll v1.11.0 using the parameters --medvram --no-half --no-half-vae --precision=full, and it reduces the rate of memory growth up to 16.6GB. Compared to the last test above, where it slowed down slightly after image 9, it now goes until image 20 before that happens, and the slight slowdown is a bit less pronounced.

Dead2 commented 1 year ago

Some more observations:

I managed to trigger corrupted images once, but did not manage to repeat it. Generating a dozen images and then increasing to batch size 2 worked for a few batches, but increased VRAM usage again. Once VRAM usage reached 17.5GB I got a huge slowdown and broken images again: out of 8 batches, the first 7 (14 images) were fine, but the last batch was slow and broken. It almost looks like it just outputs the image from the first few iterations, or the first preview, instead of the final image.

Also, increasing resolution to 512x640 increases vram usage and again triggers the slowdown after a couple images.

Disabling live previews reduces vram increase rate, allowing more images to complete before the big slowdown.

Dead2 commented 1 year ago

> It is quite strange and interesting. If this problem is caused by high VRAM occupation, limiting it might help. But unfortunately I can't test or reproduce it because I don't have such an expensive video card.

@lshqqytiger I don't think this problem is limited to expensive, high-VRAM graphics cards, but it is interesting that it still triggers on them, when you'd expect it to only trigger on cards with less VRAM. @popsoda27 only has an AMD 6700XT and is also seeing very slow image generation and corruption.

That is why I am interested to know whether this also triggers at around 70-75% VRAM usage on cards with less VRAM. It looks to me like that is what is happening, but it is hard to tell for sure when I only have access to this high-VRAM card myself.
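
For reference, a back-of-the-envelope check of what that 70-75% band means in absolute numbers, assuming the nominal 24GB on the 7900XTX and 12GB on the 6700XT:

```python
# Back-of-the-envelope check of the VRAM fractions discussed in this thread.
for name, total_gb in [("7900XTX", 24), ("6700XT", 12)]:
    low, high = 0.70 * total_gb, 0.75 * total_gb
    print(f"{name}: 70-75% of {total_gb}GB is {low:.1f}-{high:.1f}GB")

# 7900XTX: 16.8-18.0GB (the observed slowdown points of 16.6-17.5GB sit right around this band)
# 6700XT:  8.4-9.0GB (what to watch for on a 12GB card)
```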

PennyFranklin commented 1 year ago

My 7900XTX also always hits 75% VRAM usage by the time the first image generation is done, and runs into the slowdown from then on; that is pretty weird.

popsoda27 commented 1 year ago

Sorry for the late follow-up, but reading back through all the comments, it suggests to me that the incoherent image generation is somehow related to a slowdown caused by high VRAM usage?

On my 6700XT, no matter whether I generate a single image or a batch count of 10, it almost always maxes out the VRAM, at a consistent ~95%. I'm not sure how helpful that information is. The incoherent images occur so randomly that I thought it couldn't be down to SD parameters or VRAM usage, and was more likely due to LoRAs that are incompatible with the checkpoint model. I can't hope to fix anything with my limited knowledge, so I simply restarted.

lshqqytiger commented 1 year ago

Okay. In some cases, high VRAM occupation seems to be causing the image corruption. But it hardly ever happens to me, and I can't do much about this issue because it is a lower-level one.

sgtsixpack commented 1 year ago

I'm getting something similar now, but only since yesterday (28/05/23). There is nothing I can do; it happens 100% of the time for me at the moment.

I only see the top of the generated image and the rest is covered in mist. I have an AMD 6800XT. (Attached image: 00032-3324718285)

Launching Web UI with arguments: --opt-sub-quad-attention --medvram --autolaunch
J:\AI training\Stable diffusion\stable-diffusion-webui-directml\venv\lib\site-packages\pkg_resources\__init__.py:123: PkgResourcesDeprecationWarning: llow is an invalid version and will not be supported in a future release
  warnings.warn(
No module 'xformers'. Proceeding without it.
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
Loading weights [59096eecb2] from J:\AI training\Stable diffusion\stable-diffusion-webui-directml\models\Stable-diffusion\janaDefi_v25.safetensors
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Creating model from config: J:\AI training\Stable diffusion\stable-diffusion-webui-directml\models\Stable-diffusion\janaDefi_v25.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Startup time: 8.3s (import torch: 2.3s, import gradio: 1.7s, import ldm: 0.7s, other imports: 1.7s, load scripts: 0.9s, create ui: 0.4s, gradio launch: 0.4s).
Applying optimization: sub-quadratic... done.
Textual inversion embeddings loaded(1): opulent beauty
Model loaded in 4.2s (load weights from disk: 0.7s, create model: 0.4s, apply weights to model: 2.6s, apply half(): 0.4s, load VAE: 0.1s).
100%|████████████████████████████████████████████████████████████████████████████████| 113/113 [05:59<00:00,  3.18s/it]
Total progress: 100%|████████████████████████████████████████████████████████████████| 113/113 [05:52<00:00,  3.12s/it]
Total progress: 100%|████████████████████████████████████████████████████████████████| 113/113 [05:52<00:00,  3.10s/it]

mvancil commented 1 year ago

Just an observation here: I began to experience this same degradation after I installed the infinite image browsing extension. It was the only extension I installed after a fresh reinstall.

You should probably focus on that extension. For now, I have done another fresh reinstall without that extension, and everything is fine.