[Bug]: SDXL & SDXL Turbo screws up the end of last frame on DirectML

Darkestaxe1 commented 9 months ago

Checklist

[X] The issue exists after disabling all extensions
[ ] The issue exists on a clean installation of webui
[ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
[X] The issue exists in the current version of the webui
[X] The issue has not been reported before recently
[ ] The issue has been reported before but has not been fixed yet

What happened?

SDXL Turbo and DreamshaperXL-Turbo produce a normal image, but it becomes extremely contrasted and saturated just before the final frame is returned. The effect is similar to the 1.5 VAE version bug but not identical. The image looks fine in the preview but is wrecked at the very end, whether generation finishes or you interrupt it. I can reproduce the effect in GIMP by applying extreme contrast and saturation, so I believe it is probably VAE-related in some way.
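For anyone who wants to put numbers on the "too contrasted and saturated" impression, here is a minimal sketch that compares two sets of RGB pixels. The function name and the sample pixel values are made up for illustration; they are not taken from the attached images.

```python
import colorsys
from statistics import mean, pstdev

def saturation_and_contrast(pixels):
    """Return (mean HSV saturation, std-dev of luma) for 8-bit RGB tuples."""
    sats, lumas = [], []
    for r, g, b in pixels:
        _, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        sats.append(s)
        # Rec. 601 luma weights; the spread of luma is a rough contrast measure.
        lumas.append(0.299 * r + 0.587 * g + 0.114 * b)
    return mean(sats), pstdev(lumas)

# Hypothetical samples: a "preview-like" patch and a "final-frame-like" patch.
preview_like = [(153, 115, 64), (90, 120, 150)]
final_like = [(255, 64, 0), (0, 100, 255)]

print(saturation_and_contrast(preview_like))
print(saturation_and_contrast(final_like))
```

In practice the pixel lists would come from the saved preview and final images (for example via an image library such as Pillow, which is an assumption here and not something the webui requires). A much higher saturation and luma spread on the final image would match what is described above.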

Steps to reproduce the problem

Install A1111 DirectML by following the guide "Install and Run on AMD GPUs"
Download sd_xl_turbo_1.0_fp16.safetensors from huggingface and place it in your webui\stable-diffusion-webui-directml\models\Stable-diffusion folder
Use the example prompt and settings at https://stable-diffusion-art.com/sdxl-turbo/ to txt2img the doggo in a snowglobe.
Prompt: beautiful landscape scenery glass bottle with a galaxy inside cute fennec fox snow HDR sunset
Steps: 1, Sampler: Euler a, CFG scale: 1, Size: 512x512

What should have happened?

A normal amount of saturation and contrast that doesn't look like spilled ink

This: 00025-3755695490

Not This: 00021-3755695490

What browsers do you use to access the UI ?

Microsoft Edge

Sysinfo

sysinfo-2023-12-27-08-05.json

Console logs

venv "C:\Stable Diffusion\webui\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
fatal: No names found, cannot describe anything.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: 1.7.0
Commit hash: 668ee14d1f6a4959c064e0b54175beccfe0c7057
Launching Web UI with arguments: --lowvram --use-directml
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
*** "Disable all extensions" option was set, will not load any extensions ***
Loading weights [e869ac7d69] from C:\Stable Diffusion\webui\stable-diffusion-webui-directml\models\Stable-diffusion\sd_xl_turbo_1.0_fp16.safetensors
Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.
Startup time: 9.4s (prepare environment: 0.3s, import torch: 3.5s, import gradio: 1.0s, setup paths: 0.9s, initialize shared: 0.9s, other imports: 0.5s, scripts list_optimizers: 1.0s, create ui: 0.4s, gradio launch: 0.7s).
Creating model from config: C:\Stable Diffusion\webui\stable-diffusion-webui-directml\repositories\generative-models\configs\inference\sd_xl_base.yaml
Loading VAE weights specified in settings: C:\Stable Diffusion\webui\stable-diffusion-webui-directml\models\VAE\sdxlVAE_v10.safetensors
Applying attention optimization: InvokeAI... done.
Model loaded in 9.7s (load weights from disk: 1.2s, create model: 0.8s, apply weights to model: 6.1s, load VAE: 0.5s, calculate empty prompt: 1.0s).
Restoring base VAE
Applying attention optimization: InvokeAI... done.
VAE weights loaded.
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.72s/it]
Total progress: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.50s/it]
Loading VAE weights specified in settings: C:\Stable Diffusion\webui\stable-diffusion-webui-directml\models\VAE\sdxlVAE_v10.safetensors
Applying attention optimization: InvokeAI... done.
VAE weights loaded.
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.56s/it]
Total progress: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.25s/it]
Restoring base VAE
Applying attention optimization: InvokeAI... done.
VAE weights loaded.
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.28s/it]
Total progress: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.41s/it]

Additional information

I'm using an RX 570 8GB on Windows 10. Adrenalin 23.11.1, released 10/17/23, says it's up to date.

When trying to fix this issue I git pulled to 1.7 and got the "torch is not able to use gpu" bug. While I was using --skip-torch-cuda-test SDXL Turbo was working normally, though obviously not so 'turbo'. Once I switched to --use-directml in my command line everything else went back to normal but SDXL Turbo went back to screwing up the pictures at the last second.

I was getting about the same effect from JuggernautXL-V7 before I switched to V7-fp16VAEfix, so more reasons I assume it has something somehow to do with VAE. Using sdxlVAE_v10 or none or Automatic is the same.

Ryzen 5 5600G, 32GB RAM, Windows 10 Pro 22H2.

webui is located on the same 1tb nvme SSD as windows.

I tried to install Olive and LCM about a week ago and couldn't get it working. Then it mysteriously showed up as a tab in my DirectML installation, which sometimes kinda works. Since I couldn't make heads or tails of AMD's so-called "Guide", who knows if I installed duplicate copies of Python and other dependencies into various system folders. At the very least I forced some installer to mess with the environment variables in spite of a warning that doing so could break other Python stuff or something like that.

Darkestaxe1 commented 9 months ago

Update: The issue is with SDXL and SDXL Turbo but doesn't affect some third party XL checkpoints.

Perhaps the working third party XL checkpoints may be partially based on SD1.5 models? Regardless the new title should probably be "[Bug]: SDXL/SDXL Turbo screws up the end of last frame on DirectML"

I found some non-turbo XL checkpoints that were doing the same thing for me. So I downloaded and tested sd_xl_base_1.0 and sd_xl_base_1.0_0.9vae. Both produced the same bugged results. I also tried using sd_xl_base_1.0_0.9vae with sd_xl_refiner_1.0_0.9vae as the refiner for the last two steps, same bug.

Tested working XL Checkpoints include:
Colossus Project XL - colossusProjectXLSFW_v53Trained.safetensors [f7a1beed86]
Juggernaut XL - juggernautXL_v7FP16VAEFix.safetensors [ae825c75cf]

I've tested several others and I can't figure out why those two work and everything else doesn't.

Non working third party checkpoints include all three editions of Dreamshaper XL, Juggernaut V7+RunDiffusion, Jib Mix Realistic XL v6, and VenusXL v1.1

andres-ulloa commented 9 months ago

Same issue here. Radeon 680m 6 gb VRAM.

jasjisdo commented 9 months ago

Hi @Darkestaxe1,

can you download this SDXL VAE and try again:

https://huggingface.co/stabilityai/sdxl-vae/blob/main/sdxl_vae.safetensors

and put it in the following location:

C:\Stable Diffusion\webui\stable-diffusion-webui-directml\models\VAE

Then, just for testing, can you apply these settings:

After testing, set it back to Automatic.

❓❓❓ And a question: Where did you get the file `sdxlVAE_v10.safetensors` from? ❓❓❓

jasjisdo commented 9 months ago

This is my result on cuda with turbo and sdxl_vae.safetensors

Result: (Based on these recommended text2img settings)

VAE in Settings:

Darkestaxe1 commented 9 months ago

I tested sd_xl_turbo_1.0_fp16.safetensors [e869ac7d69] with sdxl_vae.safetensors from your link. I also retested sd_xl_base_1.0.safetensors [31e35c80fc] with it just to be sure. I don't know where sdxlVAE_v10.safetensors came from, I thought I googled SDXL vae and got it from huggingface, but I just tried that and I only found the same file you just linked me to.

Regardless, the results were the same. The image is normal in the preview but gets contrasted and saturated at completion. Remember it's also happening with SD VAE set to none. In fact I didn't have any VAE files when the issue started, just an empty folder. I only started downloading VAEs because I read about the VAE bug and wondered if it could be related. Also, I saw the bug first with Dreamshaper Turbo, and the Dreamshaper Civitai page says there is no need for a VAE or refiner.

SDXL Turbo Screenshot 2024-01-02 132628

SDXL 1.0 Base Screenshot 2024-01-02 135332

SDXL 1.0 Base with Refiner, just for completeness Screenshot 2024-01-02 135739

I re-ran for the log when using sdxl_vae.safetensors, in case it's useful somehow:

venv "C:\Stable Diffusion\webui\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
fatal: No names found, cannot describe anything.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: 1.7.0
Commit hash: 668ee14d1f6a4959c064e0b54175beccfe0c7057
Launching Web UI with arguments: --lowvram --use-directml
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
ControlNet preprocessor location: C:\Stable Diffusion\webui\stable-diffusion-webui-directml\extensions\sd-webui-controlnet\annotator\downloads
2024-01-02 14:00:35,972 - ControlNet - INFO - ControlNet v1.1.425
2024-01-02 14:00:36,089 - ControlNet - INFO - ControlNet v1.1.425
Loading weights [31e35c80fc] from C:\Stable Diffusion\webui\stable-diffusion-webui-directml\models\Stable-diffusion\sd_xl_base_1.0.safetensors
Running on local URL: http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 11.5s (prepare environment: 1.1s, import torch: 3.5s, import gradio: 1.1s, setup paths: 1.0s, initialize shared: 1.0s, other imports: 0.6s, load scripts: 1.6s, create ui: 0.7s, gradio launch: 0.7s).
Creating model from config: C:\Stable Diffusion\webui\stable-diffusion-webui-directml\repositories\generative-models\configs\inference\sd_xl_base.yaml
Loading VAE weights specified in settings: C:\Stable Diffusion\webui\stable-diffusion-webui-directml\models\VAE\sdxl_vae.safetensors
Applying attention optimization: sub-quadratic... done.
Model loaded in 8.7s (load weights from disk: 1.5s, create model: 0.5s, apply weights to model: 5.5s, load VAE: 0.2s, calculate empty prompt: 0.9s).
Reusing loaded model sd_xl_base_1.0.safetensors [31e35c80fc] to load sd_xl_turbo_1.0_fp16.safetensors [e869ac7d69]
Loading weights [e869ac7d69] from C:\Stable Diffusion\webui\stable-diffusion-webui-directml\models\Stable-diffusion\sd_xl_turbo_1.0_fp16.safetensors
Loading VAE weights specified in settings: C:\Stable Diffusion\webui\stable-diffusion-webui-directml\models\VAE\sdxl_vae.safetensors
Applying attention optimization: sub-quadratic... done.
Weights loaded in 13.2s (send model to cpu: 0.7s, load weights from disk: 1.2s, apply weights to model: 11.1s, load VAE: 0.1s).
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00, 5.91s/it]
Total progress: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.76s/it]
Total progress: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.76s/it]

jasjisdo commented 9 months ago

Thank you for trying this out again. 👍🏽 Then this really seems to be a bug. 😕

My hypothesis was that it was a VAE misconfiguration.

The problem definitely has something to do with the VAE.

I sometimes have this problem with cuda too, but when I use the sd_xl_base_1.0.safetensors I don't have the problem with cuda at all as you can see in my screenshots. Unfortunately I don't have a DirectML compatible graphics card otherwise I could have analysed further.

Here is a short explanation of what a VAE is:

STABLE DIFFUSION VARIATIONAL AUTOENCODER (VAE) EXPLAINED
A variational autoencoder (VAE) is a technique used to improve the quality of AI generated images you create with the text-to-image model Stable Diffusion. VAE encodes the image into a latent space and then that latent space is decoded into a new, higher quality image.

It is also responsible for converting the tensor representation of your generated image back into RGB pixels.

There seems to be an error in this conversion and therefore the RGB colours are not correct.

Colours are probably represented differently in DirectML than in cuda.
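To make the idea of a faulty conversion concrete, here is a minimal sketch of the last step after decoding, where values nominally in [-1, 1] are mapped to 8-bit RGB and clamped. It assumes, purely for illustration, that the faulty path returns values that are too large by some factor; the factor of 5 and the sample values are hypothetical, and this is not a confirmed diagnosis of the DirectML behaviour.

```python
def to_rgb8(decoded):
    """Map decoder outputs, nominally in [-1, 1], to 8-bit channel values."""
    pixels = []
    for value in decoded:
        unit = (value + 1.0) / 2.0        # shift [-1, 1] to [0, 1]
        unit = min(1.0, max(0.0, unit))   # clamp anything that is out of range
        pixels.append(round(unit * 255))
    return pixels

# Hypothetical decoder output for one pixel (R, G, B).
normal = [0.2, -0.1, -0.5]
# Same pixel scaled by an arbitrary factor, standing in for a faulty decode.
blown_out = [v * 5 for v in normal]

print(to_rgb8(normal))
print(to_rgb8(blown_out))
```

With these inputs the channels move away from mid-grey and hit the 0 and 255 limits, which is the same kind of change as cranking up contrast and saturation in an image editor.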

Sorry, I can't help you any further for now.

Darkestaxe1 commented 9 months ago

Well, thank you for your time and the attempt.

Gonzalo1987 commented 9 months ago

Same problem here. Perhaps an option to skip the VAE processing? The image is fine until the last frame.

Darkestaxe1 commented 8 months ago

I tested ComfyUI, and it works with the same venv folder and the same command-line arguments as Automatic1111. Automatic1111 still doesn't. With ComfyUI, both SDXL and SDXL Turbo render correctly using the default workflow and the example settings I used in the OP.

A1111

@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS= --lowvram --use-directml
call webui.bat

Comfy

"C:\Stable Diffusion\webui\stable-diffusion-webui-directml\venv\Scripts\activate.bat"

python main.py --directml --lowvram

lshqqytiger / stable-diffusion-webui-amdgpu