lllyasviel / stable-diffusion-webui-forge

GNU Affero General Public License v3.0

Clip Skip nonfunctional with SDXL-based checkpoints #387

Open BlankDiploma opened 6 months ago

BlankDiploma commented 6 months ago

What happened?

All checkpoints based on Stable Diffusion XL, including the base checkpoint, show no variation when Clip Skip is changed, even when it is set as high as 12.

The generation parameters show that the changed Clip Skip value is being recognized (it appears in the image info text after generation completes), but the value doesn't actually affect the output at all.

Steps to reproduce the problem

  1. Load any normal Stable Diffusion checkpoint, generate the same image with Clip Skip set to 1, 2, 12, etc.
  2. Load any Stable Diffusion XL checkpoint, generate the same image with Clip Skip set to 1, 2, 12, etc.

Observe that the Stable Diffusion checkpoint properly applies the Clip Skip parameter during image generation, but the Stable Diffusion XL checkpoint does not.

What should have happened?

Clip Skip should modify the image output normally.

What browsers do you use to access the UI?

Google Chrome

Sysinfo

sysinfo-2024-02-24-05-27.json

Console logs

Creating venv in directory D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\venv using python "C:\Users\Blank\AppData\Local\Programs\Python\Python310\python.exe"
venv "D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\venv\Scripts\Python.exe"
Python 3.10.5 (tags/v3.10.5:f377153, Jun  6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)]
Version: f0.0.15v1.8.0rc-latest-233-g2ecb869f
Commit hash: 2ecb869f31f4abab5922c1bd611e375d5bb28e8e
Installing torch and torchvision
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu121
Collecting torch==2.1.2
  Downloading https://download.pytorch.org/whl/cu121/torch-2.1.2%2Bcu121-cp310-cp310-win_amd64.whl (2473.9 MB)
     ---------------------------------------- 2.5/2.5 GB 2.4 MB/s eta 0:00:00
Collecting torchvision==0.16.2
  Downloading https://download.pytorch.org/whl/cu121/torchvision-0.16.2%2Bcu121-cp310-cp310-win_amd64.whl (5.6 MB)
     ---------------------------------------- 5.6/5.6 MB 181.5 MB/s eta 0:00:00
Collecting sympy
  Using cached https://download.pytorch.org/whl/sympy-1.12-py3-none-any.whl (5.7 MB)
Collecting jinja2
  Downloading Jinja2-3.1.3-py3-none-any.whl (133 kB)
     ---------------------------------------- 133.2/133.2 KB 7.7 MB/s eta 0:00:00
Collecting filelock
  Using cached filelock-3.13.1-py3-none-any.whl (11 kB)
Collecting fsspec
  Downloading fsspec-2024.2.0-py3-none-any.whl (170 kB)
     ---------------------------------------- 170.9/170.9 KB 10.0 MB/s eta 0:00:00
Collecting networkx
  Downloading https://download.pytorch.org/whl/networkx-3.2.1-py3-none-any.whl (1.6 MB)
     ---------------------------------------- 1.6/1.6 MB 109.2 MB/s eta 0:00:00
Collecting typing-extensions
  Downloading typing_extensions-4.9.0-py3-none-any.whl (32 kB)
Collecting pillow!=8.3.*,>=5.3.0
  Downloading https://download.pytorch.org/whl/pillow-10.2.0-cp310-cp310-win_amd64.whl (2.6 MB)
     ---------------------------------------- 2.6/2.6 MB 174.1 MB/s eta 0:00:00
Collecting requests
  Using cached requests-2.31.0-py3-none-any.whl (62 kB)
Collecting numpy
  Downloading numpy-1.26.4-cp310-cp310-win_amd64.whl (15.8 MB)
     ---------------------------------------- 15.8/15.8 MB 162.5 MB/s eta 0:00:00
Collecting MarkupSafe>=2.0
  Downloading MarkupSafe-2.1.5-cp310-cp310-win_amd64.whl (17 kB)
Collecting urllib3<3,>=1.21.1
  Downloading urllib3-2.2.1-py3-none-any.whl (121 kB)
     ---------------------------------------- 121.1/121.1 KB 6.9 MB/s eta 0:00:00
Collecting certifi>=2017.4.17
  Downloading certifi-2024.2.2-py3-none-any.whl (163 kB)
     ---------------------------------------- 163.8/163.8 KB ? eta 0:00:00
Collecting idna<4,>=2.5
  Downloading idna-3.6-py3-none-any.whl (61 kB)
     ---------------------------------------- 61.6/61.6 KB 3.2 MB/s eta 0:00:00
Collecting charset-normalizer<4,>=2
  Using cached charset_normalizer-3.3.2-cp310-cp310-win_amd64.whl (100 kB)
Collecting mpmath>=0.19
  Using cached https://download.pytorch.org/whl/mpmath-1.3.0-py3-none-any.whl (536 kB)
Installing collected packages: mpmath, urllib3, typing-extensions, sympy, pillow, numpy, networkx, MarkupSafe, idna, fsspec, filelock, charset-normalizer, certifi, requests, jinja2, torch, torchvision
Successfully installed MarkupSafe-2.1.5 certifi-2024.2.2 charset-normalizer-3.3.2 filelock-3.13.1 fsspec-2024.2.0 idna-3.6 jinja2-3.1.3 mpmath-1.3.0 networkx-3.2.1 numpy-1.26.4 pillow-10.2.0 requests-2.31.0 sympy-1.12 torch-2.1.2+cu121 torchvision-0.16.2+cu121 typing-extensions-4.9.0 urllib3-2.2.1
WARNING: You are using pip version 22.0.4; however, version 24.0 is available.
You should consider upgrading via the 'D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\venv\Scripts\python.exe -m pip install --upgrade pip' command.
Installing clip
Installing open_clip
Cloning assets into D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\repositories\stable-diffusion-webui-assets...
Cloning into 'D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\repositories\stable-diffusion-webui-assets'...
remote: Enumerating objects: 20, done.
remote: Counting objects: 100% (20/20), done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 20 (delta 0), reused 20 (delta 0), pack-reused 0
Receiving objects: 100% (20/20), 132.70 KiB | 2.07 MiB/s, done.
Cloning Stable Diffusion into D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\repositories\stable-diffusion-stability-ai...
Cloning into 'D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\repositories\stable-diffusion-stability-ai'...
remote: Enumerating objects: 580, done.
remote: Counting objects: 100% (357/357), done.
remote: Compressing objects: 100% (128/128), done.
remote: Total 580 (delta 260), reused 229 (delta 229), pack-reused 223
Receiving objects: 100% (580/580), 73.44 MiB | 37.43 MiB/s, done.
Resolving deltas: 100% (279/279), done.
Cloning Stable Diffusion XL into D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\repositories\generative-models...
Cloning into 'D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\repositories\generative-models'...
remote: Enumerating objects: 871, done.
remote: Counting objects: 100% (500/500), done.
remote: Compressing objects: 100% (235/235), done.
remote: Total 871 (delta 375), reused 270 (delta 264), pack-reused 371
Receiving objects: 100% (871/871), 42.67 MiB | 27.14 MiB/s, done.
Resolving deltas: 100% (452/452), done.
Cloning K-diffusion into D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\repositories\k-diffusion...
Cloning into 'D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\repositories\k-diffusion'...
remote: Enumerating objects: 1340, done.
remote: Counting objects: 100% (622/622), done.
remote: Compressing objects: 100% (86/86), done.

Receiving objects: 100% (1340/1340), 242.04 KiB | 1.47 MiB/s, done.
Resolving deltas: 100% (939/939), done.
Cloning BLIP into D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\repositories\BLIP...
Cloning into 'D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\repositories\BLIP'...
remote: Enumerating objects: 277, done.
remote: Counting objects: 100% (165/165), done.
remote: Compressing objects: 100% (30/30), done.
Receiving objects: 100% (277/277), 7.03 MiB | 23.31 MiB/s, done.
Resolving deltas: 100% (152/152), done.
Installing requirements
Installing forge_legacy_preprocessor requirement: fvcore
Installing forge_legacy_preprocessor requirement: mediapipe
Installing forge_legacy_preprocessor requirement: onnxruntime
Installing forge_legacy_preprocessor requirement: svglib
Installing forge_legacy_preprocessor requirement: insightface
Installing forge_legacy_preprocessor requirement: handrefinerportable
Installing forge_legacy_preprocessor requirement: depth_anything
Launching Web UI with arguments:
Total VRAM 24563 MB, total RAM 130983 MB
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4090 : native
Hint: your device supports --pin-shared-memory for potential speed improvements.
Hint: your device supports --cuda-malloc for potential speed improvements.
VAE dtype: torch.bfloat16
Using pytorch cross attention
Downloading: "https://huggingface.co/lllyasviel/fav_models/resolve/main/fav/realisticVisionV51_v51VAE.safetensors" to D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\models\Stable-diffusion\realisticVisionV51_v51VAE.safetensors

100%|██████████████████████████████████████████████████████████████████████████████| 1.99G/1.99G [00:08<00:00, 254MB/s]
ControlNet preprocessor location: D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\models\ControlNetPreprocessor
Calculating sha256 for D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\models\Stable-diffusion\realisticVisionV51_v51VAE.safetensors:
2024-02-23 21:36:20,692 - ControlNet - INFO - ControlNet UI callback registered.
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 420.4s (prepare environment: 393.6s, import torch: 6.3s, import gradio: 1.8s, setup paths: 2.5s, initialize shared: 0.4s, other imports: 2.4s, list SD models: 8.6s, load scripts: 3.3s, create ui: 0.7s, gradio launch: 0.4s).
15012c538f503ce2ebfc2c8547b268c75ccdaff7a281db55399940ff1d70e21d
Loading weights [15012c538f] from D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\models\Stable-diffusion\realisticVisionV51_v51VAE.safetensors
model_type EPS
UNet ADM Dimension 0
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
To load target model SD1ClipModel
Begin to load 1 model
Reuse 0 loaded models
[Memory Management] Current Free Memory (MB) =  22981.9990234375
[Memory Management] Model Memory (MB) =  454.2076225280762
[Memory Management] Estimated Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining Memory (MB) =  21503.791400909424
Moving model(s) has taken 0.50 seconds
Model loaded in 5.6s (calculate hash: 3.4s, forge load real models: 1.3s, calculate empty prompt: 0.8s).
Calculating sha256 for D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\models\Stable-diffusion\sd_xl_base_1.0.safetensors: 31e35c80fc4829d14f90153f4c74cd59c90b779f6afe05a74cd6120b893f7e5b
Loading weights [31e35c80fc] from D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\models\Stable-diffusion\sd_xl_base_1.0.safetensors
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
To load target model SDXLClipModel
Begin to load 1 model
Reuse 0 loaded models
[Memory Management] Current Free Memory (MB) =  22610.3603515625
[Memory Management] Model Memory (MB) =  2144.3546981811523
[Memory Management] Estimated Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining Memory (MB) =  19442.005653381348
Moving model(s) has taken 0.66 seconds
Model loaded in 10.2s (unload existing model: 0.3s, calculate hash: 5.3s, forge load real models: 3.7s, calculate empty prompt: 0.8s).
Downloading VAEApprox model to: D:\Stable Diffusion\sdi\stable-diffusion-webui-forge\models\VAE-approx\vaeapprox-sdxl.pt
100%|███████████████████████████████████████████████████████████████████████████████| 209k/209k [00:00<00:00, 15.3MB/s]
To load target model SDXL
Begin to load 1 model
Reuse 0 loaded models
[Memory Management] Current Free Memory (MB) =  21133.46533203125
[Memory Management] Model Memory (MB) =  4897.086494445801
[Memory Management] Estimated Inference Memory (MB) =  1310.72
[Memory Management] Estimated Remaining Memory (MB) =  14925.65883758545
Moving model(s) has taken 1.16 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00,  5.46it/s]
To load target model AutoencoderKL
Begin to load 1 model
Reuse 0 loaded models
[Memory Management] Current Free Memory (MB) =  16114.41162109375
[Memory Management] Model Memory (MB) =  159.55708122253418
[Memory Management] Estimated Inference Memory (MB) =  4356.0
[Memory Management] Estimated Remaining Memory (MB) =  11598.854539871216
Moving model(s) has taken 0.42 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.61it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00,  5.66it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.66it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00,  5.51it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.61it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00,  5.65it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.67it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  5.98it/s]

Additional information

No response

catboxanon commented 6 months ago

At least upstream, it was determined that CLIP skip should not affect the output of any SDXL generation, since all SDXL models were trained using the penultimate layer (which was not the case for SD1). https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/12518#issuecomment-1676364751
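As a minimal illustration of that penultimate-layer behavior, here is a hypothetical sketch (toy strings stand in for hidden-state tensors; `select_text_features` is an illustrative helper, not Forge's or webui's actual code):

```python
# Sketch of why Clip Skip is a no-op for SDXL: SDXL's text encoders were
# trained against the penultimate hidden state, so that layer is always
# used, while SD1.x indexes layers from the end per the Clip Skip value.

def select_text_features(hidden_states, clip_skip=1, is_sdxl=False):
    """Pick the CLIP hidden state the UNet is conditioned on.

    hidden_states: per-layer outputs, index -1 = final layer.
    clip_skip=1 means "use the last layer", 2 the penultimate, etc.
    (the A1111 UI convention).
    """
    if is_sdxl:
        # SDXL always conditions on the penultimate layer, so the
        # user-facing setting is ignored entirely.
        return hidden_states[-2]
    return hidden_states[-clip_skip]

layers = [f"layer_{i}" for i in range(12)]  # stand-in for real tensors
assert select_text_features(layers, clip_skip=2) == "layer_10"
# Any clip_skip value gives the same result for SDXL:
assert select_text_features(layers, clip_skip=12, is_sdxl=True) == "layer_10"
```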

BlankDiploma commented 6 months ago

At least upstream, it was determined that CLIP skip should not affect the output of any SDXL generation, since all SDXL models were trained using the penultimate layer (which was not the case for SD1). AUTOMATIC1111/stable-diffusion-webui#12518 (comment)

Oh. Well, that certainly explains it. Maybe it should be hidden from the UI when an SDXL model is loaded? It's at least a bit misleading the way it's currently displayed.
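The suggested UI behavior could look something like the following sketch. The function name, the model-type strings, and the Gradio-style update dict are all illustrative assumptions; Forge's real model-change callback and component names may differ.

```python
# Hypothetical sketch: hide the Clip Skip slider whenever an SDXL-based
# checkpoint is loaded, since the setting has no effect there.

def clip_skip_slider_update(model_type: str) -> dict:
    """Build a Gradio-style component update for the Clip Skip slider."""
    is_sdxl = model_type.lower().startswith("sdxl")
    # Hiding (or alternatively disabling) the slider avoids implying
    # that the value will change the output.
    return {"visible": not is_sdxl}

assert clip_skip_slider_update("SD1")["visible"] is True
assert clip_skip_slider_update("SDXL")["visible"] is False
```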

lllyasviel commented 6 months ago

I can fix it in 5 minutes, but Forge tries to produce the same results as webui.

However, if most users vote for a functional SDXL clip skip and upstream refuses to implement it, Forge may still implement it after multiple user reports, since it can be seen as part of the backend.