lllyasviel / stable-diffusion-webui-forge


[Bug]: IP-Adapter ControlNet not working #821

Open genialgenteel opened 1 week ago

genialgenteel commented 1 week ago

Checklist

What happened?

Hello!

I'm not sure if this will ever be addressed now that active development of Forge for average end-users has been suspended, but I figured I'd ask anyway...

When I try to use any of the IP-Adapter ControlNet models, I get an error and the ControlNet is not applied. No matter which IP-Adapter preprocessor and/or model (plus LoRA where needed) I use, I get the same error and it doesn't work. Other ControlNets like Depth, OpenPose, and Canny still work.

I haven't tried reproducing this on a clean install, but I have disabled all of my third-party extensions. Oddly, it worked exactly twice right after I first disabled them. When I re-enabled the extensions, the error came back. I disabled everything again, but the miracle was over and I kept getting the same error whether the extensions were on or off. I also restarted the computer and reopened SD with all third-party extensions already disabled; it still didn't work. I'm at a loss.

If anyone has any idea what's wrong, I'd appreciate some assistance. Thanks.

Steps to reproduce the problem

  1. Enter control image in ControlNet
  2. Select IP-Adapter
  3. Pick a matching preprocessor/model, e.g. InsightFace+CLIP-H (IPAdapter) with ip-adapter-faceid-plusv2_sd15 [6e14fc1a] + ip-adapter-faceid-plusv2_sd15_lora, or CLIP-ViT-H (IPAdapter) with ip-adapter-plus-face_sd15 [71693645]
  4. Generate image (a rough API equivalent of these steps is sketched below)
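
In case a programmatic repro is useful, this is roughly the same thing driven through the txt2img API instead of the UI. It's an untested sketch: the endpoint and the `alwayson_scripts` key exist in the webui API, but the exact ControlNet unit field names here are assumptions borrowed from the ControlNet extension's API and may differ for Forge's built-in version.

```python
# Untested sketch: steps 1-4 via the API instead of the UI.
# Assumes the webui was launched with --api; the ControlNet "args" keys
# ("enabled", "image", "module", "model", "weight") are assumptions, not
# verified against Forge's built-in ControlNet.
import base64
import requests

with open("control_face.png", "rb") as f:  # hypothetical control image
    control_image = base64.b64encode(f.read()).decode()

payload = {
    "prompt": "portrait photo",
    "steps": 20,
    "width": 512,
    "height": 512,
    "alwayson_scripts": {
        "ControlNet": {
            "args": [{
                "enabled": True,
                "image": control_image,
                "module": "InsightFace+CLIP-H (IPAdapter)",
                "model": "ip-adapter-faceid-plusv2_sd15 [6e14fc1a]",
                "weight": 1.0,
            }]
        }
    },
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
r.raise_for_status()
print(len(r.json().get("images", [])), "image(s) returned")
```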

What should have happened?

The ControlNet should have been applied.

What browsers do you use to access the UI?

Mozilla Firefox

Sysinfo

sysinfo - Copy.txt

Console logs

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f0.0.17v1.8.0rc-latest-276-g29be1da7
Commit hash: 29be1da7cf2b5dccfc70fbdd33eb35c56a31ffb7
Launching Web UI with arguments: --enable-insecure-extension-access --listen --port XXXX --xformers --upcast-sampling --disable-safe-unpickle --theme dark --no-hashing
Total VRAM 8192 MB, total RAM 16069 MB
WARNING:xformers:A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
xformers version: 0.0.23.post1
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3070 Ti Laptop GPU : native
Hint: your device supports --pin-shared-memory for potential speed improvements.
Hint: your device supports --cuda-malloc for potential speed improvements.
Hint: your device supports --cuda-stream for potential speed improvements.
VAE dtype: torch.bfloat16
CUDA Stream Activated:  False
Using xformers cross attention
ControlNet preprocessor location: C:\Users\USERNAME\Pictures\sd.webui\webui\models\ControlNetPreprocessor
Loading weights [None] from C:\Users\USERNAME\Pictures\sd.webui\webui\models\Stable-diffusion\SD15\MODELNAME.safetensors
2024-06-22 13:06:14,410 - ControlNet - INFO - ControlNet UI callback registered.
Running on local URL:  XXXXXXXXXXXXXX
model_type EPS
UNet ADM Dimension 0
Using xformers attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using xformers attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
loaded straight to GPU
To load target model BaseModel
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  5422.03662109375
[Memory Management] Model Memory (MB) =  0.00762939453125
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  4398.028991699219
Moving model(s) has taken 0.01 seconds
To load target model SD1ClipModel
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  5421.98388671875
[Memory Management] Model Memory (MB) =  454.2076225280762
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  3943.776264190674
Moving model(s) has taken 0.07 seconds

To create a public link, set `share=True` in `launch()`.
Startup time: 16.8s (prepare environment: 4.5s, import torch: 3.8s, import gradio: 0.9s, setup paths: 0.5s, initialize shared: 0.1s, other imports: 0.6s, load scripts: 1.4s, create ui: 0.6s, gradio launch: 4.2s).
Model loaded in 5.0s (load weights from disk: 0.3s, forge instantiate config: 0.2s, forge load real models: 3.8s, calculate empty prompt: 0.7s).
2024-06-22 13:07:26,226 - ControlNet - INFO - ControlNet Input Mode: InputMode.SIMPLE
2024-06-22 13:07:26,226 - ControlNet - INFO - Using preprocessor: InsightFace+CLIP-H (IPAdapter)
2024-06-22 13:07:26,226 - ControlNet - INFO - preprocessor resolution = 512
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: C:\Users\USERNAME\Pictures\sd.webui\webui\models\insightface\models\buffalo_l\1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: C:\Users\USERNAME\Pictures\sd.webui\webui\models\insightface\models\buffalo_l\2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: C:\Users\USERNAME\Pictures\sd.webui\webui\models\insightface\models\buffalo_l\det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: C:\Users\USERNAME\Pictures\sd.webui\webui\models\insightface\models\buffalo_l\genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: C:\Users\USERNAME\Pictures\sd.webui\webui\models\insightface\models\buffalo_l\w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)
Warning torch.load doesn't support weights_only on this pytorch version, loading unsafely.
2024-06-22 13:07:34,574 - ControlNet - INFO - Current ControlNet IPAdapterPatcher: C:\Users\USERNAME\Pictures\sd.webui\webui\models\ControlNet\ip-adapter-faceid-plusv2_sd15.bin
NeverOOM Enabled for UNet (always maximize offload)
NeverOOM Enabled for VAE (always tiled)
VARM State Changed To NO_VRAM
[LORA] Loaded C:\Users\USERNAME\Pictures\sd.webui\webui\models\Lora\SD15\LORANAME.safetensors for BaseModel-UNet with 192 keys at weight 0.9 (skipped 0 keys)
[LORA] Loaded C:\Users\USERNAME\Pictures\sd.webui\webui\models\Lora\SD15\LORANAME.safetensors for BaseModel-CLIP with 72 keys at weight 0.9 (skipped 0 keys)
To load target model SD1ClipModel
Begin to load 1 model
[Memory Management] Requested SYNC Preserved Memory (MB) =  0.0
[Memory Management] SYNC Loader Disabled for  EmbeddingsWithFixes(
  (wrapped): Embedding(49408, 768)
)
[Memory Management] SYNC Loader Disabled for  Embedding(49408, 768)
[Memory Management] SYNC Loader Disabled for  Embedding(77, 768)
[Memory Management] Parameters Loaded to SYNC Stream (MB) =  162.2314453125
[Memory Management] Parameters Loaded to GPU (MB) =  434.4755859375
Moving model(s) has taken 0.17 seconds
token_merging_ratio = 0.1
C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\insightface\utils\transform.py:68: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
  P = np.linalg.lstsq(X_homo, Y)[0].T # Affine matrix. 3 x 4
INFO: InsightFace detection resolution lowered to (384, 384).
To load target model CLIPVisionModelWithProjection
Begin to load 1 model
[Memory Management] Requested SYNC Preserved Memory (MB) =  0.0
[Memory Management] SYNC Loader Disabled for  Embedding(257, 1280)
[Memory Management] Parameters Loaded to SYNC Stream (MB) =  1204.9609375
[Memory Management] Parameters Loaded to GPU (MB) =  0.62744140625
Moving model(s) has taken 0.04 seconds
*** Error running process_before_every_sampling: C:\Users\USERNAME\Pictures\sd.webui\webui\extensions-builtin\sd_forge_controlnet\scripts\controlnet.py
    Traceback (most recent call last):
      File "C:\Users\USERNAME\Pictures\sd.webui\webui\modules\scripts.py", line 835, in process_before_every_sampling
        script.process_before_every_sampling(p, *script_args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\webui\extensions-builtin\sd_forge_controlnet\scripts\controlnet.py", line 555, in process_before_every_sampling
        self.process_unit_before_every_sampling(p, unit, self.current_params[i], *args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\webui\extensions-builtin\sd_forge_controlnet\scripts\controlnet.py", line 501, in process_unit_before_every_sampling
        params.model.process_before_every_sampling(p, cond, mask, *args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\webui\extensions-builtin\sd_forge_ipadapter\scripts\forge_ipadapter.py", line 147, in process_before_every_sampling
        unet = opIPAdapterApply(
      File "C:\Users\USERNAME\Pictures\sd.webui\webui\extensions-builtin\sd_forge_ipadapter\lib_ipadapter\IPAdapterPlus.py", line 690, in apply_ipadapter
        clip_embed = clip_vision.encode_image(image).penultimate_hidden_states
      File "C:\Users\USERNAME\Pictures\sd.webui\webui\ldm_patched\modules\clip_vision.py", line 70, in encode_image
        outputs = self.model(pixel_values=pixel_values, output_hidden_states=True)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\transformers\models\clip\modeling_clip.py", line 1310, in forward
        vision_outputs = self.vision_model(
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\transformers\models\clip\modeling_clip.py", line 865, in forward
        hidden_states = self.embeddings(pixel_values)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\transformers\models\clip\modeling_clip.py", line 199, in forward
        embeddings = torch.cat([class_embeds, patch_embeds], dim=1)
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument tensors in method wrapper_CUDA_cat)

---
To load target model BaseModel
Begin to load 1 model
[Memory Management] Requested SYNC Preserved Memory (MB) =  0.0
[Memory Management] Parameters Loaded to SYNC Stream (MB) =  1639.406135559082
[Memory Management] Parameters Loaded to GPU (MB) =  0.0
Moving model(s) has taken 0.41 seconds
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:06<00:00,  3.00it/s]
To load target model AutoencoderKL█████████████████████████████████████████████████████████████████████████████████| 20/20 [00:05<00:00,  3.62it/s]
Begin to load 1 model
[Memory Management] Requested SYNC Preserved Memory (MB) =  0.0
[Memory Management] Parameters Loaded to SYNC Stream (MB) =  159.55708122253418
[Memory Management] Parameters Loaded to GPU (MB) =  0.0
Moving model(s) has taken 0.02 seconds
VAE tiled decode: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:01<00:00,  8.54it/s]
Total progress: 100%|██████
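
For what it's worth, the RuntimeError at the bottom of the traceback is the generic PyTorch "tensors on different devices" failure in `torch.cat`, triggered here inside the CLIP vision embeddings. Below is a minimal, self-contained illustration of the same failure mode and the usual fix of moving everything to one device first; the tensor names and shapes are just placeholders loosely mirroring CLIP ViT-H, not Forge's actual variables:

```python
# Minimal illustration of the device-mismatch error seen in the traceback.
# Shapes loosely mirror CLIP ViT-H vision embeddings (1 class token + 256 patches, dim 1280).
import torch

if torch.cuda.is_available():
    class_embeds = torch.zeros(1, 1, 1280)                      # left on the CPU
    patch_embeds = torch.zeros(1, 256, 1280, device="cuda:0")   # on the GPU

    try:
        torch.cat([class_embeds, patch_embeds], dim=1)
    except RuntimeError as e:
        print(e)  # "Expected all tensors to be on the same device ... cpu and cuda:0"

    # The usual fix: put both tensors on the same device before concatenating.
    merged = torch.cat([class_embeds.to("cuda:0"), patch_embeds], dim=1)
    print(merged.shape)  # torch.Size([1, 257, 1280])
```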

Additional information

Input image: seph_ac_003

The two times IP-Adapter worked when I tested it (512x512 with no upscaling, so they look pretty bad lol): 00010-seizamix_v2_869400007 00009-seizamix_v2_2548859270

Then it stopped working and the outputs reflect that (clearly the IP-Adapter was not applied): 00014-seizamix_v2_2936734378 00015-seizamix_v2_538814642
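
If it helps whoever looks at this: given the NeverOOM / NO_VRAM lines right before the LoRA load, my unverified guess is that the offload leaves the CLIPVisionModelWithProjection weights split between cpu and cuda:0 by the time the IP-Adapter path calls encode_image. A small generic helper like the one below (not part of Forge, purely a debugging sketch) could be called on the vision model right before that point to confirm whether its parameters really end up on two devices:

```python
# Hypothetical debugging helper (not part of Forge): count a module's
# parameters per device to confirm a cpu/cuda:0 split after offloading.
from collections import Counter

import torch.nn as nn


def device_histogram(module: nn.Module) -> Counter:
    """Return how many parameter tensors live on each device, e.g. {'cpu': 10, 'cuda:0': 90}."""
    return Counter(str(p.device) for p in module.parameters())


# Example usage (where to hook it is an assumption): call this on the loaded
# CLIPVisionModelWithProjection instance just before encode_image(); a mixed
# result would match the cpu/cuda:0 error in the traceback.
```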