IP-Adapter-FaceID and FaceID Plus not supported

zhaoyun0071 commented 10 months ago

Describe the bug

diffusers\loaders\unet.py", line 780, in _load_ip_adapter_weights
    num_image_text_embeds = state_dict["image_proj"]["latents"].shape[1]
KeyError: 'latents'

Reproduction


import cv2
from insightface.app import FaceAnalysis
import numpy as np
from PIL import Image
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler
from diffusers.utils import load_image

def image_grid(imgs, rows, cols):
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols * w, rows * h))
    grid_w, grid_h = grid.size

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1
)

pipeline = StableDiffusionPipeline.from_single_file(
    r"Realistic_Vision_V4.0.safetensors",
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    feature_extractor=None,
    load_safety_checker=False
).to("cuda")

generator = torch.Generator(device="cpu").manual_seed(42)
num_images = 4
image = load_image("zly.jpg")

app = FaceAnalysis(name="buffalo_l", providers=['CPUExecutionProvider'], root='./')
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.cvtColor(np.asarray(image), cv2.COLOR_BGR2RGB)
faces = app.get(image)
image = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)

pipeline.load_ip_adapter(r'./data/models/ip_adapter/', subfolder='models', weight_name="ip-adapter-faceid_sd15.bin")
# pipeline.load_ip_adapter("h94/IP-Adapter-FaceID", subfolder='',weight_name="ip-adapter-faceid_sd15.bin")
pipeline.set_ip_adapter_scale(0.7)

Logs

No response

System Info

Diffusers 0.25.0

Who can help?

@yiyixuxu @DN6 @sayakpaul @patrickvonplaten

fabiorigano commented 10 months ago

Hi @zhaoyun0071, diffusers 0.25.0 does not support the two models you mentioned because they are experimental versions. PR #6276 is adding support for IPAdapter FaceID.
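
For anyone hitting the same KeyError: the traceback shows the 0.25.0 loader indexing state_dict["image_proj"]["latents"] unconditionally, and the FaceID checkpoint evidently has no such entry. Below is a minimal sketch of a pre-flight check, assuming the checkpoint has already been read into a plain nested dict. has_latents is a hypothetical helper name, not a diffusers function, and the toy dictionaries only stand in for real weights.

```python
def has_latents(state_dict):
    """Return True if the image-projection weights contain a "latents" entry."""
    image_proj = state_dict.get("image_proj", {})
    return "latents" in image_proj


# Toy stand-ins for real checkpoints; "some_other_weight" is a made-up key.
plus_like = {"image_proj": {"latents": object()}, "ip_adapter": {}}
faceid_like = {"image_proj": {"some_other_weight": object()}, "ip_adapter": {}}

print(has_latents(plus_like))    # True
print(has_latents(faceid_like))  # False
```

If the check returns False on 0.25.0, the loader will most likely fail as in the report above, so upgrading is the real fix.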

zhaoyun0071 commented 10 months ago

thanks

katarzynasornat commented 5 months ago

Hi @fabiorigano, do diffusers 0.25.0 and higher support it now? I still get the same error when loading the models.

fabiorigano commented 5 months ago

hi @katarzynasornat, all Face ID models are supported with #7186, and thus starting with diffusers 0.28 (dev, the current version at the time of this writing)
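
A quick way to check the installed version against that threshold is sketched below. It uses only the standard library; release_tuple and supports_faceid are hypothetical helper names, and the (0, 28) minimum simply encodes the version mentioned in this comment. Development builds of 0.28 are treated as supported.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Extract the leading numeric release, e.g. "0.28.0.dev0" -> (0, 28, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognised version: {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def supports_faceid(version_string, minimum=(0, 28)):
    """True if the release is at least `minimum` (0.28 per this thread)."""
    return release_tuple(version_string) >= minimum


try:
    installed = version("diffusers")
    print(installed, "->", supports_faceid(installed))
except PackageNotFoundError:
    print("diffusers is not installed")
```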

katarzynasornat commented 5 months ago

EDIT: I think there is a typo in the example (red marked). It should be ip_adapter_image_embeds instead of ip_adapter_image :)

@fabiorigano thanks for your reply! I have tried to follow #7186 with the code provided, however I am getting an error:

AttributeError: 'NoneType' object has no attribute 'parameters'

Is that normal behaviour? Did I miss something?

I attached my gist with execution below.

fabiorigano commented 5 months ago

@katarzynasornat when using FaceID, pass ip_adapter_image_embeds instead of ip_adapter_image. Follow the docs https://huggingface.co/docs/diffusers/main/en/using-diffusers/ip_adapter#face-model (snippet after Full Face example)
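
To make the expected input concrete, here is a shape-only sketch in plain Python (no torch required). It assumes that ip_adapter_image_embeds is a list with one tensor per loaded adapter, and that with classifier-free guidance each tensor stacks the negative (zeros) and positive embeddings along its first axis so the pipeline can split it in two with chunk(2). chunk_count and split_for_cfg are hypothetical helpers that only mimic how torch.Tensor.chunk sizes its pieces along one axis; the 512 is the embedding size I am assuming for the insightface buffalo_l model.

```python
import math


def chunk_count(length, chunks):
    """Number of pieces torch.chunk yields along an axis of size `length` (>= 1)."""
    if length < 1:
        raise ValueError("length must be at least 1")
    size = math.ceil(length / chunks)
    return math.ceil(length / size)


def split_for_cfg(embeds_shapes):
    """Check that every entry can be split into (negative, positive) halves.

    `embeds_shapes` stands in for ip_adapter_image_embeds: one shape tuple
    per loaded IP-Adapter.
    """
    for index, shape in enumerate(embeds_shapes):
        if chunk_count(shape[0], 2) != 2:
            raise ValueError(f"entry {index} with shape {shape} cannot be split in two")
    return True


# Negative and positive ID embeddings concatenated on axis 0.
id_embeds_shape = (2, 1, 1, 512)

# Wrapped in a list: one entry per adapter, leading axis 2, splits cleanly.
print(split_for_cfg([id_embeds_shape]))  # True

# Passing the bare tensor makes a loop iterate over axis 0 instead,
# so each item has shape (1, 1, 512) and cannot be split in two.
items = [id_embeds_shape[1:]] * id_embeds_shape[0]
try:
    split_for_cfg(items)
except ValueError as err:
    print(err)
```

The second case is a reminder to wrap the tensor in a list rather than pass it directly.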

fabiorigano commented 5 months ago

Yes, it should be ip_adapter_image_embeds instead of ip_adapter_image, but don't follow the PR's examples: we changed a few things during integration, and the docs are updated :)

qwerdf4 commented 5 months ago

Only SD1.5 is supported; SDXL is not.

juancopi81 commented 5 months ago

Hi @fabiorigano ,

Thanks for the great work! I was wondering if the docs are updated. I copied and pasted this (I am using the ip-adapter-faceid-plusv2_sd15.bin ckpt):

from insightface.utils import face_align

num_images = 1 # Not in the docs

ref_images_embeds = []
ip_adapter_images = []
app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))
image = cv2.cvtColor(np.asarray(image), cv2.COLOR_BGR2RGB)
faces = app.get(image)
ip_adapter_images.append(face_align.norm_crop(image, landmark=faces[0].kps, image_size=224))
image = torch.from_numpy(faces[0].normed_embedding)
ref_images_embeds.append(image.unsqueeze(0))
ref_images_embeds = torch.stack(ref_images_embeds, dim=0).unsqueeze(0)
neg_ref_images_embeds = torch.zeros_like(ref_images_embeds)
id_embeds = torch.cat([neg_ref_images_embeds, ref_images_embeds]).to(dtype=torch.float16, device="cuda")

clip_embeds = pipeline.prepare_ip_adapter_image_embeds(
  [ip_adapter_images], None, torch.device("cuda"), num_images, True)[0]

pipeline.unet.encoder_hid_proj.image_projection_layers[0].clip_embeds = clip_embeds.to(dtype=torch.float16)
pipeline.unet.encoder_hid_proj.image_projection_layers[0].shortcut = True # True if Plus v2

And I am getting:

AttributeError                            Traceback (most recent call last)
<ipython-input-9-0a5647ac6e53> in <cell line: 18>()
     16 id_embeds = torch.cat([neg_ref_images_embeds, ref_images_embeds]).to(dtype=torch.float16, device="cuda")
     17 
---> 18 clip_embeds = pipeline.prepare_ip_adapter_image_embeds(
     19   [ip_adapter_images], None, torch.device("cuda"), num_images, True)[0]
     20 

1 frames
/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py in prepare_ip_adapter_image_embeds(self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance)
    522             ):
    523                 output_hidden_state = not isinstance(image_proj_layer, ImageProjection)
--> 524                 single_image_embeds, single_negative_image_embeds = self.encode_image(
    525                     single_ip_adapter_image, device, 1, output_hidden_state
    526                 )

/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py in encode_image(self, image, device, num_images_per_prompt, output_hidden_states)
    482 
    483     def encode_image(self, image, device, num_images_per_prompt, output_hidden_states=None):
--> 484         dtype = next(self.image_encoder.parameters()).dtype
    485 
    486         if not isinstance(image, torch.Tensor):

AttributeError: 'NoneType' object has no attribute 'parameters'

I also tried:

# ADDING ip_embeds
clip_embeds = pipeline.prepare_ip_adapter_image_embeds(
  [ip_adapter_images], id_embeds, torch.device("cuda"), num_images, True)[0]

And got:

ValueError                                Traceback (most recent call last)
<ipython-input-11-837bd85b00a7> in <cell line: 19>()
     17 
     18 # ADDING ip_embeds
---> 19 clip_embeds = pipeline.prepare_ip_adapter_image_embeds(
     20   [ip_adapter_images], id_embeds, torch.device("cuda"), num_images, True)[0]
     21 

/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py in prepare_ip_adapter_image_embeds(self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance)
    540             for single_image_embeds in ip_adapter_image_embeds:
    541                 if do_classifier_free_guidance:
--> 542                     single_negative_image_embeds, single_image_embeds = single_image_embeds.chunk(2)
    543                     single_image_embeds = single_image_embeds.repeat(
    544                         num_images_per_prompt, *(repeat_dims * len(single_image_embeds.shape[1:]))

ValueError: not enough values to unpack (expected 2, got 1)

Sorry if I am missing something obvious. Here is my colab:

fabiorigano commented 5 months ago

hi @juancopi81, the docs are updated. You are not loading the weights properly: all Face ID Plus models need a CLIP image encoder. See https://huggingface.co/docs/diffusers/using-diffusers/loading_adapters#ip-adapter-face-id-models or copy-paste the following snippet (you can find it in the docs too):

import torch
from diffusers import AutoPipelineForText2Image
from transformers import CLIPVisionModelWithProjection

# Face ID Plus models need a CLIP image encoder, so load one explicitly.
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "laion/CLIP-ViT-H-14-laion2B-s32B-b79K",
    torch_dtype=torch.float16,
)

# Pass the encoder to the pipeline so that pipeline.image_encoder is not None.
pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    image_encoder=image_encoder,
    torch_dtype=torch.float16
).to("cuda")

pipeline.load_ip_adapter("h94/IP-Adapter-FaceID", subfolder=None, weight_name="ip-adapter-faceid-plusv2_sd15.bin")
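
The first traceback above ('NoneType' object has no attribute 'parameters') comes from encode_image reading self.image_encoder when it is None. A small guard like the following can surface that earlier with a clearer message. This is only a sketch: require_image_encoder is a hypothetical helper, and the SimpleNamespace objects are stand-ins for real pipelines.

```python
from types import SimpleNamespace


def require_image_encoder(pipeline):
    """Fail early, with a readable message, if no image encoder is attached."""
    if getattr(pipeline, "image_encoder", None) is None:
        raise RuntimeError(
            "pipeline.image_encoder is None; Face ID Plus models need a CLIP "
            "image encoder passed when the pipeline is created"
        )
    return pipeline.image_encoder


# Stand-ins for real pipelines, just to exercise the check.
without_encoder = SimpleNamespace(image_encoder=None)
with_encoder = SimpleNamespace(image_encoder=object())

require_image_encoder(with_encoder)  # passes silently
try:
    require_image_encoder(without_encoder)
except RuntimeError as err:
    print(err)
```

Calling it right before prepare_ip_adapter_image_embeds would be one reasonable place.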

juancopi81 commented 5 months ago

Thanks @fabiorigano!! Maybe adding that to these docs would be helpful? https://huggingface.co/docs/diffusers/main/en/using-diffusers/ip_adapter#face-model At least for me, it would have been helpful. If you think so, I could create a PR to add that. Thanks again!

fabiorigano commented 5 months ago

I don't know, I think the user should read the loading tutorial first and then the specific use cases, or at least read both.

hx173149 commented 5 months ago

@qwerdf4 wrote that only SD1.5 is supported and SDXL is not.

Have you solved this problem with SDXL support?

fabiorigano commented 5 months ago

sorry, what's wrong with SDXL models? Do you have a snippet that reproduces the problem? I can run Face ID with SDXL

hx173149 commented 5 months ago

yes, I have this error message:
source code is here:

ipface_pipe = StableDiffusionXLControlNetPipeline.from_single_file(
    base_model_path,
    controlnet=ipface_controlnet,
    image_encoder=ipface_image_encoder,
    torch_dtype=torch.float16,
    variant="fp16",
).to('cuda:2')
ipface_pipe.load_ip_adapter("h94/IP-Adapter-FaceID", subfolder=None, weight_name="ip-adapter-faceid_sdxl.bin", image_encoder_folder=None)

fabiorigano commented 4 months ago

are you working with diffusers>=0.28.0?

hx173149 commented 4 months ago

I am using 0.27.0 version

fabiorigano commented 4 months ago

Face ID models are not supported in 0.27.0

hx173149 commented 4 months ago

thanks I will try it~

katarzynasornat commented 4 months ago

@fabiorigano I think I got something, please look here

I tried to follow the examples from the docs with diffusers 0.28.2 but got this error:

/usr/local/lib/python3.10/dist-packages/diffusers/models/embeddings.py in forward(self, id_embeds)
   1191             torch.Tensor: Output Tensor.
   1192         """
-> 1193         id_embeds = id_embeds.to(self.clip_embeds.dtype)
   1194         id_embeds = self.proj(id_embeds)
   1195         id_embeds = id_embeds.reshape(-1, self.num_tokens, self.embed_dim)

AttributeError: 'NoneType' object has no attribute 'dtype'

fabiorigano commented 4 months ago

hi @katarzynasornat you have to set the clip_embeds attribute for Face ID Plus models:

pipeline.unet.encoder_hid_proj.image_projection_layers[0].clip_embeds = clip_embeds.to(dtype=torch.float16)
pipeline.unet.encoder_hid_proj.image_projection_layers[0].shortcut = False # True if Plus v2

Check the example right before this paragraph https://huggingface.co/docs/diffusers/v0.28.2/en/using-diffusers/ip_adapter#multi-ip-adapter
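
The two assignments can be wrapped in a small helper so the Plus / Plus v2 distinction is explicit. This is a sketch under assumptions: configure_faceid_plus is a hypothetical name, the attribute path is the one shown in this comment, and the SimpleNamespace objects merely imitate that path so the snippet runs without a real pipeline. Casting clip_embeds to float16 is left to the caller.

```python
from types import SimpleNamespace


def configure_faceid_plus(pipeline, clip_embeds, plus_v2, layer_index=0):
    """Attach CLIP embeddings to a Face ID Plus projection layer.

    `plus_v2` selects the shortcut setting: True for Plus v2, False for Plus.
    """
    layer = pipeline.unet.encoder_hid_proj.image_projection_layers[layer_index]
    layer.clip_embeds = clip_embeds
    layer.shortcut = plus_v2
    return layer


# Stand-in object imitating the attribute path of a loaded pipeline.
fake_layer = SimpleNamespace(clip_embeds=None, shortcut=False)
fake_pipeline = SimpleNamespace(
    unet=SimpleNamespace(
        encoder_hid_proj=SimpleNamespace(image_projection_layers=[fake_layer])
    )
)

configure_faceid_plus(fake_pipeline, clip_embeds="placeholder", plus_v2=True)
print(fake_layer.clip_embeds, fake_layer.shortcut)
```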

katarzynasornat commented 4 months ago

Hi @fabiorigano, thanks for your great input, I missed that part. I was able to put it all together and generate some images, but I still have some doubts. My output is below.

Why are the faces doubled? Does it require some manipulation with InsightFace?
When I generate the same prompt without IP Adapter FaceID Plus, I get the full pose (the whole horse, the background, etc.). With the IP Adapter, the image is always cropped. Is that how it should be? That is, were all IP Adapters for face/portrait trained to produce only the body of the person? And if I want the full scene, do I need to mask the face somehow?

Here is a random output from SDXL for the same prompt without IP Adapter FaceID:

fabiorigano commented 4 months ago

@katarzynasornat

have you tried different generator seeds?
yes, you may need to mask some parts of the image to have a better control over the output

hx173149 commented 4 months ago

@fabiorigano hello, I want to ask: does the IP-Adapter-FaceID model support the "ip_adapter_masks" parameter? Like this:

ipface_image = ipface_pipe(
    prompt_embeds=conditioning.to('cuda:2'), negative_prompt_embeds=neg_conditioning.to('cuda:2'),
    pooled_prompt_embeds=pooled.to('cuda:2'), negative_pooled_prompt_embeds=neg_pooled.to('cuda:2'),
    image=big_control_img, controlnet_conditioning_scale=0.5,
    control_guidance_start=0.0, control_guidance_end=0.5,
    width=896, height=1152, num_images_per_prompt=1,
    ip_adapter_image_embeds=[id_embeds],
    num_inference_steps=infer_steps, generator=generator,
    cross_attention_kwargs={"ip_adapter_masks": masks}).images[0]

fabiorigano commented 4 months ago

@hx173149 yes, this is a pipeline feature, so you can apply it with any loaded IP Adapter

hx173149 commented 4 months ago

But I got an error when I used multiple face images with the "ip-adapter-faceid-portrait_sd15.bin" model.
The error message looks like this:

Traceback (most recent call last):
  File "/home/houxin/code/batch_test/batch.py", line 497, in <module>
    ipface_image = ipface_pipe(
  File "/home/houxin/.conda/envs/python39/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/houxin/.conda/envs/python39/lib/python3.9/site-packages/diffusers/pipelines/controlnet/pipeline_controlnet_sd_xl.py", line 1511, in __call__
    noise_pred = self.unet(
  File "/home/houxin/.conda/envs/python39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/houxin/.conda/envs/python39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/houxin/.conda/envs/python39/lib/python3.9/site-packages/diffusers/models/unets/unet_2d_condition.py", line 1220, in forward
    sample, res_samples = downsample_block(
  File "/home/houxin/.conda/envs/python39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/houxin/.conda/envs/python39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/houxin/.conda/envs/python39/lib/python3.9/site-packages/diffusers/models/unets/unet_2d_blocks.py", line 1288, in forward
    hidden_states = attn(
  File "/home/houxin/.conda/envs/python39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/houxin/.conda/envs/python39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/houxin/.conda/envs/python39/lib/python3.9/site-packages/diffusers/models/transformers/transformer_2d.py", line 448, in forward
    hidden_states = block(
  File "/home/houxin/.conda/envs/python39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/houxin/.conda/envs/python39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/houxin/.conda/envs/python39/lib/python3.9/site-packages/diffusers/models/attention.py", line 366, in forward
    attn_output = self.attn2(
  File "/home/houxin/.conda/envs/python39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/houxin/.conda/envs/python39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/houxin/.conda/envs/python39/lib/python3.9/site-packages/diffusers/models/attention_processor.py", line 539, in forward
    return self.processor(
  File "/home/houxin/.conda/envs/python39/lib/python3.9/site-packages/diffusers/models/attention_processor.py", line 2563, in __call__
    raise ValueError(
ValueError: Number of masks (1) does not match number of ip images (4) at index 0

fabiorigano commented 4 months ago

I cannot reproduce this without the example. In any case, here is the documentation: https://huggingface.co/docs/diffusers/v0.28.2/en/using-diffusers/ip_adapter#ip-adapter-masking
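
Going only by the error message, the check that fails compares the number of masks for an adapter with the number of IP images for that adapter. The sketch below mirrors that comparison on shapes alone, so it can be run before calling the pipeline. It assumes each mask has the layout [1, num_images, height, width] described in the masking docs; check_mask_counts is a hypothetical helper, and this is my reading of the message rather than a diagnosis of the script above.

```python
def check_mask_counts(mask_shapes, ip_image_counts):
    """Compare masks per adapter with IP images per adapter, using shapes only.

    mask_shapes: one 4-tuple (1, num_masks, height, width) or None per adapter.
    ip_image_counts: number of IP images supplied for each adapter.
    """
    for index, (mask_shape, num_images) in enumerate(zip(mask_shapes, ip_image_counts)):
        if mask_shape is None:
            continue  # no mask for this adapter
        if len(mask_shape) != 4:
            raise ValueError(f"mask at index {index} should have 4 dimensions")
        if mask_shape[1] != num_images:
            raise ValueError(
                f"Number of masks ({mask_shape[1]}) does not match "
                f"number of ip images ({num_images}) at index {index}"
            )
    return True


# One mask but four face images for the same adapter: reproduces the message.
try:
    check_mask_counts([(1, 1, 1152, 896)], [4])
except ValueError as err:
    print(err)

# One mask per face image passes the check.
print(check_mask_counts([(1, 4, 1152, 896)], [4]))  # True
```

If this is what is happening, supplying one mask per face image (or repeating the same mask) for that adapter should satisfy the check.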
