Open Honey-666 opened 3 months ago
hi,
Please refer to the documentation; here you have the link to the face models. Can you try the following code?
clip_embeds = pipeline.prepare_ip_adapter_image_embeds(
[ip_adapter_images], None, torch.device("cuda"), num_images, True)[0]
If you use CFG (classifier-free guidance), you must provide both neg_ref_images_embeds and ref_images_embeds. In the original implementation this is the default behaviour.
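To make the shapes concrete, here is a minimal sketch of how the negative and positive ID embeddings are stacked for CFG. It uses NumPy as a stand-in for torch so it runs without a GPU (np.expand_dims, np.stack and np.concatenate play the role of unsqueeze, torch.stack and torch.cat). The helper name build_id_embeds and its use_cfg flag are hypothetical, not part of diffusers, and skipping the negative batch when CFG is off is an assumption about how the pipeline consumes the embeddings. As far as I know, the Stable Diffusion pipeline enables CFG when guidance_scale is greater than 1.

```python
import numpy as np


def build_id_embeds(face_embeddings, use_cfg=True):
    """Stack 1-D face embeddings the way the thread's torch code does.

    Hypothetical helper: returns shape (2, n, 1, d) with CFG (zeros first,
    then the real embeddings) or (1, n, 1, d) without CFG.
    """
    # Each (d,) embedding becomes (1, d), like tensor.unsqueeze(0).
    rows = [np.expand_dims(e, 0) for e in face_embeddings]
    # Stack to (n, 1, d), then add a leading batch axis: (1, n, 1, d).
    ref = np.expand_dims(np.stack(rows, axis=0), 0)
    if not use_cfg:
        return ref
    # The negative (unconditional) branch is an all-zero copy of the same shape.
    neg = np.zeros_like(ref)
    # Negative first, positive second, concatenated along the batch axis.
    return np.concatenate([neg, ref], axis=0)


# Dummy 512-d vector standing in for faces[0].normed_embedding.
face_embedding = np.ones(512, dtype=np.float32)
id_embeds = build_id_embeds([face_embedding])
print(id_embeds.shape)  # (2, 1, 1, 512)
```

In the real pipeline the result would still need to be converted to a torch tensor with the right dtype and device, as in the snippets in this thread.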
1. OK!
I successfully ran the test demo, but the test case seems to have an extra closing parenthesis in this line of code:
id_embeds = torch.cat([neg_ref_images_embeds, ref_images_embeds]).to(dtype=torch.float16, device="cuda"))
When I modified the test code to use the Plus version, it reported the following error:
File "C:\work\pythonProject\demo01\venv\lib\site-packages\torch\nn\modules\conv.py", line 456, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [512]
This is my revised code:
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline, DDIMScheduler
from insightface.app import FaceAnalysis
from transformers import CLIPVisionModelWithProjection
model_path = '../../../aidazuo/models/Stable-diffusion/stable-diffusion-v1-5'
clip_path = '../../../aidazuo/models/CLIP-ViT-H-14-laion2B-s32B-b79K'
ip_adapter_path = '../../../aidazuo/models/IP-Adapter-FaceID'
ip_img_path = '../../../aidazuo/jupyter-script/test-img/ip_mask_girl1.png'
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    clip_path,
    torch_dtype=torch.float16,
    use_safetensors=True
)
pipeline = StableDiffusionPipeline.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    image_encoder=image_encoder
).to("cuda")
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_ip_adapter(ip_adapter_path, subfolder=None, weight_name="ip-adapter-faceid-plus_sd15.bin",
                         image_encoder_folder=None)
pipeline.set_ip_adapter_scale(0.6)
image = Image.open(ip_img_path)
ref_images_embeds = []
app = FaceAnalysis(name="buffalo_l", root=ip_adapter_path, providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))
image = cv2.cvtColor(np.asarray(image), cv2.COLOR_BGR2RGB)
faces = app.get(image)
image = torch.from_numpy(faces[0].normed_embedding)
ref_images_embeds.append(image.unsqueeze(0))
ref_images_embeds = torch.stack(ref_images_embeds, dim=0).unsqueeze(0)
neg_ref_images_embeds = torch.zeros_like(ref_images_embeds)
id_embeds = torch.cat([neg_ref_images_embeds, ref_images_embeds]).to(dtype=torch.float16, device="cuda")
generator = torch.Generator(device="cpu").manual_seed(42)
clip_embeds = pipeline.prepare_ip_adapter_image_embeds([image], None, torch.device("cuda"), 1, True)[0]
pipeline.unet.encoder_hid_proj.image_projection_layers[0].clip_embeds = clip_embeds.to(dtype=torch.float16)
pipeline.unet.encoder_hid_proj.image_projection_layers[0].shortcut = False # True if Plus v2
images = pipeline(
    prompt="A photo of a girl",
    ip_adapter_image_embeds=[id_embeds],
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=20, num_images_per_prompt=1,
    generator=generator
).images
2. Does CFG refer to the guidance_scale parameter? It always seems to have a value; if its value is 0, do we still need to add these two lines of code?
Thank you for spotting the error. It seems there is another one; I will fix the documentation in a future PR.
I forgot to upload the correct preprocessing for the Face ID Plus model:
from insightface.utils import face_align
ref_images_embeds = []
ip_adapter_images = []
app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))
image = cv2.cvtColor(np.asarray(image), cv2.COLOR_BGR2RGB)
faces = app.get(image)
ip_adapter_images.append(face_align.norm_crop(image, landmark=faces[0].kps, image_size=224))
image = torch.from_numpy(faces[0].normed_embedding)
ref_images_embeds.append(image.unsqueeze(0))
ref_images_embeds = torch.stack(ref_images_embeds, dim=0).unsqueeze(0)
neg_ref_images_embeds = torch.zeros_like(ref_images_embeds)
id_embeds = torch.cat([neg_ref_images_embeds, ref_images_embeds]).to(dtype=torch.float16, device="cuda")
generator = torch.Generator(device="cpu").manual_seed(42)
clip_embeds = pipeline.prepare_ip_adapter_image_embeds([ip_adapter_images], None, torch.device("cuda"), 1, True)[0]
pipeline.unet.encoder_hid_proj.image_projection_layers[0].clip_embeds = clip_embeds.to(dtype=torch.float16)
pipeline.unet.encoder_hid_proj.image_projection_layers[0].shortcut = False
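For readers wondering where the conv2d error came from: in the earlier script the variable image is rebound to the 512-d face embedding before it is passed to prepare_ip_adapter_image_embeds, so the CLIP encoder receives a vector instead of a picture, while the snippet above passes the aligned 224x224 crop. A small guard like the following sketch can surface that mistake earlier. The function check_clip_input is hypothetical (not part of diffusers or insightface), and the stand-in objects only expose a .shape attribute so the example runs without any third-party packages.

```python
from types import SimpleNamespace


def check_clip_input(x):
    """Hypothetical guard: raise early if x does not look like an H x W x 3 image."""
    shape = tuple(getattr(x, "shape", ()))
    if len(shape) != 3 or shape[-1] != 3:
        raise ValueError(f"expected an H x W x 3 image, got shape {shape}")
    return x


# Stand-ins exposing only .shape; real code would pass NumPy arrays.
face_crop = SimpleNamespace(shape=(224, 224, 3))  # like the norm_crop output
face_embedding = SimpleNamespace(shape=(512,))    # like normed_embedding

check_clip_input(face_crop)  # passes silently

try:
    check_clip_input(face_embedding)
except ValueError as err:
    message = str(err)
    print(message)
```

Calling such a check on each element of ip_adapter_images before the CLIP step would fail with a readable message instead of the error raised deep inside conv2d.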
- For the Face ID models, the inputs have to be prepared before they are passed to the pipeline, so you have to create them as written in the example code.
With the new preprocessing method described above, I was able to pass the Plus test. Thank you very much for your answer!
@fabiorigano does this code work when loading multiple different IP adapters, without restriction?
For instance, is it possible to load both a Face ID Plus v1 and a v2 adapter? I would assume not, because I don't see how to set
pipeline.unet.encoder_hid_proj.image_projection_layers[0].shortcut = False
per adapter.
Additionally, it is unclear to me how to use a collection of Face ID and non-Face ID adapters together. Is that supported?
Hi @jfischoff, you should be able to load both Face ID Plus models. Pass a list with their names to the load_ip_adapter method:
pipeline.load_ip_adapter("h94/IP-Adapter-FaceID", subfolder=None, weight_name=["ip-adapter-faceid-plus_sd15.bin", "ip-adapter-faceid-plusv2_sd15.bin"])
Then set shortcut only on the second element of the projection layer list: pipeline.unet.encoder_hid_proj.image_projection_layers[1].shortcut = True
Thanks for the response @fabiorigano.
So should I set
pipeline.unet.encoder_hid_proj.image_projection_layers[i].clip_embeds = faceid_clip_embeds[i]
pipeline.unet.encoder_hid_proj.image_projection_layers[i].shortcut = is_v2[i]
for each face ip adapter?
Is it a problem if I have loaded a mix of non-Face ID and Face ID IP adapters? Does that affect the index I need to use in image_projection_layers, or is image_projection_layers only used by the Face ID IP adapters? Should I set clip_embeds for non-Face ID Plus models as well?
What about how I pass images/embeddings to the pipeline when I have a mix of Face ID and non-Face ID adapters? If I'm using a Face ID model, should I include the embeddings in the same array when calling the pipeline?
Yes, that's correct.
Each IP adapter passed in the list to the load_ip_adapter method has its corresponding image_projection_layers module, so be sure to index the correct one :)
The clip_embeds attribute is only needed for Face ID Plus models, because these adapters (v1 and v2) were trained with both CLIP image embeddings and insightface embeddings.
You can combine different IP adapters; I have tested some combinations. As mentioned above, it is not necessary to set CLIP embeddings on the other image projection modules, and you would get an error because the clip_embeds attribute doesn't exist in the other image projection classes.
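The per-adapter setup described here can be sketched as a small helper. Everything below is illustrative: configure_faceid_plus and the two stand-in classes are hypothetical names, not diffusers APIs. The stand-ins only mimic the one property the thread relies on, namely that Face ID Plus projection layers carry clip_embeds and shortcut attributes while other projection layers do not.

```python
class FaceIDPlusProjection:
    """Stand-in for a Face ID Plus projection layer (has clip_embeds, shortcut)."""

    def __init__(self):
        self.clip_embeds = None
        self.shortcut = False


class PlainProjection:
    """Stand-in for any other projection layer (no clip_embeds attribute)."""


def configure_faceid_plus(layers, settings):
    """Set clip_embeds and shortcut on selected layers.

    settings maps a layer index to (clip_embeds, is_v2). The index is the
    position of the adapter in the list given to load_ip_adapter.
    """
    for index, (clip_embeds, is_v2) in settings.items():
        layer = layers[index]
        if not hasattr(layer, "clip_embeds"):
            # Wrong index: this adapter is not a Face ID Plus model.
            raise TypeError(f"layer {index} is not a Face ID Plus projection")
        layer.clip_embeds = clip_embeds
        layer.shortcut = is_v2  # True only for Plus v2


# Adapters loaded in this order: Plus v1, a non-Face ID adapter, Plus v2.
layers = [FaceIDPlusProjection(), PlainProjection(), FaceIDPlusProjection()]
configure_faceid_plus(layers, {0: ("clip_v1", False), 2: ("clip_v2", True)})
```

With a real pipeline, layers would be pipeline.unet.encoder_hid_proj.image_projection_layers and the placeholder strings would be the CLIP embedding tensors computed for each Face ID Plus adapter.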
https://github.com/huggingface/diffusers/blob/9ef43f38d43217f690e222a4ce0239c6a24af981/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L492
error msg:
Hi! I'm having some problems using the IP-Adapter FaceID Plus. Can you help me answer these questions? Thank you very much.
ip_adapter_image parameter in the prepare_ip_adapter_image_embeds function
@yiyixuxu @fabiorigano
os:
diffusers==diffusers-0.28.0.dev0
this is my code: