Face id can't preserve well when ipa with openpose controlnet is used?

tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with image prompt.

Apache License 2.0

4.46k stars 289 forks

Face id can't preserve well when ipa with openpose controlnet is used? #385

Open silence-tang opened 2 weeks ago

silence-tang commented 2 weeks ago

Hi, I need your help! I am trying to use sd-controlnet-openpose with IPAdapterPlus to generate 512x512 results from a reference ID face, a prompt, and a 512x512 OpenPose pose map, but the face identity in the result image doesn't match the reference face well. Is this because the CLIP encoder can't capture fine identity details of the face? If I use IP-Adapter-FaceID with sd-controlnet-openpose instead, would that address the problem?

Sunny599 commented 1 week ago

Hello, has this problem been solved? I am facing the same issue. Could we discuss it?

silence-tang commented 1 week ago

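> Hello, has this problem been solved? I am facing the same issue. Could we discuss it?

Not yet. Could you share your contact details so we can discuss it? I see you are a fellow alum, haha.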

xiaohu2015 commented 1 week ago

Hi, when the face region is small, facial distortion is likely. You can crop the face region and run another image2image pass on it.
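A minimal sketch of the crop-and-refine step suggested above, not the repository's own code. It assumes the face box in the generated image comes from some face detector and that `image` behaves like a `PIL.Image.Image`. The names `expand_face_box`, `refine_face`, and `refine_fn` are hypothetical, and `refine_fn` stands in for whatever image2image pipeline (for example, one with IP-Adapter attached) is actually used.

```python
from typing import Callable, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def expand_face_box(box: Box, image_size: Tuple[int, int], margin: float = 0.4) -> Box:
    """Grow a face box into a padded square, shifted to stay inside the image."""
    left, top, right, bottom = box
    width, height = image_size
    # Pad the longer side by `margin` on each side, capped at the image size.
    side = int(max(right - left, bottom - top) * (1 + 2 * margin))
    side = min(side, width, height)
    cx, cy = (left + right) // 2, (top + bottom) // 2
    # Center the square on the face, then clamp it to the image bounds.
    new_left = min(max(cx - side // 2, 0), width - side)
    new_top = min(max(cy - side // 2, 0), height - side)
    return (new_left, new_top, new_left + side, new_top + side)


def refine_face(image, face_box: Box, refine_fn: Callable, work_size: int = 512):
    """Crop the face, refine it at work_size x work_size, and paste it back.

    `image` is assumed to behave like a PIL.Image.Image.
    `refine_fn` is a hypothetical callable wrapping an image2image pipeline;
    it takes a square image and returns a square image.
    """
    box = expand_face_box(face_box, image.size)
    side = box[2] - box[0]
    # Upscale the small face so the diffusion model sees it at full resolution.
    patch = image.crop(box).resize((work_size, work_size))
    refined = refine_fn(patch).resize((side, side))
    result = image.copy()
    result.paste(refined, (box[0], box[1]))
    return result
```

With a face detected at `(200, 100, 264, 164)` in a 512x512 result, `refine_face(result_image, (200, 100, 264, 164), refine_fn)` would redraw only the padded face square at 512x512 and paste it back. A low image2image denoising strength is probably the safer choice here so that pose and lighting stay consistent with the rest of the image, though that is a judgment call rather than something stated in this thread.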

silence-tang commented 1 week ago

> Hi, when the face region is small, facial distortion is likely. You can crop the face region and run another image2image pass on it.

Hi, thanks for your answer! May I ask how I should "crop the face region"? The input ID face seems to be cropped already, and the face covers most of the input image.

Sunny599 commented 1 week ago

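> > Hello, has this problem been solved? I am facing the same issue. Could we discuss it?

> Not yet. Could you share your contact details so we can discuss it? I see you are a fellow alum, haha.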

vx:17381567787

silence-tang commented 1 week ago

> Hi, when the face region is small, facial distortion is likely. You can crop the face region and run another image2image pass on it.

Hi, did you mean that the face identity fails to be preserved when the face occupies only a small area of the result image? Looking forward to hearing from you! Thanks again.

silence-tang commented 1 week ago

> Hi, when the face region is small, facial distortion is likely. You can crop the face region and run another image2image pass on it.

I have also noticed that when I use a pose-conditioned ControlNet with IP-Adapter to generate human images, the clothes sometimes don't match the given prompt. What is the reason? Is it because ControlNet + IP-Adapter can't handle multiple control signals (ID face, prompt, and pose) well at the same time, so that the signals conflict with each other?