tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Apache License 2.0
4.48k stars 293 forks

IP-Adapter-FaceID-Portrait with SDXL model? #314

Open dawei03896 opened 3 months ago

xiaohu2015 commented 3 months ago

https://huggingface.co/h94/IP-Adapter-FaceID/blob/main/ip-adapter-faceid-portrait_sdxl.bin

xddun commented 3 months ago

Can you provide an example of using this ip-adapter-faceid-portrait_sdxl.bin? The example I wrote is causing an error.

xiaohu2015 commented 3 months ago

It works the same way as the SD1.5 faceid-portrait model; you can use https://github.com/cubiq/ComfyUI_IPAdapter_plus

xddun commented 3 months ago

As you said, it works: I have run the experiment successfully. Thank you.

I would also like to know how to train ip-adapter-faceid-portrait_sdxl.bin. Could you give some suggestions? Thanks.

xiaohu2015 commented 3 months ago

You can refer to https://github.com/tencent-ailab/IP-Adapter/wiki/IP%E2%80%90Adapter%E2%80%90Face

For the portrait model, we train on portrait data: we use the face bounding box to crop each portrait (taking 2x the face region). For SDXL, we train at 1024x1024.
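The crop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' preprocessing code: it reads "2x region" as a square whose side is twice the longer edge of the face box, centered on the face and shifted back inside the image when it would overflow. `portrait_crop_box` is a hypothetical helper, and the face box can come from any face detector.

```python
from typing import Tuple


def portrait_crop_box(
    face_box: Tuple[float, float, float, float],
    image_w: int,
    image_h: int,
    scale: float = 2.0,
) -> Tuple[int, int, int, int]:
    """Square crop around a face box, `scale` times its longer edge.

    Hypothetical helper. Assumption: "2x region" means a square whose side is
    twice the longer face-box edge, centered on the face.
    """
    x0, y0, x1, y1 = face_box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    # Side of the square crop, capped so it always fits inside the image
    side = min(max(x1 - x0, y1 - y0) * scale, image_w, image_h)
    # Center on the face, then shift the square back inside the image if it overflows
    left = min(max(cx - side / 2.0, 0.0), image_w - side)
    top = min(max(cy - side / 2.0, 0.0), image_h - side)
    return (round(left), round(top), round(left + side), round(top + side))


# A 200x200 face in a 1920x1080 frame gives a 400x400 crop
print(portrait_crop_box((400, 300, 600, 500), 1920, 1080))  # → (300, 200, 700, 600)
```

The resulting box can be passed to Pillow's `Image.crop` and the crop resized to 1024x1024 with `Image.resize` for SDXL training. Keeping the crop square avoids distorting the face when it is resized to a square training resolution.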

xddun commented 3 months ago

I have now tried this project thoroughly, and I find the model design truly elegant. You must have explored many experimental paths to achieve such good results. IP-Adapter-FaceID, especially ip-adapter-faceid-portrait_sdxl.bin, shows great flexibility while keeping facial similarity. It is much more flexible than InstantID! This is a very good solution.

I have several questions and would appreciate your help in clarifying them:

  1. Is IP-Adapter-FaceID supposed to be adapted to a specific SD model during training? Since the denoising layers of SD's UNet rely on CLIP, the IP-Adapter-FaceID you trained can be used with other SD models, but wouldn't it work better with another SD model (giving finer results) if IP-Adapter-FaceID were fine-tuned together with that model? I believe this is the case; am I correct?

  2. Comparing the models, I clearly feel that the characters produced by ip-adapter-faceid-portrait_sdxl.bin are better and more similar to the input person. However, I also realize that the training code probably requires considerable expertise to use. I hope this can be improved so that the advantages of IP-Adapter-FaceID can be better leveraged, and I would appreciate a more detailed description of the training process for ip-adapter-faceid-portrait_sdxl.bin.

  3. How much data is required to train ip-adapter-faceid-portrait_sdxl.bin?

  4. Does the input to ip-adapter-faceid-portrait_sdxl.bin consist of the embeddings of multiple face images, with the target being one selected input image? I suspect such an approach might cause training to fail, so I hope you can share more about this part of the process.

xddun commented 3 months ago

  5. Have you considered a way to control the proportions of the faces, such as incorporating landmarks as InstantID does?

xiaohu2015 commented 2 months ago

@xddun Hi, training faceid-portrait is almost the same as training faceid, except for two differences: (1) faceid-portrait doesn't use the LoRA module; (2) faceid-portrait is trained on portrait images (we first use a face detector to get the face region, then expand the crop to 2x that region).
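To make the two differences concrete, here is a hypothetical configuration object (plain Python, not part of the IP-Adapter code base) in which the portrait variant is derived from a FaceID config by changing only those two settings. The field names, and the assumption that both variants train SDXL at 1024x1024, are illustrative only.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class FaceIDTrainConfig:
    """Hypothetical summary of training settings; not a real IP-Adapter API."""
    base_model: str          # base checkpoint (e.g. SDXL), not a fine-tuned one
    resolution: int          # square training resolution in pixels (assumed 1024 for SDXL)
    use_lora: bool           # FaceID trains with a LoRA module; FaceID-Portrait does not
    portrait_crop: bool      # crop each image around the detected face
    crop_scale: float = 2.0  # face-region expansion factor used when portrait_crop is on


def config_diff(a: FaceIDTrainConfig, b: FaceIDTrainConfig) -> dict:
    """Map each field whose value differs to its (a, b) pair."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if getattr(a, f.name) != getattr(b, f.name)
    }


faceid_sdxl = FaceIDTrainConfig(base_model="SDXL", resolution=1024, use_lora=True, portrait_crop=False)
# The portrait variant changes only the two settings named in the reply
faceid_portrait_sdxl = replace(faceid_sdxl, use_lora=False, portrait_crop=True)

print(config_diff(faceid_sdxl, faceid_portrait_sdxl))
# → {'use_lora': (True, False), 'portrait_crop': (False, True)}
```

Deriving one config from the other with `dataclasses.replace` keeps everything else (base model, resolution, training script) identical, which mirrors the "almost the same" description.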

xiaohu2015 commented 2 months ago

3. How much data is required to train ip-adapter-faceid-portrait_sdxl.bin?

  1. All the IP-Adapter models are trained on base SD models (e.g. the original SD1.5 and SDXL); once trained, they can be used with other fine-tuned SD models.

  2. You can refer to https://github.com/tencent-ailab/IP-Adapter/blob/main/tutorial_train_faceid.py: remove the LoRA, and use a face detector to crop portrait images.

  3. About 600,000 images.

  4. No, in the training stage it uses just one face embedding.

  5. I think it can, but I have not tried that.
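Point 4 means each training sample is conditioned on a single face embedding. A minimal sketch of how one might pick that embedding per image follows; the rule of taking the largest detected face and the explicit L2 normalization are assumptions for illustration, and `pick_training_embedding` is a hypothetical helper rather than code from `tutorial_train_faceid.py`.

```python
import math
from typing import List, Sequence, Tuple

# (x0, y0, x1, y1) bounding box plus a raw embedding vector, as a detector might return
Face = Tuple[Tuple[float, float, float, float], List[float]]


def pick_training_embedding(faces: Sequence[Face]) -> List[float]:
    """Return exactly one L2-normalized face embedding for a training image.

    Hypothetical helper. Assumptions: if several faces are detected, the one
    with the largest box is used; the embedding is L2-normalized here.
    """
    if not faces:
        raise ValueError("no face detected; skip this image")

    def box_area(face: Face) -> float:
        x0, y0, x1, y1 = face[0]
        return max(0.0, x1 - x0) * max(0.0, y1 - y0)

    # Keep only one face per sample, matching "it just uses one face embedding"
    _, embedding = max(faces, key=box_area)
    norm = math.sqrt(sum(v * v for v in embedding))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return [v / norm for v in embedding]


faces = [((0, 0, 10, 10), [3.0, 4.0]), ((0, 0, 50, 50), [0.0, 2.0])]
print(pick_training_embedding(faces))  # the larger face wins → [0.0, 1.0]
```

Images with no detectable face raise an error here so they can be dropped from the training set; how the real pipeline handles them is not stated in the thread.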

xddun commented 2 months ago

Thank you for your responses; this is all very helpful experience!

katarzynasornat commented 1 month ago

@xiaohu2015 Is SDXL supported for ip-adapter-faceid? When I use it, can I generate a full scene with only the face fixed? I tried ip-adapter-faceid and got what was essentially a portrait. For example, my prompt was "a girl riding a horse, full pose", and both the horse and the girl were cut off; when I generated without the FaceID adapter, I got a normal full pose.

Thanks for your help.