tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with image prompt.
Apache License 2.0

Training details #120

Open okaris opened 10 months ago

okaris commented 10 months ago

Firstly, I'd like to extend my heartfelt appreciation for the exceptional models and pipelines you've developed. They are truly remarkable and have greatly aided my work.

I am interested in adapting the recently released SDXL face model for a different training objective. To better tailor the model to my needs, I'd love to understand the dataset and loss function you've utilized in more detail.

While the examples provided in the notebook are astonishing in their quality, I've encountered challenges in reproducing similar results with different image inputs. I wonder if this may be due to the model being trained on synthetic face images.

Your insights on the training process would be invaluable to me. If you could take the time to elaborate on this, I would be extremely grateful.

Thank you once again for your exceptional work. I look forward to your response.

xiaohu2015 commented 10 months ago

@okaris For the current version, we trained the face model using the same architecture as IP-Adapter-plus: the cropped face is used as the image condition and the original image as the target.
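The pairing described above (cropped face in, full image as target) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' preprocessing: the face box is assumed to come from any face detector, and `square_face_crop`, `make_training_pair`, and the `margin` value are hypothetical names and settings that do not appear in this thread.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def square_face_crop(box: Box, image_size: Tuple[int, int], margin: float = 0.3) -> Box:
    """Expand a detected face box into a square crop clamped to the image.

    `margin` is a hypothetical padding ratio, not a value from the thread.
    """
    left, top, right, bottom = box
    width, height = image_size
    # Side of the square: the longer box edge, enlarged by the margin.
    side = int(max(right - left, bottom - top) * (1.0 + margin))
    side = min(side, width, height)  # the crop cannot exceed the image
    # Centre the square on the face box, then shift it back inside the image.
    cx, cy = (left + right) // 2, (top + bottom) // 2
    x0 = min(max(cx - side // 2, 0), width - side)
    y0 = min(max(cy - side // 2, 0), height - side)
    return (x0, y0, x0 + side, y0 + side)


def make_training_pair(image_size: Tuple[int, int], face_box: Box) -> Tuple[Box, Box]:
    """Return (condition_box, target_box): cropped face as input, full image as target."""
    condition = square_face_crop(face_box, image_size)
    target = (0, 0, image_size[0], image_size[1])
    return condition, target
```

The returned boxes use the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so the condition image could be obtained with `image.crop(condition)`. The actual crop and alignment used for the released models are discussed in the linked issue on face preprocessing.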

For the training data, you can refer to https://github.com/tencent-ailab/IP-Adapter/issues/74 (some real data and some AI-generated data). For the preprocessing of the face image, you can refer to https://github.com/tencent-ailab/IP-Adapter/issues/54

To be honest, the current method has certain limitations. First, what the model learns is the face structure, not face similarity (if you care about this, you can follow this: https://www.reddit.com/r/StableDiffusion/comments/17ds81k/ai_yearbook_photos_workflow_with_stable_diffusion/). Second, the generated pictures are affected by the background, and the hairstyle is difficult to change. At present, our development version (an unreleased version for SD15) has some improvements in face consistency, but it still has the above limitations.

eezywu commented 10 months ago

@xiaohu2015 Have you tried removing the background & hair of the face image? Could you share more about what you tried to improve the face consistency?

Looking forward to your response. Thanks!

xiaohu2015 commented 9 months ago

Yes, removing the background and hair from the face image is helpful.
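One way to remove the background and hair is to mask the face image with a per-pixel label map before using it as the image condition. The sketch below assumes such a label map is already available from some face-parsing model; the label ids in `KEEP_LABELS` and the function name `keep_face_only` are hypothetical, and the thread does not say which tool or fill colour was actually used.

```python
import numpy as np

# Hypothetical label ids from some face-parsing model; real models use
# their own label maps, so adjust this set accordingly.
KEEP_LABELS = (1, 2, 3)  # e.g. skin, eyes, mouth in this made-up scheme


def keep_face_only(image: np.ndarray, labels: np.ndarray, fill: int = 255) -> np.ndarray:
    """Replace every pixel whose label is not in KEEP_LABELS with `fill`.

    image:  (H, W, 3) uint8 array
    labels: (H, W) integer array of per-pixel class ids
    """
    if image.shape[:2] != labels.shape:
        raise ValueError("image and label map must share height and width")
    mask = np.isin(labels, KEEP_LABELS)  # True where a face region is labelled
    out = np.full_like(image, fill)      # start from a blank canvas
    out[mask] = image[mask]              # copy the kept face pixels only
    return out
```

Hair and background pixels simply fall outside `KEEP_LABELS`, so they are overwritten with the fill value while the face region is left unchanged.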

Using more features can improve face consistency, and DINOv2 is a good choice.

bent1e commented 7 months ago

How do I remove the hair? When my face has long bangs, the image generated by FaceID also has the same bangs, which is not what I want.