kongzhecn / OMG

OMG: Occlusion-friendly Personalized Multi-concept Generation In Diffusion Models
https://kongzhecn.github.io/omg-project/

Combining OMG with InstantID for multi-concept generation #3

Open yunbinmo opened 3 months ago

yunbinmo commented 3 months ago

Hi, thanks for the amazing work! I would like to ask: what is the exact approach for using OMG with InstantID for multi-concept generation? I understand that the inference code is available, but I don't quite understand what it is doing.

As far as I know, the InstantID architecture only takes in one reference image. It would be good if I could get a high-level view of how you combine OMG with InstantID for multi-concept generation when there is more than one reference image. Thank you!

yzhang2016 commented 3 months ago

InstantID can take multiple images as reference. The embeddings of all reference images are averaged and used as the input to the identity net (ControlNet).
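A minimal sketch of what "averaging the embeddings" could look like in practice. This is a hypothetical helper, not OMG's actual code; it assumes per-image face embeddings (e.g. the 512-d vectors InsightFace produces) and re-normalizes the mean, since such embeddings are typically compared by cosine similarity:

```python
import numpy as np

def average_id_embeddings(embeddings):
    """Average L2-normalized face embeddings from multiple reference images.

    embeddings: list of 1-D arrays (e.g. 512-d face embeddings).
    Each vector is normalized before averaging, and the mean is
    re-normalized so the result stays on the unit sphere.
    """
    stacked = np.stack([e / np.linalg.norm(e) for e in embeddings])
    mean = stacked.mean(axis=0)
    return mean / np.linalg.norm(mean)
```

The averaged vector then plays the role of the single identity embedding that the identity net expects.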

yzhang2016 commented 3 months ago

The key idea is the two-stage generation and the noise blending.

Why two-stage?

How to combine with InstantID or ID LoRAs?

yunbinmo commented 3 months ago

I see! Thanks for the reply!

But I have one more question: if the embeddings of multiple images are averaged as the input to the IdentityNet, wouldn't we expect some mixture of facial features from different IDs?

In other words, would the result of averaging 3 ID image embeddings generally look worse than averaging 2?

tanghengjian commented 1 month ago

My guesses: 1. Whether you average 2 or 3 IDs may be outside OMG's scope. 2. Averaging face embeddings extracted by InsightFace from 3 IDs may already be implied by InstantID's design?

tanghengjian commented 1 month ago

Hi @yzhang2016, we tested this: the person's face region generated in the first stage constrains the face generation for the user's identity in the second stage. In other words, we found lower face similarity in some cases. Could we make stage 1 generate a face that is adaptively similar to the user's face, e.g. by utilizing an adapter FaceID model?