mit-han-lab / fastcomposer

[IJCV] FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
https://fastcomposer.mit.edu
MIT License

Is fastcomposer good at generalizing concepts in images? #4

Closed: TruthSearcher closed this issue 1 year ago.

TruthSearcher commented 1 year ago

I checked the examples.

In those examples, the output seems to keep the same photorealistic style as the input image, as well as the same facial expression.

tianweiy commented 1 year ago

Thank you for the questions!

  1. Changing the facial expression is indeed hard. I think this is because of our training setup: we use subject crops from the original image itself as conditioning, so the model learns to reproduce the reference expression. Retrieval-based training, where the reference comes from a different image of the same subject, may improve this, and conditioning on multiple reference images at test time should also help. For certain fine-grained edits we may also need dedicated tools such as StyleGAN2. We will add this discussion to the paper later.

  2. I think stylization currently works quite well (see also Figure 6). That said, it is essentially a tradeoff between identity preservation and prompt consistency. To push for style consistency, we can lower alpha and get a more stylized image at some cost in identity preservation. This is also style-dependent. For some styles (e.g., pointillism painting) you can get strong stylization while preserving identity; for others (e.g., woodblock), paintings in that style carry little facial detail, so stronger identity preservation necessarily makes the result less stylized.

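The alpha tradeoff in point 2 can be illustrated with a small sketch. It assumes alpha is the fraction of denoising steps, counted from the end of sampling, that use subject-augmented conditioning, with text-only conditioning for the earlier steps (the delayed subject conditioning idea). The function name `delayed_conditioning_schedule` and the step labels are hypothetical and are not the FastComposer repository's API.

```python
from typing import List

TEXT_ONLY = "text_only"                  # hypothetical label: prompt embedding only
SUBJECT_AUGMENTED = "subject_augmented"  # hypothetical label: prompt + subject image features


def delayed_conditioning_schedule(num_steps: int, alpha: float) -> List[str]:
    """Return the conditioning used at each denoising step (index 0 = noisiest step).

    Assumption (not taken from the repo): the first (1 - alpha) fraction of steps
    use text-only conditioning, and the remaining alpha fraction switch to
    subject-augmented conditioning.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # First step index that uses the subject-augmented conditioning.
    switch_step = round((1.0 - alpha) * num_steps)
    return [TEXT_ONLY if i < switch_step else SUBJECT_AUGMENTED for i in range(num_steps)]


if __name__ == "__main__":
    # Lower alpha -> fewer subject-conditioned steps -> more room for the style prompt.
    for alpha in (0.2, 0.5, 0.8):
        schedule = delayed_conditioning_schedule(50, alpha)
        print(alpha, schedule.count(SUBJECT_AUGMENTED))
    # prints:
    # 0.2 10
    # 0.5 25
    # 0.8 40
```

Under this assumption, a lower alpha hands more of the denoising steps to the text prompt alone, which is consistent with the more stylized but less identity-preserving results described above.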