junjiehe96 / UniPortrait

UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization
Apache License 2.0
188 stars, 6 forks

About the ID Embedding #6

Closed 963658029 closed 2 weeks ago

963658029 commented 1 month ago

Hi, thanks for the great work. I have the following questions about ID Embedding:

  1. Does CLIP use last_hidden_state, or the second-to-last hidden state?
  2. How is the output feature of CLIP concatenated with the multi-scale features of the face encoder? The former has shape (b, 257, 1280), while the latter have quite different shapes.
  3. How many layers are there in the MLPs after the CLIP and face encoders?

junjiehe96 commented 1 month ago

  1. The second-to-last hidden state.
  2. Only the local features of CLIP (b, 256, 1280) are used, without the global feature; the multi-scale face features are interpolated to the same spatial size (16x16).
  3. The details of the MLPs can be found in proj_id and proj_clip in the UniPortrait Q-Former.

963658029 commented 1 month ago

For question 2, is it something like "F.interpolate(x, size=(16, 16), mode='bilinear', align_corners=False)"? Which mode should I choose?

junjiehe96 commented 1 month ago

Yes, bilinear interpolation
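
For readers following along, the two steps discussed so far can be sketched roughly as below. This is a minimal illustration, not the repository's code: the helper names (clip_local_tokens, face_maps_to_tokens), the dummy tensor shapes, and the face-encoder channel counts are all assumptions, and the class token is assumed to sit at index 0 of the CLIP sequence. The sketch stops before the projection and fusion steps, which live in proj_id and proj_clip.

```python
import torch
import torch.nn.functional as F


def clip_local_tokens(hidden_states):
    """Pick the second-to-last hidden state and drop the global (class) token."""
    penultimate = hidden_states[-2]   # (b, 257, 1280)
    return penultimate[:, 1:, :]      # (b, 256, 1280), assumes class token at index 0


def face_maps_to_tokens(feature_maps, size=(16, 16)):
    """Resize each (b, c, h, w) map to `size` and flatten it to (b, h*w, c) tokens."""
    tokens = []
    for fmap in feature_maps:
        resized = F.interpolate(fmap, size=size, mode="bilinear", align_corners=False)
        tokens.append(resized.flatten(2).transpose(1, 2))  # (b, 256, c)
    return tokens


# Dummy tensors standing in for real encoder outputs (shapes are hypothetical).
hidden_states = [torch.randn(2, 257, 1280) for _ in range(3)]
face_maps = [
    torch.randn(2, 64, 56, 56),
    torch.randn(2, 128, 28, 28),
    torch.randn(2, 256, 14, 14),
]

clip_tokens = clip_local_tokens(hidden_states)
face_tokens = face_maps_to_tokens(face_maps)
print(clip_tokens.shape, [t.shape for t in face_tokens])
```

After resizing, every face feature map contributes 256 spatial positions, matching the 256 local CLIP tokens, so the two sources line up position by position regardless of their channel counts.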

963658029 commented 1 month ago

ok, thanks

963658029 commented 1 month ago

Did you apply the following to all "mid features from face encoder"? E.g., embedding / np.linalg.norm(embedding)

junjiehe96 commented 1 month ago

No, we used the intermediate layer features directly, without any other processing. Normalizing them before further processing might be a good idea.
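
If someone wants to try the normalization mentioned above, a minimal sketch could look like the following. It is not part of UniPortrait (the authors state they did not normalize), and the helper name l2_normalize, the eps guard, and the choice to normalize a feature map over its channel axis are assumptions made for illustration.

```python
import numpy as np


def l2_normalize(features, axis=-1, eps=1e-12):
    """Scale `features` to unit L2 norm along `axis`; eps guards against division by zero."""
    norm = np.linalg.norm(features, axis=axis, keepdims=True)
    return features / np.maximum(norm, eps)


# A single embedding vector, as in the question above.
embedding = np.array([3.0, 4.0])
unit = l2_normalize(embedding)

# A hypothetical intermediate feature map (c, h, w), normalized over channels
# so that each spatial location ends up with a unit-length feature vector.
fmap = np.random.rand(8, 16, 16) + 0.1
fmap_unit = l2_normalize(fmap, axis=0)
```

For a 1-D vector this matches embedding / np.linalg.norm(embedding); for feature maps the axis has to be chosen explicitly, since np.linalg.norm without an axis would compute one norm over the whole array.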