guijuzhejiang opened 6 months ago:

Great job! I used the faceid model and the character is reproduced very well. But I also found a problem: probably because of the face embedding, the expression of the reference person is preserved very faithfully, and even if I write different expression prompts I can't change it. Is there any way to change the expression of the reference person?
The faceid model should be able to change expressions; the faceid plus model maybe not, but you can try faceid plus v2 to achieve that (use a lower weight).
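(For reference, a minimal sketch of what "use a lower weight" could look like with faceid plus v2, following the usage shown in the repo's README; the checkpoint path, base model, and prompt here are placeholders, and argument names may differ between versions:)

```python
# Hedged sketch: FaceID Plus v2 with a lower face-structure weight, so that
# expression prompts have room to act. Class and argument names follow the
# repo README; checkpoint/model paths are placeholders.
import cv2
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler
from insightface.app import FaceAnalysis
from insightface.utils import face_align
from ip_adapter.ip_adapter_faceid import IPAdapterFaceIDPlus

# Extract the reference identity with insightface
app = FaceAnalysis(name="buffalo_l")
app.prepare(ctx_id=0, det_size=(640, 640))
image = cv2.imread("reference.jpg")
faces = app.get(image)
faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
face_image = face_align.norm_crop(image, landmark=faces[0].kps, image_size=224)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

ip_model = IPAdapterFaceIDPlus(
    pipe,
    "laion/CLIP-ViT-H-14-laion2B-s32B-b79K",  # CLIP image encoder
    "ip-adapter-faceid-plusv2_sd15.bin",      # v2 checkpoint (placeholder path)
    "cuda",
)

images = ip_model.generate(
    prompt="portrait photo of a woman laughing, big smile",
    face_image=face_image,
    faceid_embeds=faceid_embeds,
    shortcut=True,  # enables the v2 residual path
    s_scale=0.5,    # the "lower weight": weaker structure lock, freer expression
    num_samples=1,
    width=512, height=768,
    num_inference_steps=30,
)
```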
I visualized the face token maps for the faceid and faceid_plus model. From the attention maps, it appears that faceid_plus controls more accurately. Can I merge the visualization code into the main branch? It would simplify visualizing face token maps for all models you release.
Thanks a lot!
How does one visualise the face embeddings? Best, Raf
You can browse the recently updated code in the 'visual_attnmap.ipynb' notebook.
This is great! Would the get_net_attn_map function work for the ip_adapter_plus? (no faceid)
This is great! Would the get_net_attn_map function work for the ip_adapter_plus? (no faceid)
I just submitted a pull request. You just need to add the following three lines of code, used the same way as with faceid:

```python
pipe.unet = register_cross_attention_hook(pipe.unet)
attn_maps = get_net_attn_map((768, 512))
attn_hot = attnmaps2images(attn_maps)
```
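(As a rough illustration, here is how those three lines might be wired into a full generation run. The helper names come from the pull request; the `generate()` call mirrors the repo's FaceID examples, and the assumption that `attnmaps2images` returns one PIL image per face token is mine:)

```python
# Hedged sketch: wiring the PR's visualization helpers into a FaceID run.
# Hook the UNet before generating so the cross-attention maps are recorded.
pipe.unet = register_cross_attention_hook(pipe.unet)

images = ip_model.generate(
    prompt="portrait photo of a woman in a garden",
    faceid_embeds=faceid_embeds,  # insightface embedding, as elsewhere in the repo
    num_samples=1,
    width=512, height=768,
    num_inference_steps=30,
)

# (768, 512) is assumed to be the (height, width) of the generated image;
# the recorded maps are aggregated and upsampled to this size.
attn_maps = get_net_attn_map((768, 512))
attn_hot = attnmaps2images(attn_maps)  # assumed: one heatmap image per face token

for i, im in enumerate(attn_hot):
    im.save(f"face_token_{i:02d}.png")
```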
That's great @YZBPXX, thanks. A final question: in get_net_attn_map, what does batch_size mean here? Is it related to the positive/negative conditions or to the number of images?
Yes, currently I have only considered the case of num_samples=1, batch_size=2 (generating one image requires two passes, one for the negative condition and one for the positive). Generating multiple images at once may throw an error, and some modifications would be needed; I will update the code later to support num_samples > 1. If you have any more questions, I'd be happy to answer them.
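(In other words, the 2 in the batch dimension is the classifier-free guidance pair, not two images. A small sketch of separating the two passes, using a dummy tensor with the hook's shape; the convention that index 0 is the unconditional pass is the usual diffusers ordering and is assumed here:)

```python
# Hedged sketch: batch_size=2 is the CFG pair. The stored map has shape
# [2, heads, query_len, num_face_tokens]; in diffusers pipelines index 0
# is typically the negative/unconditional pass, index 1 the positive one.
import torch

attn_map = torch.rand(2, 8, 4096, 16)            # dummy tensor with the hook's shape
uncond_map, cond_map = attn_map.chunk(2, dim=0)  # split the CFG batch
print(cond_map.shape)                            # torch.Size([1, 8, 4096, 16])
```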
Thanks a lot @YZBPXX! I was looking at the shapes of my attn_maps and found:

```
self.attn_map.shape torch.Size([2, 8, 4096, 16])
self.attn_map.shape torch.Size([2, 8, 4096, 16])
self.attn_map.shape torch.Size([2, 8, 1024, 16])
self.attn_map.shape torch.Size([2, 8, 1024, 16])
self.attn_map.shape torch.Size([2, 8, 256, 16])
self.attn_map.shape torch.Size([2, 8, 256, 16])
self.attn_map.shape torch.Size([2, 8, 64, 16])
self.attn_map.shape torch.Size([2, 8, 256, 16])
self.attn_map.shape torch.Size([2, 8, 256, 16])
self.attn_map.shape torch.Size([2, 8, 256, 16])
self.attn_map.shape torch.Size([2, 8, 1024, 16])
self.attn_map.shape torch.Size([2, 8, 1024, 16])
self.attn_map.shape torch.Size([2, 8, 1024, 16])
self.attn_map.shape torch.Size([2, 8, 4096, 16])
self.attn_map.shape torch.Size([2, 8, 4096, 16])
self.attn_map.shape torch.Size([2, 8, 4096, 16])
```

Just to be sure: is the last dimension (16) the feature dimension the image is projected into, so for the attention maps I could average over it?
The last dimension (16) is the number of face tokens, not a feature dimension. If you average over it, it should yield the combined result of the 16 tokens; I have tried this, but the resulting image is just noise without any meaningful information.
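(Since averaging collapses to noise, the per-token maps are still viewable individually. A hedged sketch of reshaping one stored map into 16 separate heatmaps; the 64x64 latent grid and the (768, 512) output size are assumptions that match the shapes quoted above:)

```python
# Hedged sketch: turn one stored map [2, 8, 4096, 16] into 16 per-token
# heatmaps instead of averaging the token axis. 4096 query positions
# correspond to a 64x64 latent grid for a 512x512-class UNet level.
import torch
import torch.nn.functional as F

attn_map = torch.rand(2, 8, 4096, 16)    # [cfg_batch, heads, H*W, face_tokens]
cond = attn_map.chunk(2, dim=0)[1]       # keep the conditional pass
per_token = cond.mean(dim=1).squeeze(0)  # average heads -> [4096, 16]

side = int(per_token.shape[0] ** 0.5)    # 64
maps = per_token.T.reshape(16, 1, side, side)

# Upsample each token's 64x64 map to the output resolution for overlaying.
maps = F.interpolate(maps, size=(768, 512), mode="bilinear", align_corners=False)
print(maps.shape)                        # torch.Size([16, 1, 768, 512])
```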