Open xduzhangjiayu opened 1 week ago
@xduzhangjiayu It doesn't support DiT-based models right now, and I'm planning to add that in the next update. However, due to time constraints, I don't think I'll be able to get to it for a while.
If you need to use it urgently, I recommend changing `AttnProcessor` to `AttnProcessor2_0`, and then redefining the call method of the module in DiT that uses `AttnProcessor` to override it.
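The suggestion above can be sketched without any diffusers dependency. The class below is a minimal, framework-agnostic stand-in (in NumPy, not the real `AttnProcessor2_0` API) for a custom attention processor whose call method computes attention explicitly so the probability map can be recorded for visualization; the names `AttnMapRecorder` and `attn_maps` are illustrative, not part of any library.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class AttnMapRecorder:
    """Toy stand-in for an overridden attention processor: computes the
    attention probabilities explicitly and stores them for later visualization."""

    def __init__(self):
        self.attn_maps = []

    def __call__(self, query, key, value):
        d = query.shape[-1]
        scores = query @ key.swapaxes(-1, -2) / np.sqrt(d)  # (..., Lq, Lk)
        probs = softmax(scores, axis=-1)                    # attention map
        self.attn_maps.append(probs)                        # record it
        return probs @ value                                # usual attention output

# Tiny usage example with random tensors: (heads, seq_len, head_dim).
rng = np.random.default_rng(0)
q = rng.standard_normal((2, 8, 16))
k = rng.standard_normal((2, 8, 16))
v = rng.standard_normal((2, 8, 16))
rec = AttnMapRecorder()
out = rec(q, k, v)
print(out.shape, rec.attn_maps[0].shape)  # (2, 8, 16) (2, 8, 8)
```

In a real diffusers setup the same idea applies, except the recorder would mirror the processor's actual signature and be assigned to the attention modules of the model.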
@wooyeolBaek Thanks for the reply, I'll try it. Also looking forward to your update!
Hi, do you have any advice on this? For a U-Net-based model, we can use Q (image) × K (text) to get the attention score, but in a DiT-based model the image and text tokens each have their own Q, K, and V, so I'm quite confused about this. Any suggestions would be appreciated, thanks!
@xduzhangjiayu As far as I know, since the image and text hidden states are concatenated for an attention operation, the resulting matrix can be viewed as performing self-attention and cross-attention simultaneously. The upper-left and bottom-right parts of the matrix represent self-attention, while the upper-right and bottom-left parts represent cross-attention. To obtain the attention map as used in Stable Diffusion 1, you can extract the upper-right attention map, where the image is used as the query and the text as the key.
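The block structure described above can be demonstrated with a small NumPy sketch (illustrative only; it assumes a single shared projection for the joint sequence, which simplifies real DiT joint attention): concatenate image and text tokens, compute the full joint attention matrix, and slice out the upper-right block where image tokens are queries and text tokens are keys.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

n_img, n_txt, d = 6, 4, 16  # toy sizes: image tokens, text tokens, head dim
rng = np.random.default_rng(0)
img = rng.standard_normal((n_img, d))
txt = rng.standard_normal((n_txt, d))

# Image and text hidden states are concatenated into one joint sequence.
h = np.concatenate([img, txt], axis=0)
q = k = h  # identity projections, for illustration only

scores = q @ k.T / np.sqrt(d)
probs = softmax(scores, axis=-1)  # (n_img + n_txt, n_img + n_txt)

# Upper-right block: image queries attending to text keys,
# i.e. the SD1-style cross-attention map.
img2txt = probs[:n_img, n_img:]
print(img2txt.shape)  # (6, 4)
```

Note that each row of `img2txt` is normalized over the *full* joint sequence, not over the text tokens alone, so for visualization one may want to renormalize each row over the text axis.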
@wooyeolBaek Many thanks for the advice, I tried your method and it seems to work!
@xduzhangjiayu I'm glad it worked well! I've also added features to make it compatible with SD3 and support batch operations, and refactored it for more intuitive use, so feel free to refer to it if needed.
@wooyeolBaek Thanks for the notice and this awesome project!
Hi, thanks for the excellent work! Will this project support attention map visualization for DiT-based models (SD3, FLUX, ...)?