Closed by Wangxuefeng92 1 month ago
You can refer to the code of FastComposer (Tuning-Free Multi-Subject Image Generation with Localized Attention) to see how to extract the attention maps. For the virtual try-on task, the multiple attention maps need to be averaged into one.
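To illustrate the averaging step only, here is a minimal sketch. It is not the BooW-VTON or FastComposer code: it assumes the attention maps have already been extracted, that each one is an array of shape (heads, h*w) with a square spatial layout, and that a simple mean over heads and layers is acceptable. The function names `upsample_nearest` and `average_attention_maps` and the random input data are hypothetical.

```python
import numpy as np


def upsample_nearest(attn, size):
    """Nearest-neighbour upsample a square (h, h) map to (size, size)."""
    h = attn.shape[0]
    assert size % h == 0, "target size must be a multiple of the map side"
    factor = size // h
    return np.repeat(np.repeat(attn, factor, axis=0), factor, axis=1)


def average_attention_maps(maps, size=64):
    """Average several attention maps into one heatmap in [0, 1].

    maps: list of arrays of shape (heads, h*w), one per layer (assumed layout).
    """
    resized = []
    for m in maps:
        m = np.asarray(m, dtype=np.float64)
        per_layer = m.mean(axis=0)                  # average over heads
        side = int(round(np.sqrt(per_layer.size)))  # assume a square map
        resized.append(upsample_nearest(per_layer.reshape(side, side), size))
    avg = np.mean(resized, axis=0)                  # average over layers
    lo, hi = avg.min(), avg.max()
    return (avg - lo) / (hi - lo + 1e-8)            # min-max normalise


# Random stand-in data for maps at three resolutions (not real attention).
rng = np.random.default_rng(0)
fake_maps = [rng.random((8, 16 * 16)),
             rng.random((8, 32 * 32)),
             rng.random((8, 64 * 64))]
heatmap = average_attention_maps(fake_maps, size=64)
print(heatmap.shape)
```

The resulting array can then be shown as a heatmap, for example with matplotlib's `imshow`, optionally overlaid on the input image with some transparency.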
Hi, could you share some details on how to display the attention maps, as in Figure 4 of your paper? Thank you for your great work on BooW-VTON!