wooyeolBaek / attention-map

🚀 Cross attention map tools for huggingface/diffusers
https://huggingface.co/spaces/We-Want-GPU/diffusers-cross-attention-map-SDXL-t2i
MIT License

Textual inversion token #9

Open · Junoh-Kang opened this issue 3 months ago

Junoh-Kang commented 3 months ago

Thank you for your code. How can I visualize a prompt that contains textual inversion tokens?

wooyeolBaek commented 2 months ago

@Junoh-Kang Sorry for the late reply. I looked into `examples/textual_inversion` in Diffusers and understood that a placeholder token like `<cat-toy>` is used in the prompt, e.g., `prompt = "A <cat-toy> backpack"`. The tokenizer likely splits `<cat-toy>` into `<`, `cat`, `-`, `toy`, and `>`, with an attention map stored for each part. If you want a single attention map for the whole `<cat-toy>` token, you can modify the `resize_and_save` function in `utils.py` to sum the attention maps for `<`, `cat`, `-`, `toy`, and `>` before normalizing them, and save the result as the attention map for `<cat-toy>`. I think this should solve the issue.
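
For reference, a minimal sketch of that merging step. It assumes the per-token attention maps are held in a dict keyed by token string, with all maps sharing the same spatial shape; the names `merge_subtoken_attention` and `attn_maps` are hypothetical, and the actual storage layout inside `resize_and_save` may differ.

```python
import torch

def merge_subtoken_attention(attn_maps, subtokens, merged_token):
    """Sum a placeholder's sub-token attention maps into a single map.

    attn_maps: dict mapping token strings to 2D attention tensors,
    all with the same spatial shape.
    """
    # Element-wise sum over the sub-token maps (e.g. '<', 'cat', '-', 'toy', '>'),
    # removing the individual entries as we go.
    merged = torch.stack([attn_maps.pop(tok) for tok in subtokens]).sum(dim=0)
    # Min-max normalize so the merged map can be saved like any other token map.
    merged = (merged - merged.min()) / (merged.max() - merged.min() + 1e-8)
    attn_maps[merged_token] = merged
    return attn_maps

# Example usage, before the per-token save loop:
# attn_maps = merge_subtoken_attention(
#     attn_maps, ["<", "cat", "-", "toy", ">"], "<cat-toy>"
# )
```

Summing before normalization matters: normalizing each sub-token map first and then adding them would weight the parts unevenly relative to their raw attention mass.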