Junoh-Kang opened 3 months ago

@Junoh-Kang
Sorry for the late reply. I looked into `examples/textual_inversion` in Diffusers and saw that a placeholder token is used in the prompt, e.g. `prompt = "A <cat-toy> backpack"`. The tokenizer likely splits `<cat-toy>` into `<`, `cat`, `-`, `toy`, and `>`, and an attention map is stored for each of these sub-tokens. If you want a single attention map for the whole `<cat-toy>` token, you can modify the `resize_and_save` function in `utils.py` to sum the attention maps for `<`, `cat`, `-`, `toy`, and `>` before normalizing, and save the result as the attention map for `<cat-toy>`. I think this should solve the issue.
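As a rough sketch of the merging step described above (this is not the repo's actual `resize_and_save` code; the function name `merge_subtoken_attention`, the dict-of-arrays layout, and the min-max normalization are all assumptions for illustration):

```python
import numpy as np

def merge_subtoken_attention(attn_maps, placeholder="<cat-toy>",
                             sub_tokens=("<", "cat", "-", "toy", ">")):
    """Sum the attention maps of the sub-tokens that the tokenizer produced
    for a placeholder token, then renormalize the summed map.

    `attn_maps` is assumed to map token string -> 2D attention map (H, W).
    Returns a new dict where the sub-token maps are replaced by one map
    keyed by the placeholder string.
    """
    merged = None
    remaining = {}
    for tok, amap in attn_maps.items():
        if tok in sub_tokens:
            # Accumulate the sub-token maps elementwise.
            merged = amap.copy() if merged is None else merged + amap
        else:
            remaining[tok] = amap
    if merged is not None:
        # Min-max normalize the summed map to [0, 1] before saving it,
        # mirroring the "sum, then normalize" order suggested above.
        merged = (merged - merged.min()) / (merged.max() - merged.min() + 1e-8)
        remaining[placeholder] = merged
    return remaining
```

In `resize_and_save` you would apply this merge after collecting the per-token maps and before writing them out, so the saved file for `<cat-toy>` reflects the combined attention of all five sub-tokens.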
Thank you for your code. How can I visualize a prompt with textual inversion tokens?