Junoh-Kang opened 3 months ago

@Junoh-Kang
Sorry for the late reply. I looked into `examples/textual_inversion` in Diffusers and saw that a placeholder token is used in the prompt, e.g. `prompt = "A <cat-toy> backpack"`. The tokenizer likely splits `<cat-toy>` into `<`, `cat`, `-`, `toy`, and `>`, and an attention map is stored for each of these sub-tokens. If you want a single attention map for the whole `<cat-toy>` token, you can modify the `resize_and_save` function in `utils.py` to sum the attention maps for `<`, `cat`, `-`, `toy`, and `>` before normalizing, and save the result as the attention map for `<cat-toy>`. I think this should solve the issue.
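As a rough sketch of the merging step described above (this is not the repo's actual `resize_and_save` code; the function name `merge_subtoken_attention`, the dict-of-arrays layout, and the min-max normalization are all assumptions for illustration):

```python
import numpy as np

def merge_subtoken_attention(attn_maps, placeholder="<cat-toy>",
                             sub_tokens=("<", "cat", "-", "toy", ">")):
    """Sum the attention maps of the sub-tokens that the tokenizer produced
    for a placeholder token, then renormalize the summed map.

    `attn_maps` is assumed to map token string -> 2D attention map (H, W).
    Returns a new dict where the sub-token maps are replaced by one map
    keyed by the placeholder string.
    """
    merged = None
    remaining = {}
    for tok, amap in attn_maps.items():
        if tok in sub_tokens:
            # Accumulate the sub-token maps elementwise.
            merged = amap.copy() if merged is None else merged + amap
        else:
            remaining[tok] = amap
    if merged is not None:
        # Min-max normalize the summed map to [0, 1] before saving it,
        # mirroring the "sum, then normalize" order suggested above.
        merged = (merged - merged.min()) / (merged.max() - merged.min() + 1e-8)
        remaining[placeholder] = merged
    return remaining
```

In `resize_and_save` you would apply this merge after collecting the per-token maps and before writing them out, so the saved file for `<cat-toy>` reflects the combined attention of all five sub-tokens.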
Thank you for your code. How can I visualize a prompt with textual inversion tokens?