Question about the visualization of CLIP's text tokens

hila-chefer / Transformer-MM-Explainability

[ICCV 2021 Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Includes examples for DETR and VQA.

MIT License


Question about the visualization of CLIP's text tokens #18

Kihensarn closed this issue 2 years ago

Kihensarn commented 2 years ago

Excellent work! I noticed you only provided an example for visualizing the image tokens in CLIP's image encoder. Is it possible to visualize the text tokens in CLIP as well? If so, how can I do this?

hila-chefer commented 2 years ago

Hi @Kihensarn, thanks for your interest!

Yes! I've added text-token visualization to our notebook; feel free to check it out and experiment with it!

Best, Hila.
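A minimal NumPy sketch of the idea behind text-token relevance, not the notebook's code. It applies the paper's self-attention update rule, R ← R + Ā·R with R initialized to the identity and Ā the head-averaged positive part of (gradient ⊙ attention), to a text encoder. Several details here are assumptions: the attention maps and gradients are random placeholders under a causal mask (CLIP's text transformer is causal), the relevance row is read from the end-of-text (EOT) token because CLIP pools its text features there, and skipping the start-of-text token at index 0 is an illustrative choice. The helper names `aggregate_relevance` and `text_token_relevance` are hypothetical.

```python
import numpy as np


def aggregate_relevance(attn_maps, attn_grads):
    """Accumulate token-to-token relevance over self-attention layers.

    attn_maps, attn_grads: lists (one entry per layer) of arrays shaped
    (num_heads, num_tokens, num_tokens).
    """
    num_tokens = attn_maps[0].shape[-1]
    # Each token starts out relevant only to itself.
    R = np.eye(num_tokens)
    for A, grad in zip(attn_maps, attn_grads):
        # Gradient-weighted attention: keep the positive part, average over heads.
        A_bar = np.clip(grad * A, 0, None).mean(axis=0)
        # Propagate relevance through this layer: R <- R + A_bar @ R.
        R = R + A_bar @ R
    return R


def text_token_relevance(R, eot_index):
    """Relevance of each text token for the EOT token (hypothetical helper).

    Skips the start-of-text token (index 0) and the EOT token itself,
    then min-max normalizes the scores to [0, 1] for display.
    """
    scores = R[eot_index, 1:eot_index].copy()
    span = scores.max() - scores.min()
    return (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)


# Synthetic example: random causal attention maps and random "gradients".
rng = np.random.default_rng(0)
num_layers, num_heads, num_tokens = 4, 8, 6
mask = np.tril(np.ones((num_tokens, num_tokens)))
attn_maps, attn_grads = [], []
for _ in range(num_layers):
    logits = rng.normal(size=(num_heads, num_tokens, num_tokens))
    logits = np.where(mask == 1, logits, -np.inf)  # causal mask
    A = np.exp(logits - logits.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)  # row-wise softmax
    attn_maps.append(A)
    attn_grads.append(rng.normal(size=A.shape))

R = aggregate_relevance(attn_maps, attn_grads)
print(text_token_relevance(R, eot_index=num_tokens - 1))
```

With a real model, the attention maps and their gradients would come from hooks on each text-encoder block, with the gradient taken of the image-text similarity score; the per-token scores can then be overlaid on the input words.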

Kihensarn commented 2 years ago

Thanks, I will try it later!