[ICCV 2021 Oral] Official PyTorch implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Includes examples for DETR and VQA.
MIT License
801 stars, 107 forks
Question about visualizing CLIP's text tokens #18
Excellent work. I noticed that you only provided an example for visualizing the image tokens in CLIP's image encoder. Is it also possible to visualize the text tokens in CLIP? If so, how can I do this?