hila-chefer / Transformer-MM-Explainability

[ICCV 2021 - Oral] Official PyTorch implementation of "Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers", a novel method to visualize any Transformer-based network, including examples for DETR and VQA.

Details about the changes in the code of base models #17

Open NikhilM98 opened 2 years ago

NikhilM98 commented 2 years ago

I am trying to study the code in this repository. However, it is difficult to figure out which changes were made in the subfolders of the base models (VisualBERT, LXMERT, DETR, etc.) for this project.

Since the original repositories of the base models may have changed after the code was copied into this repository (i.e., their histories may not align), it is difficult to compare them with a Git diff.

It would be helpful to attach the Git commit tag/ID of each base model repository corresponding to the latest commit at the time it was cloned. Using that commit tag, it would be easy to align the original code with the code in this repository and compare the changes made to the model.

Additionally, documenting those changes would be helpful for future research, although probably time-consuming.

hila-chefer commented 2 years ago

Hi @NikhilM98, thank you for this suggestion! I'll add it to my todo list and do my best to document the added code. The method itself is quite straightforward, and I'd recommend starting from this file, which contains the implementation of all the rules described in the paper in one place. Basically, all you need to do to add our method to your model is (1) add hooks to the attention layers that save the attention maps and their gradients (see a simple example here), and (2) apply the rules to the saved attention maps, as done here. To figure out which attention rules you need, see Section 3.2 (Adaptation to attention types) of our paper. A relatively simple end-to-end example is the implementation for CLIP.
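For concreteness, here is a minimal, self-contained sketch of those two steps on a toy single-head self-attention module. `ToyAttention`, `save_attn_grad`, and `relevancy` are illustrative names rather than identifiers from this repository; the accumulation follows the self-attention rule from the paper, R ← R + E_h[(∇A ⊙ A)^+] · R, and a real model would use its own multi-head attention layers instead.

```python
import torch
import torch.nn as nn

class ToyAttention(nn.Module):
    """Minimal single-head self-attention standing in for a real Transformer block."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.attn_map = None   # saved attention probabilities
        self.attn_grad = None  # saved gradient of the attention probabilities

    def save_attn_grad(self, grad):
        self.attn_grad = grad

    def forward(self, x):
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * (C ** -0.5)
        attn = attn.softmax(dim=-1)
        # step (1): keep the attention map and hook its gradient
        self.attn_map = attn
        attn.register_hook(self.save_attn_grad)
        return self.proj(attn @ v)

def relevancy(blocks, num_tokens):
    """Step (2): accumulate R <- R + mean[(grad * attn)^+] @ R over the blocks."""
    R = torch.eye(num_tokens)
    for blk in blocks:
        cam = blk.attn_map.detach()    # (B, N, N); (B*H, N, N) with multiple heads
        grad = blk.attn_grad.detach()
        cam = (grad * cam).clamp(min=0).mean(dim=0)  # average over heads/batch
        R = R + cam @ R
    return R

# toy usage: run a forward pass, backprop a scalar score, then accumulate relevancies
torch.manual_seed(0)
blocks = nn.ModuleList([ToyAttention(16) for _ in range(2)])
x = torch.randn(1, 8, 16)
for blk in blocks:
    x = blk(x)
score = x[:, 0].sum()   # stand-in for the logit of the predicted class
score.backward()
R = relevancy(blocks, num_tokens=8)
print(R.shape)  # (8, 8) token-to-token relevancy map
```

In a real bi-modal or encoder-decoder model you would keep one such relevancy matrix per attention type and update it with the corresponding rule from the paper, rather than the single self-attention matrix shown here.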

I hope this helps. Best, Hila.