`Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers <https://arxiv.org/abs/2103.15679>`_

|youtube|

.. |youtube| image:: https://img.shields.io/static/v1?label=ICCV2021&message=12MinuteVideo&color=red
   :target: https://www.youtube.com/watch?v=bQTL34Dln-M
|DETR_LXMERT|

.. |DETR_LXMERT| image:: https://colab.research.google.com/assets/colab-badge.svg
   :target: https://colab.research.google.com/github/hila-chefer/Transformer-MM-Explainability/blob/main/Transformer_MM_Explainability.ipynb

|CLIP|

.. |CLIP| image:: https://colab.research.google.com/assets/colab-badge.svg
   :target: https://colab.research.google.com/github/hila-chefer/Transformer-MM-Explainability/blob/main/CLIP_explainability.ipynb

Demo: You can check out a demo on `Huggingface spaces <https://huggingface.co/spaces/PaulHilders/CLIPGroundingExplainability>`_ or scan the following QR code.

.. image:: https://user-images.githubusercontent.com/19412343/176676771-d26f2146-9901-49e7-99be-b030f3d790de.png
   :width: 100

|ViT|

.. |ViT| image:: https://colab.research.google.com/assets/colab-badge.svg
   :target: https://colab.research.google.com/github/hila-chefer/Transformer-MM-Explainability/blob/main/Transformer_MM_explainability_ViT.ipynb
.. sectnum::
Notice that we have two Jupyter notebooks to run the examples presented in the paper.

The notebook for `LXMERT <./LXMERT.ipynb>`_ contains both the examples from the paper and examples with images from the internet and free-form questions.
To use your own input, simply change the ``URL`` variable to your image and the ``question`` variable to your free-form question.
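For example, the input cell of the notebook might be edited as follows (the image URL and question below are placeholders, not values from the paper):

.. code-block:: python

   # Hypothetical inputs for the LXMERT notebook: point ``URL`` at any
   # web-accessible image and phrase ``question`` freely.
   URL = "https://example.com/my_image.jpg"
   question = "How many dogs are in the picture?"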
.. image:: LXMERT.PNG
.. image:: LXMERT-web.PNG
The notebook for `DETR <./DETR.ipynb>`_ contains the examples from the paper.
To use your own input, simply change the ``URL`` variable to your image.
.. image:: DETR.PNG
VisualBERT
^^^^^^^^^^
Run the ``run.py`` script as follows:

.. code-block:: bash

   CUDA_VISIBLE_DEVICES=0 PYTHONPATH=`pwd` python VisualBERT/run.py --method=<method_name>
.. note::

   If the datasets aren't already in ``env.data_dir``, then the script will download the data automatically to the path in ``env.data_dir``.
LXMERT
^^^^^^
Download `valid.json <https://nlp.cs.unc.edu/data/lxmert_data/vqa/valid.json>`_:

.. code-block:: bash

   pushd data/vqa
   wget https://nlp.cs.unc.edu/data/lxmert_data/vqa/valid.json
   popd
Download the ``COCO_val2014`` set to your local machine.

.. note::

   If you already downloaded ``COCO_val2014`` for the `VisualBERT`_ tests, you can simply use the same path you used for `VisualBERT`_.
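If you still need the images, a minimal sketch for fetching and extracting them is shown below; the download URL is the standard COCO link and the target directory is a placeholder, neither is taken from this repository:

.. code-block:: python

   # Minimal sketch: download and extract the COCO val2014 images.
   # Assumptions: the standard COCO download URL and a placeholder target directory
   # (point --COCO_path at the extracted folder afterwards).
   import urllib.request
   import zipfile

   url = "http://images.cocodataset.org/zips/val2014.zip"
   archive = "val2014.zip"
   target_dir = "/path/to/COCO_val2014"  # placeholder destination

   urllib.request.urlretrieve(url, archive)  # the archive is several GB
   with zipfile.ZipFile(archive) as zf:
       zf.extractall(target_dir)             # creates <target_dir>/val2014/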
Run the ``perturbation.py`` script as follows:

.. code-block:: bash

   CUDA_VISIBLE_DEVICES=0 PYTHONPATH=`pwd` python lxmert/lxmert/perturbation.py --COCO_path /path/to/COCO_val2014 --method <method_name> --is-text-pert <true/false> --is-positive-pert <true/false>
DETR
^^^^
Download the COCO dataset as described in the `DETR repository <https://github.com/facebookresearch/detr#data-preparation>`_. Notice you only need the validation set.
Locate the ``cocoeval.py`` script in your python library path:

Find the library path:

.. code-block:: python

   import sys
   print(sys.path)

Find ``cocoeval.py``:

.. code-block:: bash

   cd /path/to/lib
   find -name cocoeval.py
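Alternatively, if the COCO API is installed as the standard ``pycocotools`` package (an assumption; adjust if your setup differs), you can print the module path directly:

.. code-block:: python

   # Prints the full path of cocoeval.py, assuming a standard pycocotools install.
   import pycocotools.cocoeval as cocoeval
   print(cocoeval.__file__)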
Change the ``self.iouThrs`` value in the ``setDetParams`` function (which sets the parameters for the COCO detection evaluation) in the ``Params`` class as follows:

Instead of:

.. code-block:: python

   self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)

use:

.. code-block:: python

   self.iouThrs = np.linspace(.2, 0.95, int(np.round((0.95 - .2) / .05)) + 1, endpoint=True)
Run the following command:

.. code-block:: bash

   CUDA_VISIBLE_DEVICES=0 PYTHONPATH=`pwd` python DETR/main.py --coco_path /path/to/coco/dataset --eval --masks --resume https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth --batch_size 1 --method <method_name>
If you make use of our work, please cite our paper:
.. code-block:: latex

   @InProceedings{Chefer_2021_ICCV,
       author    = {Chefer, Hila and Gur, Shir and Wolf, Lior},
       title     = {Generic Attention-Model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers},
       booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
       month     = {October},
       year      = {2021},
       pages     = {397-406}
   }
Credits:

* The VisualBERT implementation is based on the `MMF <https://github.com/facebookresearch/mmf>`_ framework.
* The LXMERT implementation is based on the `official LXMERT <https://github.com/airsplay/lxmert>`_ implementation and on `Hugging Face Transformers <https://github.com/huggingface/transformers>`_.
* The DETR implementation is based on the `official DETR <https://github.com/facebookresearch/detr>`_ implementation.
* The CLIP implementation is based on the `official CLIP <https://github.com/openai/CLIP>`_ implementation.
* The Hugging Face Spaces demo builds on this `final project <https://github.com/bpiyush/CLIP-grounding>`_.