This repository contains the files required to combine gaze information with the original MDETR.
MDETR: Modulated Detection for End-to-End Multi-Modal Understanding
First time using the repo:
Make a new conda env and activate it:
conda create -n mdetr_env python=3.8
conda activate mdetr_env
Install the packages listed in requirements.txt:
pip install -r requirements.txt
Repeated usage:
Activate the environment in VS Code (Ctrl+Shift+P) and select mdetr_env
(or whichever name you chose).
The data collected to test MDETR and GazeMDETR, together with the outputs of the tests, are accessible through the following links:
As a first step, I make the heatmap produced by the VTD available in the GazeMDETR demo code and run initial tests on how it can be combined with the features output by the MDETR backbone. To that end, the heatmap is converted to a tensor and downsampled to the spatial size of the features, so that the two can be multiplied element-wise. The visualized output is presented below:
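A minimal sketch of that resizing-and-multiplication step, assuming the heatmap arrives as a 2-D NumPy array and the backbone output is a [batch, channels, h, w] tensor; the variable names and shapes below are placeholders, not the repository's actual ones:

```python
import numpy as np
import torch
import torch.nn.functional as F

# Hypothetical inputs: a 2-D gaze heatmap from the VTD and a
# [batch, channels, h, w] feature map from the MDETR backbone.
heatmap = np.random.rand(480, 640).astype(np.float32)   # placeholder heatmap
features = torch.randn(1, 2048, 15, 20)                  # placeholder backbone features

# Convert the heatmap to a [1, 1, H, W] tensor so it can be resized.
heat = torch.from_numpy(heatmap)[None, None]

# Downsample the heatmap to the spatial size of the backbone features.
heat = F.interpolate(heat, size=features.shape[-2:], mode="bilinear",
                     align_corners=False)

# Element-wise multiplication: the single-channel heatmap broadcasts
# across the feature channels, weighting features by gaze intensity.
gated_features = features * heat
print(gated_features.shape)  # torch.Size([1, 2048, 15, 20])
```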
This comparison can be found in issue 11. We decided to implement a more in-depth comparison using different prompts.
The initially collected test set is annotated so that the ground truth is available and prompt generation can be automated (issue 12).
The prompts are categorized into distinct groups based on the level of detail they include. You can choose the prompt type when running the code via the parser; refer to issue 13 for a detailed explanation. A sketch of this selection is shown below.
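A minimal sketch of how an annotation could be turned into prompts of increasing detail and selected from the command line; the annotation fields, detail levels, and the `--prompt_type` flag are hypothetical stand-ins, the actual format and flag names are documented in issues 12 and 13:

```python
import argparse

# Hypothetical annotation for one test image (illustrative fields only).
annotation = {"name": "mug", "color": "red", "location": "on the left"}

def build_prompt(ann, level):
    """Compose a prompt with an increasing level of detail."""
    if level == "name":
        return f"the {ann['name']}"
    if level == "color":
        return f"the {ann['color']} {ann['name']}"
    return f"the {ann['color']} {ann['name']} {ann['location']}"

parser = argparse.ArgumentParser()
# Placeholder flag; see issue 13 for the repository's real parser options.
parser.add_argument("--prompt_type", choices=["name", "color", "full"],
                    default="full")
args = parser.parse_args()
print(build_prompt(annotation, args.prompt_type))
```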
TBC: Refer to issue 17
TBD