shivahanifi / GazeMDETR

Apache License 2.0

GazeMDETR

This repository contains the files required to combine gaze information with the original MDETR.

Original MDETR repository

MDETR: Modulated Detection for End-to-End Multi-Modal Understanding

Website • Colab • Paper

GazeMDETR

Data

The data collected to test MDETR and GazeMDETR, together with the outputs of the tests, is accessible through the following links:

Combining gaze information with the MDETR data

  1. As a first step, I made the heatmap output by the VTD available in the GazeMDETR demo code and ran initial tests on how to combine it with the features output by the MDETR backbone. To that end, the heatmap is resized, converted to a tensor, and then downsampled to the spatial size of the feature map so that the two can be multiplied element-wise. The visualized output is presented below:
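Separately from the visualization, the downsample-and-multiply step in item 1 can be sketched in a few lines. This is only an illustration, not the code used in the demo: plain nested lists stand in for tensors, the function names `downsample_heatmap` and `modulate_features` are hypothetical, and average pooling is an assumed choice of downsampling method (the text above does not say which one is used).

```python
def downsample_heatmap(heatmap, out_h, out_w):
    """Average-pool a 2D heatmap (list of rows) down to out_h x out_w.

    Assumption: the input height and width are integer multiples of the
    target size, so every output cell averages one rectangular block.
    """
    in_h, in_w = len(heatmap), len(heatmap[0])
    if in_h % out_h or in_w % out_w:
        raise ValueError("input size must be a multiple of the target size")
    bh, bw = in_h // out_h, in_w // out_w
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            block = [
                heatmap[i * bh + di][j * bw + dj]
                for di in range(bh)
                for dj in range(bw)
            ]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


def modulate_features(features, heatmap):
    """Multiply every channel of a [C][H][W] feature map by an [H][W] heatmap."""
    return [
        [
            [value * weight for value, weight in zip(f_row, h_row)]
            for f_row, h_row in zip(channel, heatmap)
        ]
        for channel in features
    ]


# Toy data (made up): a 4x4 gaze heatmap and a 2-channel 2x2 feature map.
heatmap = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
features = [
    [[2.0, 2.0], [2.0, 2.0]],
    [[-1.0, 3.0], [4.0, 8.0]],
]

small = downsample_heatmap(heatmap, 2, 2)   # heatmap at feature resolution
gazed = modulate_features(features, small)  # gaze-weighted features
```

Feature locations the heatmap marks as unattended are scaled toward zero, while attended locations keep their values. With PyTorch tensors, the same idea would typically rely on `torch.nn.functional.interpolate` to resize the heatmap and on broadcasting for the channel-wise product.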

Initial comparison between MDETR and GazeMDETR

This comparison is documented in issue 11. We decided to follow it with a more in-depth comparison using different prompts.

Annotating the collected dataset

The initially collected test set is annotated so that the ground truth is available and prompt generation can be automated (issue 12).

Prompts

The prompts are categorized into distinct groups based on the level of detail they include. You can choose the prompt type when running the code through the command-line argument parser. Please refer to issue 13 for a detailed explanation.
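As a rough illustration of selecting a prompt category through the parser, the sketch below uses Python's standard `argparse` module. The flag name `--prompt-category`, the category names `minimal` and `detailed`, the templates, and the annotation fields are all hypothetical placeholders; the actual categories are the ones described in issue 13.

```python
import argparse

# Hypothetical categories and templates, ordered by level of detail.
# The real ones are defined in issue 13 and may differ.
PROMPT_TEMPLATES = {
    "minimal": "the {name}",
    "detailed": "the {color} {name}",
}


def build_parser():
    """Build a parser exposing the prompt category as a command-line option."""
    parser = argparse.ArgumentParser(
        description="Select the prompt category (illustrative sketch)."
    )
    parser.add_argument(
        "--prompt-category",  # hypothetical flag name
        choices=sorted(PROMPT_TEMPLATES),
        default="minimal",
        help="level of detail to include in the generated prompt",
    )
    return parser


def make_prompt(category, annotation):
    """Fill the template of the chosen category from one annotation entry."""
    return PROMPT_TEMPLATES[category].format(**annotation)


# Simulate running the script with "--prompt-category detailed".
args = build_parser().parse_args(["--prompt-category", "detailed"])
annotation = {"name": "mug", "color": "red"}  # made-up ground-truth entry
prompt = make_prompt(args.prompt_category, annotation)
```

Restricting the option with `choices` makes the parser reject unknown category names before any model code runs.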

Evaluation metrics

TBC: Refer to issue 17

Collecting data with cluttered scenes

TBD