zeyofu / BLINK_Benchmark

This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.org/abs/2404.12390 [ECCV 2024]
https://zeyofu.github.io/blink/
Apache License 2.0

Why is the annotation method of the cases given in your paper different from the actual one provided in huggingface? #9

Closed: Anonymous3790 closed this issue 3 months ago

Anonymous3790 commented 3 months ago

Why is the annotation method of the cases given in your paper different from the actual one provided in huggingface?

In your paper:

[image]

In huggingface:

[image]

zeyofu commented 3 months ago

Hi, thanks for asking. As noted in the captions of Figure 1 and Figure 5, the markers (visual prompts) shown in the figures in our paper are intentionally enlarged for illustration purposes, so they differ from the actual ones. The correct annotations are the ones in the huggingface dataset.