long8v / PTIR

Paper Today I Read

[154] Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment #169

long8v opened 6 months ago

paper, page, dataset

TL;DR

Details

image image image

Image source

Proposed ConGen

SeeTrue-Feedback benchmark

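Built on the SeeTrue dataset: candidate samples were generated in a way similar to ConGen above, then sent to Amazon Mechanical Turk (AMT), where human annotators verified them, yielding 2,008 samples.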

Evaluation metrics

Results

Recent state-of-the-art VLMs were queried as follows:

[Figures: the queries posed to the VLMs]

Limitations of model predictions
