lupantech / MathVista

MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
https://mathvista.github.io/
Creative Commons Attribution Share Alike 4.0 International

Problem 699 (among others): Ambiguous problem about age gap #6

Closed mbchang closed 6 months ago

mbchang commented 7 months ago

Question 699 asks: What is the age gap between the center and the rightmost person? (unit: years).

Context:

Issues with this question:

  1. It is not clear whether "left" and "right" are relative to the viewer or to the people in the image. One might assume that the "rightmost person" is Paolo Guerrero in the white shirt, which would lead one to answer "3" (or "4", depending on how the age gap is calculated). However, the ground truth is 0, which suggests that "rightmost" actually means rightmost with respect to the direction the players are facing, i.e. "left" from the viewer's perspective.
  2. The question does not say how an age gap is supposed to be calculated. The ground truth answer is 0, but the answer could also be 1. From April 22 2023 to June 5 2023, Cassio would be 36 while David would be 35, so taking the difference in the people's nominal ages is another plausible method. CoT GPT-4 (Caption+OCR) answered "1", which is marked as incorrect under the string-matching score calculation employed by the paper. It seems unreasonable to mark CoT GPT-4 (Caption+OCR) incorrect here unless instructions for calculating the age gap are given (e.g. "Round the number of years between the two people to the nearest integer").
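To make the second point concrete, here is a minimal sketch of three plausible "age gap" definitions. The birth dates are illustrative assumptions chosen only to match the April 22 to June 5 window mentioned above, and the function names are hypothetical; none of this is taken from the KVQA or MathVista code.

```python
from datetime import date


def nominal_age(birth: date, on: date) -> int:
    """Age in completed years on a given day."""
    years = on.year - birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1
    return years


def gap_by_birth_year(a: date, b: date) -> int:
    """Difference between birth years only."""
    return abs(a.year - b.year)


def gap_by_rounded_days(a: date, b: date) -> int:
    """Elapsed time between the birth dates, rounded to whole years."""
    return round(abs((a - b).days) / 365.25)


def gap_by_nominal_age(a: date, b: date, on: date) -> int:
    """Difference in completed years of age on a given day."""
    return abs(nominal_age(a, on) - nominal_age(b, on))


# Assumed birth dates, for illustration only.
older = date(1987, 4, 22)
younger = date(1987, 6, 6)

print(gap_by_birth_year(older, younger))                       # 0
print(gap_by_rounded_days(older, younger))                     # 0
print(gap_by_nominal_age(older, younger, date(2023, 5, 1)))    # 1
print(gap_by_nominal_age(older, younger, date(2023, 7, 1)))    # 0
```

Under these assumptions the first two definitions agree with the ground truth of 0, while the nominal-age definition gives 1 or 0 depending on the day the question is asked, so an exact string match penalizes a defensible answer.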

Questions 614, 367, 311, 398, 405, 518, 70, 208, 317, 946, 741, 745, 381, 473, 158, 41, 792, 845, 864, 988, 830, 795, 299, 240, 859, 838, 42, 788, 417, 313, 433, 126, 428, 366, 680, 60, 36, 590, 53, 960, 261, 27, 503, 699, 438, 29, 115, 500, 945 are also "age gap" questions that have similar issues.
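One way to gauge how much these questions affect a reported score is to compute exact-match accuracy with and without them. The sketch below uses made-up predictions and a hypothetical `results` layout (question id mapped to a prediction and ground-truth pair); it is not the scoring code from the MathVista repository, and only three of the ids listed above are included to keep it short.

```python
def accuracy(results, exclude=frozenset()):
    """Exact string-match accuracy, optionally skipping some question ids."""
    kept = [
        (pred, truth)
        for pid, (pred, truth) in results.items()
        if pid not in exclude
    ]
    if not kept:
        return 0.0
    return sum(pred == truth for pred, truth in kept) / len(kept)


# A few of the flagged "age gap" question ids from the list above.
flagged = {699, 614, 367}

# Toy data for illustration only: pid -> (model prediction, ground truth).
toy_results = {
    699: ("1", "0"),
    614: ("3", "3"),
    367: ("2", "5"),
    10: ("7", "7"),
    11: ("4", "4"),
}

print(accuracy(toy_results))                    # 0.6
print(accuracy(toy_results, exclude=flagged))   # 1.0
```

Reporting both numbers side by side keeps results comparable with the original benchmark while showing how sensitive a model's score is to the ambiguous subset.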

lupantech commented 6 months ago

Thank you for bringing this to our attention.

The examples you've pointed out, including question 699, originate from the source dataset, KVQA. This dataset occasionally includes questions with ambiguous information or unclear instructions, particularly regarding the calculation of age gaps.

The confusion primarily stems from the perspective of "left" and "right" in the image, as well as the lack of clear instructions on how to calculate the age gap. The ground truth answer of "0" for question 699 indeed suggests a perspective based on the direction the subjects are facing, rather than the viewer's perspective.

Considering the need for consistent evaluation of current models, we chose to keep the raw format of the examples as they appear in the source dataset for now. However, we greatly appreciate your valuable suggestions and recognize the importance of more precise and unambiguous wording in these questions.