lupantech / MathVista

MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
https://mathvista.github.io/
Creative Commons Attribution Share Alike 4.0 International

Problem 791: Malformed problem #3

Closed: mbchang closed this issue 6 months ago

mbchang commented 7 months ago

The problem description of question 791 is "Given $V_s$ = 5V, $R_1$ = 1kΩ, $R_2$ = 2.2kΩ, $R_3$ = 2.2kΩ, $R_4$ = 1.5kΩ, and $R_L$ = 4.7kΩ. Determine the voltage and current across $R_L$. Answer in unit of V (3 sig.fig.)." The ground truth answer is "1.06".

This question does not match the answer because the task is to return both the voltage and the current, not just the voltage.

lupantech commented 6 months ago

Thank you for pointing this out. The issue stems from the original dataset, TheoremQA, which may contain descriptions that do not perfectly align with the required answers.

The phrase "Answer in unit of V" in the question is intended to direct the model to focus on calculating the voltage across $R_L$. Although the question asks for both voltage and current, this specific instruction leads to an answer focused on voltage only, as indicated by the ground truth answer "1.06".

In our current evaluation process, we have designed an answer extractor utilizing ChatGPT/GPT-4. This tool reviews both the generated response and the raw question to accurately extract the answer for evaluation. Such an approach significantly enhances the robustness of our evaluation, particularly for questions with ambiguous or inaccurate descriptions.
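As a rough illustration of the idea (not the actual MathVista extractor), the sketch below shows how an extraction step might combine the raw question with the generated response. The function names `build_extraction_prompt` and `extract_value_in_unit`, the prompt wording, and the sample response are all hypothetical; the rule-based function only stands in for the LLM call so the example runs offline.

```python
import re
from typing import Optional


def build_extraction_prompt(question: str, response: str) -> str:
    """Combine the raw question and the model response into one prompt.

    Hypothetical prompt wording; an LLM would be asked to complete it.
    """
    return (
        "Extract the final answer from the model response, "
        "following the answer format requested in the question.\n\n"
        f"Question: {question}\n\n"
        f"Model response: {response}\n\n"
        "Extracted answer:"
    )


def extract_value_in_unit(response: str, unit: str) -> Optional[float]:
    """Rule-based stand-in for the LLM call (an assumption, for illustration).

    Returns the last number in `response` that is directly followed by `unit`.
    """
    pattern = rf"(-?\d+(?:\.\d+)?)\s*{re.escape(unit)}\b"
    matches = re.findall(pattern, response)
    return float(matches[-1]) if matches else None


question = "Determine the voltage and current across $R_L$. Answer in unit of V (3 sig.fig.)."
# Made-up model response that reports both quantities.
response = "The voltage across R_L is 1.06 V and the current is 0.226 mA."

prompt = build_extraction_prompt(question, response)
voltage = extract_value_in_unit(response, "V")
```

Because the question text travels with the response, the "Answer in unit of V" instruction is available at extraction time, so a response that reports both voltage and current can still be reduced to the voltage for comparison with the ground truth.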

We appreciate your input. Please feel free to contribute further feedback or suggestions.