lupantech / MathVista

MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
https://mathvista.github.io/
Creative Commons Attribution Share Alike 4.0 International

Questions about GPT-4O score on MathVista? #27

Closed Luodian closed 2 weeks ago

Luodian commented 4 months ago

Hi, I was wondering about the GPT-4O score: it is reported as 63.8 on testmini, but I can only get around 55 on my side. I also get a slightly lower score for GPT-4V than the reported 58.1, around 51-52.

I tried different prompt strategies, but those are the best scores I got.

Would you mind sharing some prompt strategies for reproducing the GPT-4 models' performance? Thanks a lot!

Luodian commented 4 months ago

This is the input/output for GPT-4O on the first question in testmini.

I hope it clarifies my prompting strategy. Does it look correct and well suited to GPT-4V or GPT-4O? Thanks!

{
    "doc_id": 0,
    "target": "1.2",
    "query": "Hint: Please answer the question requiring a floating-point number with one decimal place and provide the final value, e.g., 1.2, 1.3, 1.4, at the end.\nQuestion: When a spring does work on an object, we cannot find the work by simply multiplying the spring force by the object's displacement. The reason is that there is no one value for the force-it changes. However, we can split the displacement up into an infinite number of tiny parts and then approximate the force in each as being constant. Integration sums the work done in all those parts. Here we use the generic result of the integration.\r\n\r\nIn Figure, a cumin canister of mass $m=0.40 \\mathrm{~kg}$ slides across a horizontal frictionless counter with speed $v=0.50 \\mathrm{~m} / \\mathrm{s}$. It then runs into and compresses a spring of spring constant $k=750 \\mathrm{~N} / \\mathrm{m}$. When the canister is momentarily stopped by the spring, by what distance $d$ is the spring compressed?",
    "resps": [
        [
            "To find the distance \\( d \\) by which the spring is compressed, we can use the principle of conservation of energy. The kinetic energy of the canister is converted into the potential energy stored in the spring.\n\nThe kinetic energy (KE) of the canister is given by:\n\\[ KE = \\frac{1}{2}mv^2 \\]\n\nThe potential energy (PE) stored in the spring when it is compressed by a distance \\( d \\) is given by:\n\\[ PE = \\frac{1}{2}kd^2 \\]\n\nAt the point where the canister is momentarily stopped, all the kinetic energy will have been converted into potential energy:\n\\[ \\frac{1}{2}mv^2 = \\frac{1}{2}kd^2 \\]\n\nWe can cancel the \\(\\frac{1}{2}\\) on both sides and solve for \\( d \\):\n\\[ mv^2 = kd^2 \\]\n\\[ d^2 = \\frac{mv^2}{k} \\]\n\\[ d = \\sqrt{\\frac{mv^2}{k}} \\]\n\nGiven:\n- \\( m = 0.40 \\, \\text{kg} \\)\n- \\( v = 0.50 \\, \\text{m/s} \\)\n- \\( k = 750 \\, \\text{N/m} \\)\n\nSubstitute these values into the equation:\n\\[ d = \\sqrt{\\frac{(0.40 \\, \\text{kg})(0.50 \\, \\text{m/s})^2}{750 \\, \\text{N/m}}} \\]\n\\[ d = \\sqrt{\\frac{(0.40)(0.25)}{750}} \\]\n\\[ d = \\sqrt{\\frac{0.10}{750}} \\]\n\\[ d = \\sqrt{0.0001333} \\]\n\\[ d \\approx 0.0115 \\, \\text{m} \\]\n\nTo one decimal place, the distance \\( d \\) is:\n\\[ d \\approx 0.0 \\, \\text{m} \\]\n\nHowever, this result seems too small. Let's recheck the calculation:\n\n\\[ d = \\sqrt{\\frac{0.10}{750}} \\]\n\\[ d = \\sqrt{0.0001333} \\]\n\\[ d \\approx 0.0115 \\, \\text{m} \\]\n\nThe correct answer is indeed \\( 0.0 \\, \\text{m} \\) when rounded to one decimal place."
        ]
    ],
    "filtered_resps": [
        "To find the distance \\( d \\) by which the spring is compressed, we can use the principle of conservation of energy. The kinetic energy of the canister is converted into the potential energy stored in the spring.\n\nThe kinetic energy (KE) of the canister is given by:\n\\[ KE = \\frac{1}{2}mv^2 \\]\n\nThe potential energy (PE) stored in the spring when it is compressed by a distance \\( d \\) is given by:\n\\[ PE = \\frac{1}{2}kd^2 \\]\n\nAt the point where the canister is momentarily stopped, all the kinetic energy will have been converted into potential energy:\n\\[ \\frac{1}{2}mv^2 = \\frac{1}{2}kd^2 \\]\n\nWe can cancel the \\(\\frac{1}{2}\\) on both sides and solve for \\( d \\):\n\\[ mv^2 = kd^2 \\]\n\\[ d^2 = \\frac{mv^2}{k} \\]\n\\[ d = \\sqrt{\\frac{mv^2}{k}} \\]\n\nGiven:\n- \\( m = 0.40 \\, \\text{kg} \\)\n- \\( v = 0.50 \\, \\text{m/s} \\)\n- \\( k = 750 \\, \\text{N/m} \\)\n\nSubstitute these values into the equation:\n\\[ d = \\sqrt{\\frac{(0.40 \\, \\text{kg})(0.50 \\, \\text{m/s})^2}{750 \\, \\text{N/m}}} \\]\n\\[ d = \\sqrt{\\frac{(0.40)(0.25)}{750}} \\]\n\\[ d = \\sqrt{\\frac{0.10}{750}} \\]\n\\[ d = \\sqrt{0.0001333} \\]\n\\[ d \\approx 0.0115 \\, \\text{m} \\]\n\nTo one decimal place, the distance \\( d \\) is:\n\\[ d \\approx 0.0 \\, \\text{m} \\]\n\nHowever, this result seems too small. Let's recheck the calculation:\n\n\\[ d = \\sqrt{\\frac{0.10}{750}} \\]\n\\[ d = \\sqrt{0.0001333} \\]\n\\[ d \\approx 0.0115 \\, \\text{m} \\]\n\nThe correct answer is indeed \\( 0.0 \\, \\text{m} \\) when rounded to one decimal place."
    ],
    "gpt_eval_score": {
        "question_id": "1",
        "query": "Hint: Please answer the question requiring a floating-point number with one decimal place and provide the final value, e.g., 1.2, 1.3, 1.4, at the end.\nQuestion: When a spring does work on an object, we cannot find the work by simply multiplying the spring force by the object's displacement. The reason is that there is no one value for the force-it changes. However, we can split the displacement up into an infinite number of tiny parts and then approximate the force in each as being constant. Integration sums the work done in all those parts. Here we use the generic result of the integration.\r\n\r\nIn Figure, a cumin canister of mass $m=0.40 \\mathrm{~kg}$ slides across a horizontal frictionless counter with speed $v=0.50 \\mathrm{~m} / \\mathrm{s}$. It then runs into and compresses a spring of spring constant $k=750 \\mathrm{~N} / \\mathrm{m}$. When the canister is momentarily stopped by the spring, by what distance $d$ is the spring compressed?",
        "choices": null,
        "answer": "1.2",
        "extraction": "0.0",
        "prediction": null,
        "true_false": false,
        "question_type": "free_form",
        "answer_type": "float",
        "precision": 1.0,
        "metadata": {
            "category": "math-targeted-vqa",
            "context": "scientific figure",
            "grade": "college",
            "img_height": 720,
            "img_width": 1514,
            "language": "english",
            "skills": [
                "scientific reasoning"
            ],
            "source": "SciBench",
            "split": "testmini",
            "task": "textbook question answering"
        }
    }
}
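
A quick check of the arithmetic in the logged response may help explain the mismatch between the extraction (0.0) and the target (1.2). The sketch below recomputes d from the values in the query; it assumes the target 1.2 is expressed in centimetres, which the query text does not state, so treat that reading as a guess rather than a confirmed property of the dataset.

```python
import math

# Values taken from the query text above.
m = 0.40   # mass in kg
v = 0.50   # speed in m/s
k = 750.0  # spring constant in N/m

# Energy conservation: (1/2) m v^2 = (1/2) k d^2  =>  d = sqrt(m v^2 / k)
d_m = math.sqrt(m * v**2 / k)  # compression in metres
d_cm = d_m * 100.0             # same value in centimetres (assumed target unit)

print(f"d = {d_m:.4f} m = {d_cm:.2f} cm")
print("rounded in metres:", round(d_m, 1))
print("rounded in centimetres:", round(d_cm, 1))
```

Under that assumption the model's physics agrees with the target, and the one-decimal rounding in metres is what turns the answer into 0.0.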
lupantech commented 2 weeks ago

Hey @Luodian,

Thanks for reaching out! The score for GPT-4O (63.8 on testmini) is taken from the official OpenAI announcement.

To better understand the exact experiment settings used to achieve this score, I recommend reaching out directly to OpenAI for more detailed information.

However, here are a few possible reasons for the lower scores on your end:

  1. Empty responses: This can occur due to API limits or other errors. Make sure to handle these cases and reissue the request if necessary.
  2. Temperature setting: Using a low temperature, such as 0.0, is often preferred for more deterministic and accurate outputs.
  3. Max tokens: Ensure that the maximum number of tokens allowed for generation is sufficiently large to prevent truncation of the response, as this could affect the score.
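
The three points above can be sketched as a small wrapper. This is only an illustration, not the code used in this repository or by OpenAI: `call_model` is a hypothetical callable standing in for whatever API client you use, and the defaults (`max_retries=3`, `max_tokens=1024`, the backoff schedule) are assumptions you would tune yourself.

```python
import time
from typing import Callable


def query_with_retry(
    call_model: Callable[..., str],  # hypothetical: (prompt, temperature=..., max_tokens=...) -> str
    prompt: str,
    max_retries: int = 3,
    backoff_seconds: float = 1.0,
    temperature: float = 0.0,  # low temperature for more deterministic output
    max_tokens: int = 1024,    # assumed budget, large enough to avoid truncation
) -> str:
    """Call the model, reissuing the request when the response is empty or an error occurs."""
    last_error = None
    for attempt in range(max_retries):
        try:
            response = call_model(prompt, temperature=temperature, max_tokens=max_tokens)
        except Exception as exc:  # e.g. a rate limit or a transient API error
            last_error = exc
            response = ""
        if response and response.strip():
            return response
        if attempt < max_retries - 1:
            # Exponential backoff before reissuing the request.
            time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError(f"no non-empty response after {max_retries} attempts") from last_error


def make_flaky_model(fail_times: int):
    """Build a fake model that returns empty strings for the first `fail_times` calls."""
    calls = {"n": 0}

    def fake(prompt: str, temperature: float, max_tokens: int) -> str:
        calls["n"] += 1
        return "" if calls["n"] <= fail_times else "1.2"

    return fake, calls


if __name__ == "__main__":
    fake, calls = make_flaky_model(fail_times=2)
    answer = query_with_retry(fake, "example prompt", backoff_seconds=0.0)
    print(answer, "after", calls["n"], "calls")
```

Raising after the final attempt, instead of returning an empty string, keeps failed requests from being silently scored as wrong answers.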

Hope this helps! Let me know if you have any other questions or insights.