scicode-bench / SciCode

A benchmark that challenges language models to code solutions for scientific problems
Apache License 2.0

Potentially Too Strict Judgement on Calculated Result #8

Closed. XuGW-Kevin closed this issue 2 months ago

XuGW-Kevin commented 2 months ago

It seems that the evaluation code uses atol=1e-8 and rtol=1e-5 to judge whether two results are close enough. I find this too strict in some cases. For example, in problem 35.1, the following code is judged as correct:


def ground_state_wavelength(L, mr):
    '''Given the width of an infinite square well, provide the corresponding wavelength of the ground state eigenstate energy.
    Input:
    L (float): Width of the infinite square well (nm).
    mr (float): relative effective electron mass.
    Output:
    lmbd (float): Wavelength of the ground state energy (nm).
    '''
    # Constants
    h = 6.626e-34  # Planck constant in J*s
    c = 3e8  # Speed of light in m/s
    m_e = 9.109e-31  # Free electron mass in kg

    # Convert L from nanometers to meters
    L_m = L * 1e-9

    # Calculate the effective mass
    m = mr * m_e

    # Calculate the ground state energy E_1
    E_1 = (h**2) / (8 * m * L_m**2)

    # Calculate the corresponding wavelength λ
    lmbd = h * c / E_1

    # Convert the wavelength from meters to nanometers
    lmbd_nm = lmbd * 1e9

    return lmbd_nm

However, the following code is considered to be wrong:


def ground_state_wavelength(L, mr):
    '''Given the width of an infinite square well, provide the corresponding wavelength of the ground state eigenstate energy.
    Input:
    L (float): Width of the infinite square well (nm).
    mr (float): relative effective electron mass.
    Output:
    lmbd (float): Wavelength of the ground state energy (nm).
    '''
    # Constants
    h = 6.62607015e-34  # Planck constant in J*s
    c = 2.99792458e8  # Speed of light in m/s
    m_e = 9.10938356e-31  # Free electron mass in kg

    # Convert L from nanometers to meters
    L_m = L * 1e-9

    # Calculate the effective mass
    m = mr * m_e

    # Calculate the ground state energy E_1
    E_1 = (h**2) / (8 * m * L_m**2)

    # Calculate the wavelength λ
    lmbd = h * c / E_1

    # Convert the wavelength from meters to nanometers
    lmbd_nm = lmbd * 1e9

    return lmbd_nm

These two functions implement the same computation and differ only in the precision of the physical constants, so it seems unfair to judge them differently. Maybe it would be better to loosen the threshold to 1e-5 or 1e-4.
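To put a number on the gap between the two versions, here is a minimal sketch. It assumes the check follows the same inequality as numpy.isclose, |actual - expected| <= atol + rtol * |expected|, re-implemented in plain Python so the snippet has no dependencies. The helper is_close, the extra constant parameters on ground_state_wavelength, and the input values L = 10 and mr = 0.067 are hypothetical, chosen only for illustration; they are not taken from the benchmark's test cases.

```python
def ground_state_wavelength(L, mr, h, c, m_e):
    """Wavelength (nm) of the ground-state energy of an infinite square well of width L (nm)."""
    L_m = L * 1e-9                          # nm -> m
    E_1 = h**2 / (8 * mr * m_e * L_m**2)    # ground-state energy in J
    return h * c / E_1 * 1e9                # wavelength, m -> nm


def is_close(actual, expected, rtol=1e-5, atol=1e-8):
    # Hypothetical helper using the same inequality as numpy.isclose.
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Hypothetical input, not one of the benchmark's test cases.
L, mr = 10.0, 0.067

# Rounded constants from the problem statement vs. higher-precision constants.
expected = ground_state_wavelength(L, mr, h=6.626e-34, c=3e8, m_e=9.109e-31)
actual = ground_state_wavelength(L, mr, h=6.62607015e-34, c=2.99792458e8, m_e=9.10938356e-31)

rel_diff = abs(actual - expected) / abs(expected)
print(f"relative difference: {rel_diff:.2e}")
for rtol in (1e-5, 1e-4, 1e-3):
    print(f"rtol={rtol:g}: close={is_close(actual, expected, rtol=rtol)}")
```

Since the wavelength simplifies to 8 * m * c * L^2 / h, the relative difference does not depend on L or mr. It comes out at about 6.6e-4 and is dominated by the speed of light (3e8 versus 2.99792458e8). Under this sketch's assumptions, a relative tolerance of 1e-4 would still reject the second version; only something near 1e-3 would accept it.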

mtian8 commented 2 months ago

Thank you for the feedback! We have taken this into consideration, which is why we define all of the scientific constants in the problem statement. For example, question 35.1 states: "Given the free electron mass is 9.109e-31kg, speed of light is 3e8m/s and the Planck constant is 6.626e-34J*s." In our error analysis of LM generations on SciCode, we classify this kind of error as "not following instructions".

XuGW-Kevin commented 2 months ago

Thank you for the clarification! Using predefined scientific constants indeed helps ensure consistency in the problem solutions.