scicode-bench / SciCode

A benchmark that challenges language models to code solutions for scientific problems
Apache License 2.0

Solution generation skips certain steps? #12

Closed idavidrein closed 1 month ago

idavidrein commented 1 month ago

In lines 203-205 of gencode_json.py, three steps of different problems are skipped. Could a comment be added explaining why?

for problem in data:
    prob_id = problem['problem_id']
    steps = len(problem['sub_steps'])
    print(f'Generating {prob_id}...')
    for i in range(steps):
        if (prob_id == "13" and i == 5) or (prob_id == "62" and i == 0)\
                or (prob_id == "76" and i == 2):
            continue
        gcode.generate_response_with_steps(problem, i + 1, steps, model, prompt_template)

mtian8 commented 1 month ago

The skipped subproblem-code pairs are given to the model directly as part of the question design, so no solution is generated for them (these subproblem answers are extracted here: https://github.com/scicode-bench/SciCode/tree/main/eval/data). The reason is to control uncertainty and reduce the degrees of freedom in the evaluation process: providing these steps limits the model's randomness in how it solves the rest of the problem. Without this context, the evaluation would allow for too many possible solutions, leading to inconsistent results.
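
One way the requested comment could be made self-documenting is to move the three hard-coded checks into a named table. The sketch below is a hypothetical refactor, not the repository's code: `PROVIDED_STEPS` and `steps_to_generate` are made-up names, and only the `problem_id` and `sub_steps` keys come from the snippet in the issue. Because `i` is 0-based and the loop passes `i + 1` to the generator, the skipped entries correspond to step 6 of problem 13, step 1 of problem 62, and step 3 of problem 76.

```python
# Hypothetical refactor for illustration only; names below are assumptions.
# Sub-steps whose code is supplied to the model as part of the problem
# design, keyed by problem_id, with 0-based step indices as in the loop.
PROVIDED_STEPS = {
    "13": {5},
    "62": {0},
    "76": {2},
}


def steps_to_generate(problem):
    """Return the 1-based step numbers the model still has to solve."""
    provided = PROVIDED_STEPS.get(problem["problem_id"], set())
    total = len(problem["sub_steps"])
    return [i + 1 for i in range(total) if i not in provided]
```

With a helper like this, the inner loop from the issue would become `for step in steps_to_generate(problem): gcode.generate_response_with_steps(problem, step, steps, model, prompt_template)`, and the reason for skipping lives next to the data rather than in a compound `if`.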

ofirpress commented 1 month ago

Thanks for answering @mtian8.