Closed idavidrein closed 1 month ago
The skipped subproblem-code pairs are given directly as part of the question design (their answers are extracted here: https://github.com/scicode-bench/SciCode/tree/main/eval/data). We do this to control uncertainty and reduce the degrees of freedom in the evaluation: supplying these steps as context limits the model's randomness in problem-solving. Without this context, too many solutions would be possible, leading to inconsistent results.
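As a rough illustration of how such a skip could be documented in code, here is a minimal sketch. The names `PROVIDED_STEPS` and `steps_to_generate` and the problem IDs are hypothetical placeholders, not the actual contents of gencode_json.py.

```python
# Hypothetical (problem_id, step_index) pairs whose reference code is
# supplied with the benchmark data. These IDs are placeholders only.
PROVIDED_STEPS = {("p1", 2), ("p2", 0)}


def steps_to_generate(problem_id, num_steps, provided=PROVIDED_STEPS):
    """Return the step indices the model still has to solve.

    Steps listed in `provided` are skipped on purpose: their code is part of
    the question design and is given to the model as fixed context, which
    reduces the degrees of freedom in the evaluation.
    """
    return [i for i in range(num_steps) if (problem_id, i) not in provided]
```

Keeping the skipped pairs in one named set, with the reason stated next to it, would make the intent visible to anyone reading the generation loop.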
Thanks for answering @mtian8.
In lines 203-205 of `gencode_json.py`, three steps of different problems are skipped. Could a comment be added explaining why?