What does this implement/fix? Explain your changes.
In the current evaluation logic, if either `pred_try` or `pred_minimal_try` fails to apply, the prediction is judged as `no apply`. This is inconsistent with the purpose of the evaluation: as long as one of the two applies successfully, the subsequent evaluation should continue.
I changed the check so that a prediction is treated as `no apply` only if neither of them applies successfully.
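The difference between the two checks can be sketched as follows. This is a minimal illustration, not the actual evaluation script: the function names `is_no_apply_old` and `is_no_apply_new` and the boolean parameters are hypothetical, and only stand for "did the `pred_try` / `pred_minimal_try` patch apply successfully".

```python
def is_no_apply_old(pred_try_ok: bool, pred_minimal_try_ok: bool) -> bool:
    """Old behaviour (hypothetical sketch): no apply if EITHER attempt failed."""
    return (not pred_try_ok) or (not pred_minimal_try_ok)


def is_no_apply_new(pred_try_ok: bool, pred_minimal_try_ok: bool) -> bool:
    """New behaviour (hypothetical sketch): no apply only if BOTH attempts failed."""
    return (not pred_try_ok) and (not pred_minimal_try_ok)


# Compare the two checks on every combination of apply outcomes.
for pred_try_ok in (True, False):
    for pred_minimal_try_ok in (True, False):
        print(
            f"pred_try={pred_try_ok!s:5} pred_minimal_try={pred_minimal_try_ok!s:5} "
            f"old_no_apply={is_no_apply_old(pred_try_ok, pred_minimal_try_ok)!s:5} "
            f"new_no_apply={is_no_apply_new(pred_try_ok, pred_minimal_try_ok)}"
        )
```

The two checks disagree only when exactly one of the two attempts applies. In that case the old check reports `no apply`, while the new one lets the evaluation continue.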
@icoderzqliu thanks so much for this catch! Apologies it took a while to get around to this. Just merged, and will update any logs in SWE-bench/experiments that were generated with the older script.