Open Zyq-scut opened 1 year ago
I also noticed a few samples with false positive results due to check5. For example, a generated program outputs ['NO', 'YES', 'YES']
and the ground truth output is 'YES\nNO\nYES\n',
yet the sample still passes after check5 (before that check it was correctly identified as a failure).
The issue seems to be the conversion to the set data type, which discards both the order and the multiplicity of the output tokens: https://github.com/hendrycks/apps/blob/23e98569b871e67752422717d1e28e9193248194/eval/testing_util.py#L448
check6 might have a similar problem, but I haven't seen any such false positives yet.
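
To make the suspected failure mode concrete, here is a minimal sketch. It is not the code from `testing_util.py`; the helper names `set_based_match` and `order_preserving_match` are hypothetical, and the sketch only assumes that check5 ends up comparing sets of whitespace-separated tokens. Under that assumption, both outputs reported in this thread are accepted even though they are wrong.

```python
def set_based_match(generated_lines, expected_text):
    """Hypothetical stand-in for a set-based check: compares the outputs
    as sets of whitespace-separated tokens (order and duplicates are lost)."""
    generated_tokens = set(" ".join(generated_lines).split())
    expected_tokens = set(expected_text.split())
    return generated_tokens == expected_tokens


def order_preserving_match(generated_lines, expected_text):
    """Hypothetical stricter check: compares the token sequences as lists,
    so order and repetition both matter."""
    generated_tokens = " ".join(generated_lines).split()
    expected_tokens = expected_text.split()
    return generated_tokens == expected_tokens


# Case 1: same tokens, different order.
generated = ["NO", "YES", "YES"]
expected = "YES\nNO\nYES\n"
print(set_based_match(generated, expected))         # True  (false positive)
print(order_preserving_match(generated, expected))  # False

# Case 2: same distinct tokens, different repetition count.
generated = ["Christmas Eve"]
expected = "Christmas Eve Eve Eve\n"
print(set_based_match(generated, expected))         # True  (false positive)
print(order_preserving_match(generated, expected))  # False
```

Comparing token lists instead of sets is only one possible direction for a fix. Whether it is appropriate depends on whether any problems legitimately allow answers in arbitrary order, which this sketch does not address.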
I'm not sure when I'll have time to actively help debug this. I can certainly review PRs or assist others though.
Hi, thanks for your work. I don't quite understand the role of check5 in the evaluation process; it seems to produce some wrong results. Here is an example from test problem 4496. The question is: My program is: When I pass 22 into the program, the expected result is “Christmas Eve Eve Eve”, but this program returns “Christmas Eve”. This is obviously a wrong answer, but check5 in the “run_test” function judges the result as correct. Is this a bug? Looking forward to your reply.