The current HumanEval test replaces a placeholder in coding/my_test.py with the test script from the HumanEval dataset. However, for some problems the test script relies on helper functions that are defined only in the prompt. Those names are never imported into my_test.py, so the test raises an error, the problem can never be solved, and the output is always "SOME TESTS FAILED - TRY AGAIN !#!#".
For example, the following script is from HumanEval_38:
### Code in the prompt
from my_test import run_tests

def encode_cyclic(s: str):
    """
    returns encoded string by cycling groups of three characters.
    """
    # split string to groups. Each of length 3.
    groups = [s[(3 * i):min((3 * i + 3), len(s))] for i in range((len(s) + 2) // 3)]
    # cycle elements in each group. Unless group has fewer elements than 3.
    groups = [(group[1:] + group[0]) if len(group) == 3 else group for group in groups]
    return "".join(groups)

def decode_cyclic(s: str):
    """
    takes as input string encoded with encode_cyclic function. Returns decoded string.
    """

run_tests(decode_cyclic)
Here, the function encode_cyclic is pre-defined in the prompt as part of the problem. However, run_tests (through check) also calls encode_cyclic, which is neither defined nor imported in coding/my_test.py. The call therefore raises a NameError, and the test fails no matter how decode_cyclic is implemented.
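The name-resolution problem can be reproduced in isolation. The following is a minimal sketch, not the project's code: the module name `my_test_demo` and the fixed input `"abcdef"` are assumptions made for illustration, and `types.ModuleType` plus `exec` stand in for a real import of coding/my_test.py.

```python
import types

# Hypothetical stand-in for coding/my_test.py: check() refers to
# encode_cyclic, but that name is never defined in this module.
test_module_source = '''
def check(candidate):
    assert candidate(encode_cyclic("abcdef")) == "abcdef"
'''
my_test = types.ModuleType("my_test_demo")
exec(test_module_source, my_test.__dict__)


# Stand-in for the prompt, where the helper *is* defined.
def encode_cyclic(s: str) -> str:
    groups = [s[3 * i:3 * i + 3] for i in range((len(s) + 2) // 3)]
    groups = [g[1:] + g[0] if len(g) == 3 else g for g in groups]
    return "".join(groups)


def decode_cyclic(s: str) -> str:
    # A correct solution: rotating each full group twice more undoes one rotation.
    return encode_cyclic(encode_cyclic(s))


try:
    my_test.check(decode_cyclic)
    outcome = "passed"
except NameError as exc:
    # check() resolves names in its own module's globals, not the caller's.
    outcome = f"NameError: {exc}"

print(outcome)
```

Even though `decode_cyclic` is correct, `check` fails, because a function looks up global names in the module where it was defined rather than in the module that calls it.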
### Code in coding/my_test.py
METADATA = {}

def check(candidate):
    from random import randint, choice
    import string

    letters = string.ascii_lowercase
    for _ in range(100):
        str = ''.join(choice(letters) for i in range(randint(10, 20)))
        encoded_str = encode_cyclic(str)  # encode_cyclic cannot be found
        assert candidate(encoded_str) == str

def run_tests(candidate):
    try:
        check(candidate)
        print("ALL TESTS PASSED !#!#\nTERMINATE")
    except:
        print("SOME TESTS FAILED - TRY AGAIN !#!#")
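One possible workaround, sketched below under stated assumptions, is to inject the prompt's helpers into the test module before calling check, and to report the underlying exception so that a harness error is not mistaken for a wrong answer. The `helpers` parameter, the boolean return value, and `make_prompt_functions` are hypothetical additions for illustration; this is not the project's actual fix.

```python
import random
import string
import sys


def check(candidate):
    # Same logic as the HumanEval_38 test script; encode_cyclic is looked up
    # in this module's globals at call time.
    letters = string.ascii_lowercase
    for _ in range(100):
        text = "".join(random.choice(letters) for _ in range(random.randint(10, 20)))
        assert candidate(encode_cyclic(text)) == text


def run_tests(candidate, helpers=None):
    # Assumption: `helpers` is a hypothetical parameter mapping names from the
    # prompt (e.g. "encode_cyclic") to the objects the test script needs.
    check.__globals__.update(helpers or {})
    try:
        check(candidate)
    except Exception as exc:
        # Report the real error so a harness NameError is distinguishable
        # from a genuinely wrong solution.
        print(f"{type(exc).__name__}: {exc}", file=sys.stderr)
        print("SOME TESTS FAILED - TRY AGAIN !#!#")
        return False
    print("ALL TESTS PASSED !#!#\nTERMINATE")
    return True


def make_prompt_functions():
    # Stand-in for the prompt: defined inside a function so that encode_cyclic
    # is not a global of this module until run_tests injects it.
    def encode_cyclic(s):
        groups = [s[3 * i:3 * i + 3] for i in range((len(s) + 2) // 3)]
        return "".join(g[1:] + g[0] if len(g) == 3 else g for g in groups)

    def decode_cyclic(s):
        # Rotating each full group twice more restores the original order.
        return encode_cyclic(encode_cyclic(s))

    return encode_cyclic, decode_cyclic


encode, decode = make_prompt_functions()
without_helper = run_tests(decode)  # NameError is reported; returns False
with_helper = run_tests(decode, helpers={"encode_cyclic": encode})  # returns True
```

Updating `check.__globals__` mutates the test module's namespace, which keeps the dataset's test script unchanged but is a blunt tool. Passing the helpers to check explicitly would be cleaner if the test script could be modified.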
Steps to reproduce
Run problem No. 38 (HumanEval_38) from the HumanEval dataset.
Model Used
No response
Expected Behavior
No response
Screenshots and logs
No response
Additional Information
No response