microsoft / autogen

A programming framework for agentic AI 🤖
https://microsoft.github.io/autogen/
Creative Commons Attribution 4.0 International

[Bug][AutogenBench]: import error in HumanEval test script #2097

Open LinxinS97 opened 7 months ago

LinxinS97 commented 7 months ago

Describe the bug

The current HumanEval test replaces a placeholder in coding/my_test.py with the test script from the HumanEval dataset. However, this causes an import error for problems whose test script relies on helper functions defined in the prompt: those helpers are never imported into my_test.py. Such problems can never be solved and always output "SOME TESTS FAILED - TRY AGAIN !#!#".

For example, the following script is from HumanEval_38:

### Code in the prompt
from my_test import run_tests

def encode_cyclic(s: str):
    """
    returns encoded string by cycling groups of three characters.
    """
    # split string to groups. Each of length 3.
    groups = [s[(3 * i):min((3 * i + 3), len(s))] for i in range((len(s) + 2) // 3)]
    # cycle elements in each group. Unless group has fewer elements than 3.
    groups = [(group[1:] + group[0]) if len(group) == 3 else group for group in groups]
    return "".join(groups)

def decode_cyclic(s: str):
    """
    takes as input string encoded with encode_cyclic function. Returns decoded string.
    """

run_tests(decode_cyclic)

Here, the function encode_cyclic is pre-defined in the prompt to help solve the problem. However, the check function called by run_tests also uses encode_cyclic, which is never imported into my_test.py, so the test fails even when decode_cyclic is implemented correctly.
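
The failure described above can be reproduced in a single file. The following is a minimal sketch, not the AutogenBench code: it assumes a simplified, deterministic `check`, builds an in-memory stand-in for `my_test` with `types.ModuleType`, and uses hypothetical helper names (`MY_TEST_SOURCE`, `load_my_test`, `describe_failure`). It shows that `check` resolves `encode_cyclic` in its own module's globals, not in the file that defines the solution.

```python
import types

# Simplified stand-in for coding/my_test.py: check() calls encode_cyclic,
# but nothing in this module defines or imports it.
MY_TEST_SOURCE = '''
def check(candidate):
    original = "abcdefgh"
    assert candidate(encode_cyclic(original)) == original
'''


def encode_cyclic(s: str) -> str:
    """Helper from the HumanEval_38 prompt, defined only in the solution file."""
    groups = [s[(3 * i):min((3 * i + 3), len(s))] for i in range((len(s) + 2) // 3)]
    groups = [(group[1:] + group[0]) if len(group) == 3 else group for group in groups]
    return "".join(groups)


def decode_cyclic(s: str) -> str:
    """A correct solution: two more left rotations undo the first one."""
    return encode_cyclic(encode_cyclic(s))


def load_my_test() -> types.ModuleType:
    """Build an in-memory module so the example fits in one file."""
    module = types.ModuleType("my_test")
    exec(MY_TEST_SOURCE, module.__dict__)
    return module


def describe_failure() -> str:
    """Call check() the way run_tests does, but keep the exception visible."""
    my_test = load_my_test()
    try:
        my_test.check(decode_cyclic)
    except NameError as exc:
        return f"NameError: {exc}"
    return "passed"


if __name__ == "__main__":
    print(describe_failure())
```

Even though `decode_cyclic` is correct, `check` raises a `NameError` before the candidate is ever called, which matches the always-failing behavior reported here.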

### Code in coding/my_test.py
METADATA = {}

def check(candidate):
    from random import randint, choice
    import string

    letters = string.ascii_lowercase
    for _ in range(100):
        str = ''.join(choice(letters) for i in range(randint(10, 20)))
        encoded_str = encode_cyclic(str)  # encode_cyclic cannot be found
        assert candidate(encoded_str) == str

def run_tests(candidate):
    try:
        check(candidate)
        print("ALL TESTS PASSED !#!#\nTERMINATE")
    except:
        print("SOME TESTS FAILED - TRY AGAIN !#!#")
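
One possible direction for a workaround, sketched under assumptions and not taken from the AutogenBench code base: let `run_tests` accept the prompt's helper functions and copy them into the test module's globals, and catch `Exception` so the cause of a failure is reported next to the marker. The `helpers` parameter, the returned (rather than printed) message, and the names `PATCHED_MY_TEST` and `load_patched_my_test` are all hypothetical, and the `check` body is a deterministic simplification of the one above.

```python
import types

# Hypothetical variant of coding/my_test.py, kept as a string so the
# example can load it as a separate module inside one file.
PATCHED_MY_TEST = '''
def check(candidate):
    for original in ("", "ab", "abcdefgh"):
        assert candidate(encode_cyclic(original)) == original

def run_tests(candidate, helpers=None):
    # Hypothetical extension: make the prompt's helpers visible to check().
    globals().update(helpers or {})
    try:
        check(candidate)
        return "ALL TESTS PASSED !#!#\\nTERMINATE"
    except Exception as exc:
        # Report the cause on its own line, then the usual marker.
        return f"{type(exc).__name__}: {exc}\\nSOME TESTS FAILED - TRY AGAIN !#!#"
'''


def load_patched_my_test() -> types.ModuleType:
    """Create a fresh in-memory module from the patched source."""
    module = types.ModuleType("my_test")
    exec(PATCHED_MY_TEST, module.__dict__)
    return module


def encode_cyclic(s: str) -> str:
    """Helper from the HumanEval_38 prompt."""
    groups = [s[(3 * i):min((3 * i + 3), len(s))] for i in range((len(s) + 2) // 3)]
    groups = [(group[1:] + group[0]) if len(group) == 3 else group for group in groups]
    return "".join(groups)


def decode_cyclic(s: str) -> str:
    """A correct solution: two more left rotations undo the first one."""
    return encode_cyclic(encode_cyclic(s))


if __name__ == "__main__":
    # Without helpers, the NameError is now visible in the output.
    print(load_patched_my_test().run_tests(decode_cyclic))
    # With the helper passed in, the correct solution passes.
    print(load_patched_my_test().run_tests(
        decode_cyclic, helpers={"encode_cyclic": encode_cyclic}))
```

Whether extra diagnostic lines are compatible with how the benchmark parses the output is not confirmed here; the sketch only illustrates that the test outcome depends on the helper being visible inside the test module.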

Steps to reproduce

Run problem 38 (HumanEval_38) of the HumanEval benchmark.

Model Used

No response

Expected Behavior

No response

Screenshots and logs

No response

Additional Information

No response

afourney commented 7 months ago

Good catch. Thanks for reporting.